to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648
and also the follow-up r351315.
The new test is failing on the buildbots.
> Make sure that OMPT is enabled in runtime entry points that access internals
> of the runtime. Else, return an appropiate value indicating an error or that
> the data is not available.
>
> Patch provided by @sconvent
>
> Reviewers: jlpeyton, omalyshe, hbae, Hahnfeld, joachim.protze
>
> Reviewed By: joachim.protze
>
> Tags: #openmp, #ompt
>
> Differential Revision: https://reviews.llvm.org/D47717
llvm-svn: 351431
Add omp_pause_resource and omp_pause_resource_all API and enum, plus stub for
internal implementation. Implemented callable helper function to do local pause,
and added basic functionality for hard and soft pause.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55078
llvm-svn: 351372
The compiler warns about an unused variable/statement:
runtime/src/kmp_affinity.cpp:4958:18: warning: statement has no effect [-Wunused-value]
KA_TRACE(1000, ; {
^
runtime/src/kmp_debug.h:84:24: note: in definition of macro 'KA_TRACE'
__kmp_debug_printf x; \
^
Instead of the unused reference to this function, this patch now calls the function
with an empty string. The call to this function should have no effect.
Patch provided by joachim.protze
Reviewers: jlpeyton, hbae, AndreyChurbanov
Reviewed By: AndreyChurbanov
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D56775
llvm-svn: 351323
Make sure that OMPT is enabled in runtime entry points that access internals
of the runtime. Else, return an appropiate value indicating an error or that
the data is not available.
Patch provided by @sconvent
Reviewers: jlpeyton, omalyshe, hbae, Hahnfeld, joachim.protze
Reviewed By: joachim.protze
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D47717
llvm-svn: 351311
Using proc_bind clause on a nested #pragma omp parallel region
with KMP_AFFINITY set causes an assertion error. This assertion occurs because
the place-partition-var is not properly initialized in the nested master threads.
Trying to get an intuitive result with KMP_AFFINITY + proc_bind is difficult
because of how the KMP_AFFINITY gtid-to-place mapping occurs. This
patch creates an initial place list no matter what affinity mechanism is used.
For KMP_AFFINITY, the place-partition-var is initialized to all the places.
Differential Revision: https://reviews.llvm.org/D55795
llvm-svn: 351227
This change fixes the sanity issue reported in Bug 40042.
Lock function definitions for the three lock kinds were added
to disambiguate calls to the lock functions done directly and indirectly.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40042
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D56103
llvm-svn: 351224
Make __ompt_implicit_task_end a static function and remove the inline part. Remove
pId variable that is unused. This fixes small regression in SPEC kdtree benchmark.
Also reformat some of __ompt_implicit_task_end.
Differential Revision: https://reviews.llvm.org/D55788
llvm-svn: 351221
The omp-tools.h file is generated from the OpenMP spec to ensure that the interface
is implemented as specified.
The other changes are necessary to update the interface implementation to the
final version as published in 5.0.
The omp-tools.h header was previously called ompt.h, currently a copy under this name
is installed for legacy tools.
Patch partially perpared by @sconvent
Reviewers: AndreyChurbanov, hbae, Hahnfeld
Reviewed By: hbae
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D55579
llvm-svn: 351197
Summary:
Two things:
1. Those two variables had the wrong sigdness, which was resulting in "sign mismatch in comparison" warning.
2. The whole `kmp_debugger.cpp` wasn't being built, or rather, it was being built as-if `USE_DEBUGGER` was off,
thus, nothing provided the definition of `__kmp_omp_debug_struct_info`, `__kmp_debugging`.
Makes sense, because `USE_DEBUGGER` is set in `kmp_config.h`, which is not included explicitly.
It is included by `kmp.h`, but that one is only included inside of the `#if USE_DEBUGGER` block..
I *think* this is the only source file with this issue,
everything else seem to `#include` either `kmp.h` or `kmp_config.h`.
The alternative solution would be to add `add_compile_options(-include kmp_config.h)` in CMake.
I did verify that `__kmp_omp_debug_struct_info` becomes available with this patch.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=38612 | PR38612 ]].
Reviewers: AndreyChurbanov, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: guansong, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55783
llvm-svn: 351019
Add omp_get_device_num() function for 5.0 which returns the number of the
device the current thread is running on. Currently, we are leaving it to the
compiler to handle this properly if it is called inside target.
Also, did some cleanup and updating of duplicate device API functions (in both
libomp and libomptarget) to make them into weak functions that check for the
symbol from libomptarget, and will call the version in libomptarget if it is
present. If any additional device API functions are implemented also in
libomptarget in the future, we should add the dlsym calls to the host functions.
Also, if the omp_target_* functions are to be implemented for the host (this has
been requested), they should attempt to call the libomptarget versions as well.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55578
llvm-svn: 350352
This patch updates the implementation of the ompt_frame_t, ompt_wait_id_t
and ompt_state_t. The final version of the OpenMP 5.0 spec added the "t"
for these types.
Furthermore the structure for ompt_frame_t changed and allows to specify
that the reenter frame belongs to the runtime.
Patch partially prepared by Simon Convent
Reviewers: hbae
llvm-svn: 349458
Summary:
I have discovered this because i wanted to experiment with
building static libomp (with openmp-4.0 support only)
for debugging purposes.
There are three kinds of problems here:
1. `__kmp_compare_and_store_acq()` simply does not exist.
It was added in D47903 by @jlpeyton.
I'm guessing `__kmp_atomic_compare_store_acq()` was meant.
2. In `__kmp_is_ticket_lock_initialized()`,
`lck->lk.initialized` is `std::atomic<bool>`,
while `lck` is `kmp_ticket_lock_t *`.
Naturally, they can't be equality-compared.
Either, it should return the value read from `lck->lk.initialized`,
or do what `__kmp_is_queuing_lock_initialized()` does,
compare the passed pointer with the field in the struct
pointed by the pointer. I think the latter is correct-er choice here.
3. Tests were not versioned.
They assume that `LIBOMP_OMP_VERSION` is at the latest version.
This does not touch LIBOMP_OMP_VERSION=30. That is still broken.
Reviewers: jlpeyton, Hahnfeld, AndreyChurbanov
Reviewed By: AndreyChurbanov
Subscribers: guansong, jfb, openmp-commits, jlpeyton
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55496
llvm-svn: 349260
The value returned by __kmp_now_nsec() can overflow 32-bit values causing
incorrect values to be returned. The overflow can end up causing a divide
by zero error because in __kmp_initialize_system_tick(), the value
(__kmp_now_nsec() - nsec) can end up being much larger than the numerator:
1e6 * (delay + (now - goal))
during a pathological timing where the current time calculated is much larger
than nsec. When this happens, the value of __kmp_ticks_per_msec is set to zero
which is then used as the denominator in the KMP_NOW_MSEC() macro leading to
the divide by zero error.
Differential Revision: https://reviews.llvm.org/D55300
llvm-svn: 349090
This patch adds the affinity format functionality introduced in OpenMP 5.0.
This patch adds: Two new environment variables:
OMP_DISPLAY_AFFINITY=TRUE|FALSE
OMP_AFFINITY_FORMAT=<string>
and Four new API:
1) omp_set_affinity_format()
2) omp_get_affinity_format()
3) omp_display_affinity()
4) omp_capture_affinity()
The affinity format functionality has two ICV's associated with it:
affinity-display-var (bool) and affinity-format-var (string).
The affinity-display-var enables/disables the functionality through the
envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a formatted
string with the special field types beginning with a '%' character
similar to printf
For example, the affinity-format-var could be:
"OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
The affinity-format-var is displayed by every thread implicitly at the beginning
of a parallel region when any thread's affinity has changed (including a brand
new thread being spawned), or explicitly using the omp_display_affinity() API.
The omp_capture_affinity() function can capture the affinity-format-var in a
char buffer. And omp_set|get_affinity_format() allow the user to set|get the
affinity-format-var explicitly at runtime. omp_capture_affinity() and
omp_get_affinity_format() both return the number of characters needed to hold
the entire string it tried to make (not including NULL character). If not
enough buffer space is available,
both these functions truncate their output.
Differential Revision: https://reviews.llvm.org/D55148
llvm-svn: 349089
Disable KMP_HAVE_QUAD when building via gcc on NetBSD system,
as the build fails due to unimplemented builtins:
.../kmp_atomic.cpp.o: In function `__kmpc_atomic_cmplx16_mul':
.../kmp_atomic.cpp:1332: undefined reference to `__multc3'
.../kmp_atomic.cpp.o: In function `__kmpc_atomic_cmplx16_div':
.../kmp_atomic.cpp:1334: undefined reference to `__divtc3'
...
Differential Revision: https://reviews.llvm.org/D55478
llvm-svn: 348886
Switch NetBSD from reading /proc (which is broken) to getloadavg()
(which is already used by Darwin). NetBSD discourages using procfs
in favor of system API calls.
Differential Revision: https://reviews.llvm.org/D55486
llvm-svn: 348885
Summary:
Use the sysctl(3) function to check whether an address is mapped
into the address space.
Reviewers: mgorny, joerg, #openmp
Reviewed By: mgorny
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55549
llvm-svn: 348874
Summary: _lwp_self() returns current Thread Id in a numeric version on NetBSD.
Reviewers: joerg, mgorny, #openmp
Reviewed By: mgorny
Subscribers: llvm-commits, openmp-commits, #openmp
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55497
llvm-svn: 348873
Fix two build issues:
1) Recent commit 348756 accidentally included Unix clang compilers
to use immintrin.h when only clang-cl should be using it leading
to the following error:
openmp-llvm/runtime/src/kmp_lock.cpp:2035:25: error: always_
inline function '_xbegin' requires target feature 'rtm', but would be inlined into function
'__kmp_test_adaptive_lock_only' that is compiled without support for 'rtm'
kmp_uint32 status = _xbegin();
This patch changes the guard to use immintrin.h to only use clang-cl instead of all clang
2) gcc-8 gives a warning about multiline comment in kmp_runtime.cpp:
This patch just changes it to a two line comment
openmp-llvm/runtime/src/kmp_runtime.cpp:7697:8: warning: multi-line comment [-Wcomment]
#endif // KMP_OS_LINUX || KMP_OS_DRAGONFLY || KMP_OS_FREEBSD || KMP_OS_NETBSD \
llvm-svn: 348783
Summary: This patch permits OpenMP to build and work (with both gcc and clang) on OpenBSD. It mostly follows what was done for FreeBSD and NetBSD, except OpenBSD does not have pthread_getattr_np support, so it follows OS X in that one instance.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: guansong, jfb, emaste, mgorny, krytarowski, #openmp
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D34280
llvm-svn: 348726
Summary:
Additions mostly follow FreeBSD and NetBSD and are not intrusive.
There is similar patch for OpenBSD: https://reviews.llvm.org/D34280
The -lm was being omitted due to -Wl,--as-needed in cmake rule, similar patch is in freebsd-ports/devel/llvm-devel port.
Simple OpenMP programs compile and work as expected:
$ clang-devel ~/omp_hello.c -fopenmp -I/usr/local/llvm-devel/include
$ LD_LIBRARY_PATH=/usr/local/llvm-devel/lib OMP_NUM_THREADS=100 ./a.out
The assertion in LLVMgold.so when -fopenmp was used together with -flto in 20170524 snapshot is no longer triggered on current svn-trunk and works fine as in llvm-4.0 with our local patches.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: dexonsmith, jfb, krytarowski, guansong, gregrodgers, emaste, mgorny, mehdi_amini
Differential Revision: https://reviews.llvm.org/D35129
llvm-svn: 348725
There is a conflict between libomptarget and libomp concerning some of the
standard OpenMP device API which needs further intestigation.
llvm-svn: 347932
This patch adds __kmpc_omp_reg_task_with_affinity to register affinity
information for tasks. For now, the affinity information is not used,
and the function always succeeds. This also adds the kmp_task_affinity_info_t
structure to store the task affinity information.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55026
llvm-svn: 347907
This change renames ompt_mutex_impl_unknown to ompt_mutex_impl_none,
following the name change in the specification.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D54347
llvm-svn: 347802
* Fix calculation of string length.
* Remove NULL-check of pointer which has been dereferenced.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D54948
llvm-svn: 347801
There is low probability that array th_hot_teams can be
accessed out of bound (when many nested levels are requested
to keep hot teams via KMP_HOT_TEAMS_MAX_LEVEL). The patch
adds the check of index that fixes the problem.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D54950
llvm-svn: 347800
Add omp_get_device_num() function for 5.0 which returns the number of the device
the current thread is running on. Also, did some cleanup and updating of device
API functions to make them into weak functions that should be replaced with
libomptarget functions when libomptarget is present.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D54342
llvm-svn: 347799
Initializing an ompt_data_t object using the pointer union member is potentially
unsafe in 32-bit programs. This change fixes the issue
by using the constant, ompt_data_none.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52046
llvm-svn: 343785
On Windows, child workers are terminated by the parent during the normal
program exit process (ExitProcess()) and they are not able to finish generating
their OpenMP events. We can force manual library shut down in __kmpc_end() to
fix this at least for the cases where __kmpc_end() is properly inserted.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52628
llvm-svn: 343619
Patch suggested by Kelvin Li: removed optional "kind=" part of kind-selector
for variables with long names and kind names.
Differential Revision: https://reviews.llvm.org/D52712
llvm-svn: 343475
This patch puts the __kmpc_critical_with_hint function in dllexports
and also replaces some OMP_45_ENABLED to OMP_50_ENABLED
Differential Revision: https://reviews.llvm.org/D52380
llvm-svn: 343143
Balanced affinity only updated the thread's affinity with the operating system.
This change also has the thread's private mask reflect that change as well so
that any API that probes the thread's affinity mask will report the correct
mask value.
Differential Revision: https://reviews.llvm.org/D52379
llvm-svn: 343142
This patch updates the ittnotify sources to the latest
corresponding with Intel(R) VTune(TM) Amplifier 2018
Differential Revision: https://reviews.llvm.org/D52378
llvm-svn: 343139
This change improves the performance of 376.kdtree by giving the compiler an
opportunity to do inlining and other optimizations for the call path,
__kmpc_omp_task_complete_if0()->__kmp_task_finish(), which is one of the hot
paths in the program; some functions in kmp_taskdeps.cpp were moved to the new
header file, kmp_taskdeps.h to achieve this.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D51889
llvm-svn: 343138
This change includes miscellaneous improvements as follows:
1) Added ompt_get_proc_id() implementation for Windows
2) Added parser and print tool for omp-tool-var, just in case it needs
to be printed (OMP_DISPLAY_ENV)
3) omp_control_tool is exported on Windows
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D50538
llvm-svn: 343137
Some types and callback signatures have changed from TR6 to TR7.
Major changes (only adding signatures and stubs):
(-remove idle callback) done by D48362
-add reduction and dispatch callback
-add get_task_memory and finalize_tool runtime entry points
-ompt_invoker_t becomes ompt_parallel_flag_t
-more types of sync_regions
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D50774
llvm-svn: 341834
Add atomic hint flags to the enum.
The hint parameter type was changed to uint32_t in __kmpc_critical_with_hint()
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D51235
llvm-svn: 341694
ident flags reserved for atomic hints.
This patch adds omp_sync_hint_t to omp.h and omp_sync_hint_kind to omp_lib.h.
For better maintainability the list of macros for ident flags was replaced with
a enum. The new KMP_IDENT_ATOMIC_HINT_MASK was added to the enum to
support possible future atomic hints.
Also fix omp_lib.h.var to be under 72 chars again after 5.0 OpenMP Memory commit
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D51233
llvm-svn: 341693
Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries,
and OMP_ALLOCATOR environment variable.
Added support for HBW memory on Linux if libmemkind.so library is accessible
(dynamic library only, no support for static libraries).
Only used stable API (hbwmalloc) of the memkind library
though we may consider using experimental API in future.
The ICV def-allocator-var is implemented per implicit task similar to
place-partition-var. In the absence of a requested allocator, the uses the
default allocator.
Predefined allocators (the only ones currently available) are made similar
for C and Fortran, - pointers (long integers) with values 1 to 8.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D51232
llvm-svn: 341687
The __kmp_execute_tasks_template() function reads the task_team and
current_task from the thread structure. There appears to be a pathological
timing where the number of threads in the hot team decreases and so a
thread is put in the pool via __kmp_free_thread(). It could be the case that:
1) A thread reads th_task_team into task_team local variables
and is then interrupted by the OS
2) Master frees the thread and sets current task and task team to NULL
3) The thread reads current_task as NULL
When this happens, current_task is dereferenced and a segfault occurs.
This patch just checks for current_task to not be NULL as well.
Differential Revision: https://reviews.llvm.org/D50651
llvm-svn: 340632
If hot teams are not being used, this code could seg fault without the added
check, and does so when composability is used in conjunction with nesting.
The fix prevents the segfault.
Differential Revision: https://reviews.llvm.org/D50649
llvm-svn: 340629
Exclude nested explicit tasks from timing, only outer level explicit task
counted and its time added to barrier arrive time for the thread.
Differential Revision: https://reviews.llvm.org/D50584
llvm-svn: 340628
The idle callback was removed from the spec as of TR7.
This removes it from the implementation.
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D48362
llvm-svn: 339771
This change fixes an incorrect behavior of the omp_control_tool function when
called from Fortran applications. A tool callback function for this event is
supposed to get NULL for the third argument according to the specification, but
the current implementation just passes a garbage value. A possible fix is to use
the OPTIONAL attribute for the third argument.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D50565
llvm-svn: 339585
This patch cleans up unused functions, variables, sign compare issues, and
addresses some -Warning flags which are now enabled including -Wcast-qual.
Not all the warning flags in LibompHandleFlags.cmake are enabled, but some
are with this patch.
Some __kmp_gtid_from_* macros in kmp.h are switched to static inline functions
which allows us to remove the awkward definition of KMP_DEBUG_ASSERT() and
KMP_ASSERT() macros which used the comma operator. This had to be done for the
innumerable -Wunused-value warnings related to KMP_DEBUG_ASSERT()
Differential Revision: https://reviews.llvm.org/D49105
llvm-svn: 339393
From the bug report, the runtime needs to initialize the nproc variables
(inside middle init) for each root when the task is encountered, otherwise,
a segfault can occur.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36720
Differential Revision: https://reviews.llvm.org/D49996
llvm-svn: 338313
Summary:
When OMPT is not supported the __kmp_omp_task() function is passed the parameters in the wrong order. This is a fix related to patch D47709.
Reviewers: Hahnfeld, sconvent, caomhin, jlpeyton
Reviewed By: Hahnfeld
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50001
llvm-svn: 338295
This change introduces GOMP doacross compatibility. There are 12 new interface
functions 6 for long type and 6 for unsigned long long type:
GOMP_doacross_post, GOMP_doacross_wait, GOMP_loop_doacross_[schedule]_start
where schedule can be static, dynamic, guided, or runtime.
These functions just translate the parameters if necessary and send them
to the corresponding kmp function.
E.g., GOMP_doacross_post() -> __kmpc_doacross_post()
For the GOMP_doacross_post function, there is template specialization to
account for when long is a four byte vs an eight byte type. If it is a
four byte type, then a temporary array has to be created to convert the
four byte integers into eight byte integers and then sending that into
__kmpc_doacross_post(). Because GOMP_doacross_wait uses varargs, it
always needs a temporary array and does not need template specialization.
Differential Revision: https://reviews.llvm.org/D49857
llvm-svn: 338280
This change fixes build errors when building a runtime with adaptive lock stats
enabled. Most of the errors were due to the recent changes in the runtime, but
it seems that we have not tried to build this debug runtime on Windows for a
long time.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49823
llvm-svn: 338277
1) Remove unnecessary data from list node structure
2) Remove timerPair in favor of pushing/popping explicitTimers.
This way, nested timers will work properly.
3) Fix #pragma omp critical timers
4) Add histogram capability
5) Add KMP_STATS_FILE formatting capability
6) Have time partitioned into serial & parallel by introducing
partitionedTimers::exchange(). This also counts the number of serial regions
in the executable.
7) Fix up the timers around OMP loops so that scheduling overhead and work are
both counted correctly.
8) Fix up the iterations statistics so they count the number of iterations the
thread receives at each loop scheduling event
9) Change timers so there is only one RDTSC read per event change
10) Fix up the outdated comments for the timers
Differential Revision: https://reviews.llvm.org/D49699
llvm-svn: 338276
Fix the order of callbacks related to the taskloop construct.
Add the iteration_count to work callbacks (according to the spec).
Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks.
Add a testcase.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D47709
llvm-svn: 338146
The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now,
frame addresses are tested and two scheduling points are added at which the
task can switch to another thread. Due to scheduling effects, the frame address
could be NULL.
This needed a restructure of the way OMPT callbacks are called.
__ompt_task_finish() now as an extra parameter, whether a task is completed.
Its invocation has been moved into __kmp_task_finish(). Thus, the order of the
writes to the frame addresses is not subject to scheduling effects anymore.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49181
llvm-svn: 338145
This function was not enabled by default and not exported when manually
tweaking the build flags. Additionally it was hard to use since there
is no corresponding __kmp_ft_page_free().
The code itself is questionable because the returned memory address
is padded by an extra pointer which stores the unpadded start of the
allocated region (this would need to be freed).
Differential Revision: https://reviews.llvm.org/D49802
llvm-svn: 338052
This change fixes possibly invalid access to the internal data structure during
library shutdown. In a heavily oversubscribed situation, the library shutdown
sequence can reach the point where resources are deallocated while there still
exist threads in their final spinning loop. The added loop in
__kmp_internal_end() checks if there are such busy-waiting threads and blocks
the shutdown sequence if that is the case. Two versions of kmp_wait_template()
are now used to minimize performance impact.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49452
llvm-svn: 337486
This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"
Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.
[ Global iteration space (0-999) ]
(use static)
[ L1 | L1 | L1 | L1 ]
(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA
OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before
I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.
Differential Revision: https://reviews.llvm.org/D47962
llvm-svn: 336571
This patch reorganizes the loop scheduling code in order to allow hierarchical
scheduling to use it more effectively. In particular, the goal of this patch
is to separate the algorithmic parts of the scheduling from the thread
logistics code.
Moves declarations & structures to kmp_dispatch.h for easier access in
other files. Extracts the algorithmic part of __kmp_dispatch_init() and
__kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
__kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
__kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
hierarchical scheduler needs to access the scheduling logic without the
bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a
new flags variable is created which stores the ordered and nomerge flags instead
of them being in two separate variables. This will keep the
dispatch_private_info_t structure the same size.
Differential Revision: https://reviews.llvm.org/D47961
llvm-svn: 336568
These are preliminary changes that attempt to use C++11 Atomics in the runtime.
We are expecting better portability with this change across architectures/OSes.
Here is the summary of the changes.
Most variables that need synchronization operation were converted to generic
atomic variables (std::atomic<T>). Variables that are updated with combined CAS
are packed into a single atomic variable, and partial read/write is done
through unpacking/packing
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D47903
llvm-svn: 336563
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.
Differential Revision: https://reviews.llvm.org/D46533
llvm-svn: 336085
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.
Differential Revision: https://reviews.llvm.org/D47130
llvm-svn: 333284
implicit_task_end callbacks in nested parallel regions did not always give the
correct thread_num, since the inner parallel region may have already been
finalized.
Now, the thread_num is stored at the beginning of the implicit task and
retrieved at the end, whenever necessary.
A testcase was added as well.
Differential Revision: https://reviews.llvm.org/D46260
llvm-svn: 331632
Currently, the affinity API reports garbage for the initial place list and any
thread's place lists when using KMP_AFFINITY=none|compact|scatter.
This patch does two things:
for KMP_AFFINITY=none, Creates a one entry table for the places, this way, the
initial place list is just a single place with all the proc ids in it. We also
set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the
thread reports that single place (place 0) instead of garbage (-1) when using
the affinity API.
When non-OMP_PROC_BIND affinity is used
(including KMP_AFFINITY=compact|scatter), a thread's place list is populated
correctly. We assume that each thread is assigned to a single place. This is
implemented in two of the affinity API functions
Differential Revision: https://reviews.llvm.org/D45527
llvm-svn: 330283
This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our
version symbols. Being a wrapper around __kmpc_taskloop, the function
creates a task with the loop bounds properly nested in the shareds so that
the GOMP task thunk will work properly. Also, the firstprivate copy constructors
are properly handled using the __kmp_gomp_task_dup() auxiliary function.
Currently, only linear spawning of tasks is supported
for the GOMP_taskloop interface.
Differential Revision: https://reviews.llvm.org/D45327
llvm-svn: 330282
This change removes the unnecessary lock operation on __kmp_initz_lock inside
the __kmp_atfork_child() function for Linux; the lock variable is initialized
in the same function later.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D44949
llvm-svn: 328900
Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added
target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD,
if available, otherwise defaulting to DEFAULT. Valid values for the ICV are
specified as enum values {0,1,2} for disabled, default, and mandatory. An
internal API access function __kmpc_get_target_offload is provided.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D44577
llvm-svn: 328046
The thread_num parameter of ompt_get_task_info() was not being used previously,
but need to be set.
The print_task_type() function (form the task-types.c testcase) was merged into
the print_ids() function (in callback.h). Testing of ompt_get_task_info() was
added to the task-types.c testcase. It was not tested extensively previously.
Differential Revision: https://reviews.llvm.org/D42472
llvm-svn: 326338
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.
Differential Revision: https://reviews.llvm.org/D43308
llvm-svn: 325922
Only use ompt_ functions when testing OMPT in api_calls testcase.
Add size parameter to print_list.
Fix small bug in implementation of ompt_get_partition_place_nums(): return correct length.
Differential Revision: https://reviews.llvm.org/D42162
llvm-svn: 325422
If tool initialization returns 0, OMPT should not be active. The current
implementation provided some callback invocations in this case.
Differential Revision: https://reviews.llvm.org/D42709
llvm-svn: 324320
The defintion is not part of the spec and thus should not have the prefix
"ompt_" but rather a prefix that indicates that this is implementation
specific.
Differential Revision: https://reviews.llvm.org/D41166
llvm-svn: 322621
When the current thread is not an (initialized) OpenMP thread, the runtime
entry points return values that correspond to "not available" or similar
Differential Revision: https://reviews.llvm.org/D41167
llvm-svn: 322620
If user requested affinity with granularity=tile we need to either use HWLOC
or ignore the request. The change allows user to not specify
KMP_TOPOLOGY_METHOD=hwloc and choose it automatically instead.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D40905
llvm-svn: 322205
This change simplifies __kmp_expand_threads to take a single argument.
Previously, it allowed two arguments and had logic to decide on different
potential expansion sizes. However, no calls to __kmp_expand_threads in the
runtime make use of this extra logic. Thus the extra argument and logic is
removed here.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D41836
llvm-svn: 322204
This change improves stability of the runtime when the application forks child
processes. Acquiring/releasing __kmp_initz_lock and __kmp_forkjoin_lock in the
atfork handlers insures that the actual fork does not occur while those two
locks are held, and __kmp_itt_reset() reverts the itt's global state to the
initial state which also initializes the mutex stored in the global state.
Some missing initialization code was also inserted in the child's atfork handler.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D41462
llvm-svn: 322202
This field is defined as kmp_int32, so we should use neither
pointers to kmp_int64 nor 64 bit atomic instructions.
(Found while testing on a Raspberry Pi, 32 bit ARM)
Differential Revision: https://reviews.llvm.org/D41656
llvm-svn: 321964
As for normal task creation, the task frame addresses need to be stored
for the encountering task.
Differential Revision: https://reviews.llvm.org/D41165
llvm-svn: 321421
The format string for hints only prints the second argument (string) and drops
the first argument (hint id). Depending on how you read the POSIX text for
printf, this could be valid. But for practical reason, i.e., unpacking the
va_list passed to printf based on the formating information, it makes sense
to fix the implementation and not pass the id for hint.
Failing testcases were:
misc_bugs/teams-reduction.c
ompt/parallel/not_enough_threads.c
Differential Revision: https://reviews.llvm.org/D41504
llvm-svn: 321361
This function is defined in OpenMP-TR6 section 4.1.5.1.6
The functions was not implemented yet.
Since ompt-functions can only be called after the runtime was initialized and
has loaded a tool, it can assume the runtime to be initialized. In contrast
to omp_get_num_procs which needs to check whether the runtime is initialized.
Differential Revision: https://reviews.llvm.org/D40949
llvm-svn: 321269
This revision fixes failing testcases with parallel for loops and the gomp
interface. The return address needs to be stored at entry to runtime.
The storage is cleared on usage, so we need to update the storage before
calling again internal functions, that will trigger event callbacks.
Differential Revision: https://reviews.llvm.org/D41181
llvm-svn: 321265
We use the bitmap ompt_enabled thoughout the runtime, to avoid loading the
vector of callback functions when testing if specific code should be executed.
Before invoking an event callback function, the pointer is tested for NULL.
This revision resets the corresponding bit in ompt_enabled to 0 if
NULL is passed in set_callback.
Differential Revision: https://reviews.llvm.org/D41171
llvm-svn: 321264
There are two /proc/cpuinfo layots in use for AArch64: old and new.
The old one has all 'processor : n' lines in one section, hence
checking for duplications does not make sense.
Differential Revision: https://reviews.llvm.org/D41000
llvm-svn: 320593
All architectures except x86_64 used the linear barrier implementation
by default which doesn't give good performance for a larger number
of threads.
Improvements for PARALLEL overhead (EPCC) with this patch on a Power8
system (2 sockets x 10 cores x 8 threads, OMP_PLACES=cores)
20 threads: 4.55us -> 3.49us
40 threads: 8.84us -> 4.06us
80 threads: 19.18us -> 4.74us
160 threads: 54.22us -> 6.73us
Differential Revision: https://reviews.llvm.org/D40358
llvm-svn: 320152
To make thread affinity work according to the OpenMP spec, the
runtime needs information about the hardware topology. On Linux
the default way is to parse /proc/cpuinfo which contains this
information for x86 machines but (at least) not for AArch64 and
Power architectures.
Fortunately, there is a different code path which is able to get
that data from sysfs. The needed patch has landed in 2006 for
Linux 2.6.16 which is safe to assume nowadays (even RHEL 5 had
a kernel version derived from 2.6.18, and we are now at RHEL 7!).
Differential Revision: https://reviews.llvm.org/D40357
llvm-svn: 320151
Otherwise I see hangs in the omp_single_copyprivate test when
compiling in release mode. With the debug assertions, I get a
failure `head > 0 && tail > 0`.
Differential Revision: https://reviews.llvm.org/D40722
llvm-svn: 320150
Redundant extra verbose output of binding to full mask in case
affinity=balanced or OMP_PLACES=<any> or OMP_PROC_BIND=<any>
Differential Revision: https://reviews.llvm.org/D40624
llvm-svn: 319960
This change is a trivial fix for enums that removes specification of "last" or
"upper" values, or other boundary values. This simplifies the code in places,
and results in never needing to update the "upper" values again.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D40804
llvm-svn: 319957
__kmpc_reduce_nowait() correctly swapped the teams for reductions
in a teams construct. Apply the same logic to __kmpc_reduce() and
__kmpc_reduce_end().
Differential Revision: https://reviews.llvm.org/D40753
llvm-svn: 319788
This change makes kmp_r_sched_t type into a union for simpler
comparisons and assignments
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D40374
llvm-svn: 319379
kmp_aligned_malloc() always returned NULL on Windows (stub library only)
that may cause Fortran application crash. With this change all memory
allocation functions were fixed to use aligned{m,re,rec}alloc() to
allocate/reallocate memory. To deallocate that memory _aligned_free() is
used in kmp_free().
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D40296
llvm-svn: 319375
Added two warnings:
1) Before building the topology map check if tiles are requested but the
topo method is not hwloc;
2) After building the topology map check if tiles are requested but not
detected by the library.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D40340
llvm-svn: 319374
Fortran array elements made default integer in OMP_GET_PLACE_PROC_IDS and
OMP_GET_PARTITION_PLACE_NUMS subroutines, otherwise call to them produces
incorrect result.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D40356
llvm-svn: 319372
These are needed by both libraries, so we can do that in a
common namespace and unify configuration parameters.
Also make sure that the user isn't requesting libomptarget
if the library cannot be built on the system. Issue an error
in that case.
Differential Revision: https://reviews.llvm.org/D40081
llvm-svn: 319342
As a first step, this allows us to generalize the detection of
standalone builds and make it fully compatible when building in
llvm/runtimes/ which automatically sets OPENMP_STANDLONE_BUILD.
Differential Revision: https://reviews.llvm.org/D40080
llvm-svn: 319341
Power has a weak consistency model so we need memory barriers to
make writes (both from runtime and from user code) available for
all threads.
Differential Revision: https://reviews.llvm.org/D40175
llvm-svn: 318848
Traditionally, the library had a weak symbol for ompt_start_tool()
that served as fallback and disabled OMPT if called. Tools could
provide their own version and replace the default implementation
to register callbacks and lookup functions. This mechanism has
worked reasonably well on Linux systems where this interface was
initially developed.
On Darwin / Mac OS X the situation is a bit more complicated and
the weak symbol doesn't work out-of-the-box. In my tests, the
library with the tool needed to link against the OpenMP runtime
to make the process work. This would effectively mean that a tool
needed to choose a runtime library whereas one design goal of the
interface was to allow tools that are agnostic of the runtime.
The solution is to use dlsym() with the argument RTLD_DEFAULT so
that static implementations of ompt_start_tool() are found in the
main executable. This works because the linker on Mac OS X includes
all symbols of an executable in the global symbol table by default.
To use the same code path on Linux, the application would need to
be built with -Wl,--export-dynamic. To avoid this restriction, we
continue to use weak symbols on Linux systems as before.
Finally this patch extends the existing test to cover all possible
ways of initializing the tool as described by the standard. It
also fixes ompt_finalize() to not call omp_get_thread_num() when
the library is shut down which resulted in hangs on Darwin.
The changes have been tested on Linux to make sure that it passes
the current tests as well as the newly extended one.
Differential Revision: https://reviews.llvm.org/D39801
llvm-svn: 317980
For up-to-date compilers, this assertion is reasonable, but it breaks
compatibility with the typical compiler installed on most systems.
This patch changes the default value to what we had when there was no
compiler support. A warning about the outdated compiler is printed during
runtime, when this point is reached.
Differential Revision: https://reviews.llvm.org/D39890
llvm-svn: 317928
In these places the const attribute seems correct and doesn't
need any other change, so let's do it.
Differential Revision: https://reviews.llvm.org/D39756
llvm-svn: 317798
Allocated memory is typically not 'const' if it needs to be freed.
This patch removes around 50 wrong const attributes, modifies the
corresponding functions and finally gets rid of some const_casts.
These have especially been strange for __kmp_str_fname_free() that
added a 'const' to call __kmp_str_free() which removed it again.
Two minor cleanups that I performed in this process:
* __kmp_tool_libraries now lives in kmp_settings.cpp as it is
used nowhere else.
* __kmp_msg_empty was removed as it was never used and Clang
now complained that it was assigned a string literal that
is 'const char *'.
Differential Revision: https://reviews.llvm.org/D39755
llvm-svn: 317797
1) Get rid of xaliasify, xexpand and xversionify for KMP_EXPAND_NAME and
KMP_VERSION_SYMBOL. KMP_VERSION_SYMBOL is a combination of xaliasify and
xversionify.
2) Put all attribute and __declspec definitions in kmp_os.h
Differential Revision: https://reviews.llvm.org/D39516
llvm-svn: 317636
The TR6 document is expected to be publically released around November 15.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D39182
llvm-svn: 317436
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317435
The TR6 document is expected to be publically released around November 15.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D39182
llvm-svn: 317339
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317338
This is a partial fix for bug 34050.
This prevents callers of omp_set_lock (which does not hold __kmp_global_lock)
from ever seeing an uninitialized version of __kmp_i_lock_table.table.
It does not solve a use-after-free race condition if omp_set_lock obtains a
pointer to __kmp_i_lock_table.table before it is updated and then attempts to
dereference afterwards. That race is far less likely and can be handled in a
separate patch.
The unit test usually segfaults on the current trunk revision. It passes with
the patch.
Patch by Adam Azarchs
Differential Revision: https://reviews.llvm.org/D39439
llvm-svn: 317115
The code is tested to work with latest clang, GNU and Intel compiler. The implementation
is optimized for low overhead when no tool is attached shifting the cost to execution with
tool attached.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D38185
llvm-svn: 317085
Replacing call to __kmp_msg(kmp_ms_fatal,...) with __kmp_fatal(...) caused an
issue when incomplete message is displayed in case an error message is followed
by another message (e.g. by a hint messa)ge. This is because __kmp_fatal()
passes incomplete list of arguments to __kmp_msg().
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D39248
llvm-svn: 316623
The problem is due to the runtime's threadprivate cleanup code which tries to
access data that was already destroyed by one of the root threads.
__kmp_init_gtid is used as a checker here since it is set to false before actual
resource cleanup is done in __kmp_cleanup().
Patch by Hansang Bae
llvm-svn: 316452
If both KMP_HW_SUBSET and KMP_PLACE_THREADS are set and KMP_PLACE_THREADS gets
parsed first, then the current environment variable parser rejects both and
neither get used. This patch uses the rivals mechanism that is used for other
environment variable groups (e.g., KMP_STACKSIZE, GOMP_STACKSIZE, OMP_STACKSIZE).
If both are set, then it tells the user that it is ignoring KMP_PLACE_THREADS in
favor of KMP_HW_SUBSET. The message about deprecating KMP_PLACE_THREADS when it
is set is still printed regardless.
Differential Revision: https://reviews.llvm.org/D38292
llvm-svn: 315091
Removes semicolons after if {} blocks, function definitions, etc.
I was able to apply the large OMPT patch cleanly on top of this one
with no conflicts.
llvm-svn: 314340
Minor code cleanup of Klocwork issues. Fatal messages are given no return
attribute. Define and use KMP_NORETURN to work for multiple C++ versions.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D37275
llvm-svn: 312538
Cleanup code to remove BUILD_TV and unused code bracketed by it.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D36011
llvm-svn: 311114
This change improves the way threads are spread across cores
when OMP_PROC_BIND=spread is set and no unusual affinity masks are in use.
Differential Revision: https://reviews.llvm.org/D36510
llvm-svn: 310670
The original locations can be reached without initializing the lock variable
(td_deque_lock), so it is potentially unsafe. It is guaranteed that the lock
is initialized if the deque (td_deque) is not NULL, and lock functions can be
safely called.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D36017
llvm-svn: 309875
This change adds a new environment variable, KMP_TEAMS_THREAD_LIMIT, which is
used to set a new global variable, __kmp_teams_max_nth, which is checked when
determining the size and quantity of teams that will be created in the teams
construct. Specifically, it is a limit on the total number of threads in a given
teams construct. It differentiates the limits for the teams construct from the
limits for regular parallel regions (KMP_DEVICE_THREAD_LIMIT/__kmp_max_nth and
OMP_THREAD_LIMIT/__kmp_cg_max_nth). When each individual team is formed, it is
still subject to those limits. After the clauses to the teams construct are
parsed and calculated, we check to make sure we are within this limit, and if
not, reduce num_threads per team and/or number of teams, accordingly. The
default value is set to the number of available processors on the system.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D36009
llvm-svn: 309874
This change fixes the implementation of OMP_THREAD_LIMIT. The implementation of
this previously was not restricted to a contention group (but it should be,
according to the spec), and this is fixed here. A field is added to root thread
to store a counter of the threads in the contention group. An extra check is
added when reserving threads for a parallel region that checks this variable and
compares to threadlimit-var, which is implemented as a new global variable,
kmp_cg_max_nth. Associated settings changes were also made, and clean up of
comments that referred to OMP_THREAD_LIMIT, but should refer to the new
KMP_DEVICE_THREAD_LIMIT (added in an earlier patch).
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35912
llvm-svn: 309319
This change drops in KMP_DEVICE_THREAD_LIMIT to replace KMP_MAX_THREADS. It's
possible there will eventually be a OMP_DEVICE_THREAD_LIMIT, and we need
something to distinguish from OMP_THREAD_LIMIT, which is currently implemented
incorrectly (the fix for that will be added soon in a separate patch).
KMP_ALL_THREADS is deprecated here, but we can keep the "all" option on
KMP_DEVICE_THREAD_LIMIT to support that functionality. KMP_DEVICE_THREAD_LIMIT
now has priority over its deprecated rival KMP_ALL_THREADS. I also cleaned up
some comments that incorrectly referred to non-existent kmp_max_threads variable
instead of kmp_max_nth.
I've left the name of where this setting eventually ends up as
__kmp_max_nth, for now.
This change does not change much in the way of functionality. It does NOT change
OMP_THREAD_LIMIT. It's just cleaning up and setting up for that.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35860
llvm-svn: 309168
Removed unused __kmp_env_* variables. Also clangified other people's code.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35808
llvm-svn: 309000
Summary:
The kmp_os.h header is defining the `PAGE_SIZE` macro unconditionally,
even while it is only used directly after its definition, for the
Windows implementation of the `KMP_GET_PAGE_SIZE()` macro.
On at least FreeBSD, but likely all other BSDs too, this macro conflicts
with the one defined in system headers, so remove it, since nothing else
uses it. Make all Unixes use `getpagesize()` instead, and use
`GetSystemInfo()` for the Windows case.
Reviewers: jlpeyton, jcownie, emaste, AndreyChurbanov
Reviewed By: AndreyChurbanov
Subscribers: AndreyChurbanov, hfinkel, zturner
Differential Revision: https://reviews.llvm.org/D35072
llvm-svn: 308355
Summary:
Taskloop implementation is extended by using recursive task scheduling.
Envirable KMP_TASKLOOP_MIN_TASKS added as a manual threshold for the user
to switch from recursive to linear tasks scheduling.
Details:
* The calculations for the loop parameters are moved from __kmp_taskloop_linear
upper level
* Initial calculation is done in the __kmpc_taskloop, further range splitting
is done in the __kmp_taskloop_recur.
* Added threshold to switch from recursive to linear tasks scheduling;
* One half of split range is scheduled as an internal task which just moves
sub-range parameters to the stealing thread that continues recursive
scheduling (if number of tasks still enough), the other half is processed
recursively;
* Internal task duplication routine fixed to assign parent task, that was not
needed when all tasks were scheduled by same thread, but is needed now.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D35273
llvm-svn: 308338
The internal details of this setting are not meant to be user visible and only create confusion.
Differential Revision: https://reviews.llvm.org/D35269
llvm-svn: 308189
Changes are: got all atomics to accept volatile pointers that allowed
to simplify many type conversions. Windows specific code fixed correspondingly.
Differential Revision: https://reviews.llvm.org/D35417
llvm-svn: 308164
The first bit is actually the "untied" flag. That is why the condition was
wrong and has to be inverted to set the flag correctly.
Found and initial patch by Simon Convent!
llvm-svn: 307899
Summary:
On Unix, a .S file is normally an assembly source which must be
preprocessed with a C preprocessor, while a .s file is "plain" assembly.
The former is handled by the compiler driver (cc), the latter is
directly passed to the assembler binary (as).
Because z_Linux_asm.s is supposed to be preprocessed, rename it to .S,
so it can be automatically picked up correctly by build systems.
Reviewers: AndreyChurbanov, emaste, jlpeyton
Reviewed By: AndreyChurbanov
Subscribers: mgorny, openmp-commits
Differential Revision: https://reviews.llvm.org/D35171
llvm-svn: 307680
While importing libomp into the FreeBSD base system we encountered
Clang warnings that "'register' storage class specifier is deprecated
and incompatible with C++1z [-Wdeprecated-register]".
Differential Revision: https://reviews.llvm.org/D35124
llvm-svn: 307441
Changes are: replaced C-style casts with cons_cast and reinterpret_cast;
type of several counters changed to signed; type of parameters of 32-bit and
64-bit AND and OR intrinsics changes to unsigned; changed files formatted
using clang-format version 3.8.1.
Differential Revision: https://reviews.llvm.org/D34759
llvm-svn: 307020
Reset affinity to none (false for proc-bind-var) so that threads in the child
processes are not bound tightly, unless the user explicitly sets this in
KMP_AFFINITY/OMP_PROC_BIND, in child processes. This can improve
performance for scripting languages which fork for parallelism like Python's
multiprocessing module.
Differential Revision: https://reviews.llvm.org/D34154
llvm-svn: 305513
If OpenMP is initialized before fork()-ing occurs and affinity is set to
something like compact, then the master thread will be pinned to a single HW
thread/core after initialization. If the master (or any other thread) then
forks N processes, all N processes will then be pinned to that same single HW
thread/core. To reset the affinity for the new child process, the atfork
handler for the child process can call kmp_set_thread_affinity_mask_initial()
to reset its affinity to the initial affinity of the application before it
re-initializes libomp. The parent process will not be affected and still
keeps its affinity setting.
Differential Revision: https://reviews.llvm.org/D34118
llvm-svn: 305306
Fix static initializers to use the proper unlocked value for the poll
field of the tas and futex locks.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D33794
llvm-svn: 304828
Some code was restructured to move it under KMP_DEBUG. The rest is
formatting changes to fix some things broken by clang-format
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D33744
llvm-svn: 304438
With these settings, the create_hwloc_map() method was being called causing an
assert(). After some consideration, it was determined that disabling affinity
explicitly should just disable hwloc as well. i.e., KMP_AFFINITY overrides
KMP_TOPOLOGY_METHOD. This lets the user know that the Hwloc mechanism is being
ignored when KMP_AFFINITY=disabled.
Differential Revision: https://reviews.llvm.org/D33208
llvm-svn: 304344
This change checks if the initial affinity mask is equal to exactly one
Windows processor group's affinity mask. If it is, then the code does not
respect the initial affinity mask and uses the entire machine instead.
The reasoning behind this is that, by default, Windows assigns exactly one
processor group as the initial affinity mask even when there are multiple
Windows processor groups available. User's typically want to use the whole
machine, so we ignore this special case and use the entire machine.
If the initial affinity mask is a proper subset of one group, or spans multiple
groups, then the initial affinity mask is respected since we can assume that the
operating system did not assign this initial affinity mask. This change only
affects machines with multiple processor groups
Differential Revision: https://reviews.llvm.org/D33210
llvm-svn: 304343
An assert() was being tripped when KMP_AFFINITY=respect + Multiple Processor
Groups. Let __kmp_affinity_create_proc_group_map() function be able to create
address2os object which contains a single group by deleting restriction that
process affinity mask must span multiple groups.
llvm-svn: 303101
This patch contains the clang-format and cleanup of the entire code base. Some
of clang-formats changes made the code look worse in places. A best effort was
made to resolve the bulk of these problems, but many remain. Most of the
problems were mangling line-breaks and tabbing of comments.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D32659
llvm-svn: 302929
Older Hwloc libraries (< 1.10.0) don't offer the HWLOC_OBJ_NUMANODE nor
HWLOC_OBJ_PACKAGE types. Instead they are named HWLOC_OBJ_NODE and
HWLOC_OBJ_SOCKET instead. This patch just defines the newer names based on
the older names when using an older Hwloc.
Differential Revision: https://reviews.llvm.org/D32496
llvm-svn: 301349
Without this fix cancellation status for parallel, sections and for persists
across construct boundaries.
Differential Revision: https://reviews.llvm.org/D31419
llvm-svn: 299434
This change slightly improves performance of KMP_YIELD_NOW() macro, by using
_rdtsc() intrinsic function if possible.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D31008
llvm-svn: 298314
Affinity initialization code expects __kmp_affinity_type has the value
affinity_default by default, but the cleanup code does not properly set the
value back to affinity_default. This may introduce some issues when multiple
roots are trying to initialize/uninitialize the runtime successively.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D31012
llvm-svn: 298313
This change fixes an assertion failure the in case KMP_AFFINITY is set with
'proclist' specified but without 'explicit'
e.g., KMP_AFFINITY=verbose,proclist=[0-31]
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D30404
llvm-svn: 297480
Summary:
Bionic didn't get a GNU style strerror_r until Android M. Until then
we unconditionally exposed the POSIX one. Expand the check to account
for this.
Reviewers: pirama, AndreyChurbanov, jlpeyton
Reviewed By: jlpeyton
Subscribers: openmp-commits, srhines
Differential Revision: https://reviews.llvm.org/D30056
llvm-svn: 297235
Add build option LIBOMP_OMP_VERSION=50, 5.0 headers, and add the year/month
associated with OpenMP 5.0 in relevant source locations. Also, remove the
deprecated LIBOMP_OMP_VERSION=41 option.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D30450
llvm-svn: 297083
This section of code (__kmp_test_then_* functions) is guarded by
(KMP_ARCH_X86 || KMP_ARCH_X86_64) so it does not make sense to have other
architecture guards inside this section. Non-x86 architectures always
use intrinsics (__sync_*)
llvm-svn: 296525
Add counter to count number of static_steal for loops
Add counter for number of chunks executed per static_steal for loop
Add counter for number of chunks stolen per static_steal for loop
llvm-svn: 295461
Added test kmp_task_reduction_nest.cpp which has an example of
possible compiler codegen.
Differential Revision: https://reviews.llvm.org/D29600
llvm-svn: 295343
This change allows the runtime to turn __kmp_yield() on/off repeatedly on Linux.
This feature was removed when disabling monitor thread, but there are
applications that perform better with this feature on.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D29227
llvm-svn: 295203
Added new ThreadSanitizer annotations to remove false positives with OpenMP reduction.
Cleaned up Tsan annotations header file from unused annotations.
Patch by Simone Atzeni!
Differential Revision: https://reviews.llvm.org/D29202
llvm-svn: 295158
Put the duplicated i_maxmin into traits_t by adding new members max_value and
min_value. Put ___kmp_size_type into traits_t by adding member type_size.
Differential Revision: https://reviews.llvm.org/D28847
llvm-svn: 293316
When the monitor thread is used, most threads in the team directly go to
sleep if the copy of bt_intervals/bt_set is not available in the cache,
and this happens at least once per thread in the wait function, making the
overall performance slightly better.
This change tries to mimic this behavior by using the bt_intervals cache,
which simply keeps the blocktime interval in terms of the platform-dependent
ticks or nanoseconds.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D28906
llvm-svn: 293312
The lock tables were being reallocated if kmp_set_defaults() was called.
In the env_init code it says that the user should be able to switch between
different KMP_CONSISTENCY_CHECK values which is what this change enables.
llvm-svn: 292349
There is no corresponding free() for this expandable array. The logic is
added in __kmp_cleanup() next to the freeing of __kmp_nested_nth.
llvm-svn: 292348
Clang 4.0 trunk warns:
warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
This points to a potential bug if the code really wants to check if the single
bit is not set: If for example (buf.edx >> 9) = 2 (has any bit set except the
least significant one), 'logical not' will return 0 which stays 0 after the
'bitwise and'.
To do this correctly we first need to evaluate the 'bitwise and'. In that case
it returns 2 & 1 = 0 which after the 'logical not' evaluates to 1.
Differential Revision: https://reviews.llvm.org/D28599
llvm-svn: 291764
Paul Osmialowski pointed out a double free bug in shutdown code. This patch
Moves the freeing of the implicit task to above the freeing of all fast memory
to prevent the double-free issue.
Differential Revision: https://reviews.llvm.org/D26860
llvm-svn: 287551
Have developer timers use partitioning scheme which also required that some
redundant developer timers be removed in favor of the already existing normal
timers. Move per thread stats initialization to just after global thread id
assignment which is as early as possible. Also put all global stats
initialization code in __kmp_stats_init() and all global stats destruction code
in __kmp_stats_fini().
Differential Revision: https://reviews.llvm.org/D26361
llvm-svn: 286892
This set of changes enables the affinity interface (Either the preexisting
native operating system or HWLOC) to be dynamically set at runtime
initialization. The point of this change is that we were seeing performance
degradations when using HWLOC. This allows the user to use the old affinity
mechanisms which on large machines (>64 cores) makes a large difference in
initialization time.
These changes mostly move affinity code under a small class hierarchy:
KMPAffinity
class Mask {}
KMPNativeAffinity : public KMPAffinity
class Mask : public KMPAffinity::Mask
KMPHwlocAffinity
class Mask : public KMPAffinity::Mask
Since all interface functions (for both affinity and the mask implementation)
are virtual, the implementation can be chosen at runtime initialization.
Differential Revision: https://reviews.llvm.org/D26356
llvm-svn: 286890
This patch allows ThreadSanitizer (Tsan) to verify OpenMP programs.
It means that no false positive will be reported by Tsan when
verifying an OpenMP programs.
This patch introduces annotations within the OpenMP runtime module to
provide information about thread synchronization to the Tsan runtime.
In order to enable the Tsan support when building the runtime, you must
enable the TSAN_SUPPORT option with the following environment variable:
-DLIBOMP_TSAN_SUPPORT=TRUE
The annotations will be enabled in the main shared library
(same mechanism of OMPT).
Patch by Simone Atzeni and Joachim Protze!
Differential Revision: https://reviews.llvm.org/D13072
llvm-svn: 286115
Check Task Scheduling Constraint (TSC) on stealing of untied task.
This is needed because the untied task can produce tied children
those can break TSC if untied is not a descendant of current task.
This can cause live lock on complex tyasking tests
(e.g. kastors/strassen-task-dep).
Differential Revision: https://reviews.llvm.org/D26182
llvm-svn: 285703
Summary:
If directives are used in a macro, clang complains with:
```
src/projects/openmp/runtime/src/kmp_runtime.c:7486:2: error: embedding a directive within macro arguments has undefined behavior [-Werror,-Wembedded-directive]
#if KMP_USE_MONITOR
```
This patch fixes two occurrences of the issue in `kmp_runtime.cpp`.
Reviewers: tlwilmar, jlpeyton, AndreyChurbanov, Hahnfeld
Subscribers: Hahnfeld, openmp-commits
Differential Revision: https://reviews.llvm.org/D25823
llvm-svn: 284728
Function strerror_r() has different signatures in different
implementations of libc: glibc's version returns a char*, while BSDs
and musl return a int. libomp unconditionally assumes glibc on Linux
and thus fails to compile against musl-libc. This patch addresses this
issue.
Differential Revision: https://reviews.llvm.org/D25071
llvm-svn: 284492
New mixed type atomic routines added for regular capture operations as well as
reverse update/capture operations. LHS - all integer and float types (no
complex so far), RHS - float16.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D25275
llvm-svn: 284489
This change removes/disables unnecessary code when monitor thread is not used.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D25102
llvm-svn: 283577
Add check for "45" version to use "201511" string for OpenMP 4.5,
otherwise "200505" is used in Fortran module. Also, fix kmp_openmp_version
variable (used for the debugger, e.g.) and kmp_version_omp_api that is used
in KMP_VERSION=1 output.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D24761
llvm-svn: 282868
New routines should be used for atomics like "<int>OP=<float>" when <int> is
unsigned. Using functions __kmpc_atomic_fixed<bits>_<op>_fp) produces incorrect
results
Differential Revision: https://reviews.llvm.org/D24756
llvm-svn: 282509
This change set disables creation of the monitor thread by default. The global
counter maintained by the monitor thread was replaced by logic that uses system
time directly, and cyclic yielding on Linux target was also removed since there
was no clear benefit of using it. Turning on KMP_USE_MONITOR variable (=1)
enables creation of monitor thread again if it is really necessary for some
reasons.
Differential Revision: https://reviews.llvm.org/D24739
llvm-svn: 282507
Introduce a new LIBOMP_INSTALL_VARIABLES cache variable that can be used
to disable creating libgomp and libiomp5 aliases on 'make install'.
Those aliases are undesired e.g. on Gentoo systems where libomp is used
purely by clang.
Differential Revision: https://reviews.llvm.org/D24563
llvm-svn: 281512
Previous differencials D23305-D23310 changed task frame information management only for the kmp interface, but not for the whole gomp interface. This broke some testcases when building with gcc.
This patch fixes the broken task frame information for the gomp interface.
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D24502
llvm-svn: 281468
In case, the current team is a serialized team (lwt), the frame information should be written to this data structure.
Before, nested serialized teams would overwrite the same task information.
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23310
llvm-svn: 281467
The comment already states, that this function should work similarly as __ompt_get_taskinfo.
The function only looked for lwt entries of the current team, but not when unrolling the parents. This fix aligns the implementation to __ompt_get_taskinfo.
The new test case creates a single theaded team (->lwt) and then a nested active team.
Before the innermost print_id(1) would deliver a different team then the outer print_id(0).
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23309
llvm-svn: 281466
The exit address is set when execution of a task is started and should be reset as soon as the execution is finished.
Especially for the asm implementation of __kmp_invoke_microtask, resetting in this call would be painfull, so reset just after the invokation.
The testcase shows the effect of this patch:
Before, the implicit barriers at the end of an implicit task would see an exit address for the implicit task.
This barrier is a task scheduling point. Thus, any explicit task scheduled there would see an exit, but no reenter address for the implicit task.
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23307
llvm-svn: 281465
The latest OMPT spec changed the semantic of a tasks reenter frame to be the application frame, that will be entered, when the runtime frame drops.
Before it was the last frame in the runtime. This doesn't work for some gcc execution pathes or even clang generated code for :
Since there is no runtime frame between the executed task and the encountering task.
The test case compares exit and reenter addresses against addresses captured in application code
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23305
llvm-svn: 281464
Rather than checking KMP_CPU_SETSIZE, which doesn't exist when using Hwloc, we
use the get_max_proc() function which can vary based on the operating system.
For example on Windows with multiple processor groups, it might be the case that
the highest bit possible in the bitmask is not equal to the number of hardware
threads on the machine but something higher than that.
Differential Revision: https://reviews.llvm.org/D24206
llvm-svn: 281245
Implementation of missing OpenMP 4.0 API functions omp_get_default_device and omp_set_default_device.
Also, added support for the environment variable OMP_DEFAULT_DEVICE.
Differential Revision: https://reviews.llvm.org/D23587
llvm-svn: 281065
When affinity isn't supported, __kmp_affinity_compact doesn't exist. The
problem is that in kmp_affinity.h there is a function which uses it without the
proper KMP_AFFINITY_SUPPORTED guard around it. The compiler was smart enough to
ignore it and the function __kmp_affinity_cmp_Address_child_num which relies on
it, but I think it is cleaner to have it under the proper guard. Since the
function is only used in the kmp_affinity.cpp file and there aren't any plans to
have it elsewhere. I have moved it there.
llvm-svn: 280542
the __kmp_affinity_determine_capable() functions are highly operating system
specific. This change has the functions use the type they expect explicitly.
llvm-svn: 280538
In case atomic reduction method is not available (the compiler can't generate
it) the assertion failure occurred if KMP_FORCE_REDUCTION=atomic was specified.
This change replaces the assertion with a warning and sets the reduction method
to the default one - 'critical'.
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D23990
llvm-svn: 280519
Consider the following code:
int dep;
#pragma omp target nowait depend(out: dep)
{
sleep(1);
}
#pragma omp task depend(in: dep)
{
printf("Task with dependency\n");
}
printf("Doing some work...\n");
In its current state the runtime will block on the second task and not
continue execution.
Differential Revision: https://reviews.llvm.org/D23116
llvm-svn: 277992
Consider the following code which may be executed by a serial team:
int dep;
#pragma omp target nowait depend(out: dep)
{
sleep(1);
}
#pragma omp task depend(in: dep)
{
#pragma omp target nowait
{
sleep(1);
}
}
Here the explicit task may not be freed until the nested proxy task has
finished. The current code hasn't considered this and called __kmp_free_task
anyway which triggered an assert because of remaining incomplete children:
KMP_DEBUG_ASSERT( TCR_4(taskdata->td_incomplete_child_tasks) == 0 );
Differential Revision: https://reviews.llvm.org/D23115
llvm-svn: 277991
node->dn.task is only filled after the dependencies are already processed.
This currently leads to unhelpful output from KA_TRACE or even a crash
if one enables KMP_SUPPORT_GRAPH_OUTPUT.
llvm-svn: 277717
Summary:
Android does not have pthread_cancel. Disable KMP_CANCEL_THREADS if
__ANDROID__ is defined.
Subscribers: tberghammer, srhines, openmp-commits, danalbert
Differential Revision: https://reviews.llvm.org/D23029
llvm-svn: 277618
This patch enables balanced affinity on machines that do not have
hardware threads and have cores clustered into packages. In facts,
balacing algorithm could be generalized for any arrangement with
at least two levels of hierarchy (depth > 1).
Differential Revision: https://reviews.llvm.org/D22365
llvm-svn: 277212
Summary:
When compiling the runtime library with clang we get warnings like:
```
error: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior [-Werror,-Wvarargs]
va_start( args, id );
^
note: parameter of type 'kmp_i18n_id_t' (aka 'kmp_i18n_id') is declared here
kmp_i18n_id_t id,
```
My understanding is that the va_start macro only gets the promoted type so it won't know what was the exact type of the argument, which can potentially not work for some targets given that the implementation of the the calling convention could not be done properly.
This patch fixes that by using a built-in type in the function signature.
Reviewers: tlwilmar, jlpeyton, AndreyChurbanov
Subscribers: arpith-jacob, carlo.bertolli, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D22427
llvm-svn: 276428
When linking with libhwloc, the ORDERED EPCC test slows down on big
machines (> 48 cores). Performance analysis showed that a cache thrash
was occurring and this padding helps alleviate the problem.
Also, inside the main spin-wait loop in kmp_wait_release.h, we can eliminate
the references to the global shared variables by instead creating a local
variable, oversubscribed and instead checking that.
Differential Revision: http://reviews.llvm.org/D22093
llvm-svn: 274894
If update_master_only is set the place list is not completely traversed
and therefore this assertion failed. Make it only trigger if
update_master_only is false.
(was introduced by D20539)
Differential Revision: http://reviews.llvm.org/D21925
llvm-svn: 274482
This change fixes an error in comparing the existing schedule on the team to
the new schedule, in the chunk field. Also added additional checks and used
KMP_CHECK_UPDATE where appropriate.
Patch by Terry Wilmarth.
Differential Revision: http://reviews.llvm.org/D21897
llvm-svn: 274371
EPCC Performance of single is considerably worse than plain barrier.
Adding a read-only check to the code before the atomic compare-and-store
helps considerably.
Patch by Terry Wilmarth.
Differential Revision: http://reviews.llvm.org/D21893
llvm-svn: 274369
* Incorrect lock value written in __kmp_test_futex_lock
* Incorrect lock value check in tas/futex lock with USE_LOCK_PROFILE on
Patch by Hansang Bae
llvm-svn: 274053
UNICODE and _UNICODE defintions were added in the LLVM CMake build system.
While on Unices, the UNICODE/_UNICODE macros don't cause problems, on Windows
only ittnotify_static.c should be compiled using -DUNICODE. We are still
looking at a proper fix, but this change sets the build back to exactly what it
was doing before. Also, a comment and TODO were added in the src/CMakeLists.txt
file to help explain.
llvm-svn: 274052
That patch made all LLVM projects build with -DUNICODE. However, this doesn't
work for the OpenMP runtime.
But just overriding the flag with -UUNICODE breaks compiling ittnotify_static.c,
which for some reason needs to be compiled with -DUNICIODE. Note that compiling
ittnotify.h with -DUNICODE does not work though.
This seems like a mess. This commit fixes it for now, but it would be great
if someone who works on the OpenMP runtime could fix it properly.
llvm-svn: 273898
Bug fix for hang when omp task and nested parallelism used together.
Still some problem remains with task state saving/restoring, but
user's case works fine now. All tasking unit tests passed as well.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21558
llvm-svn: 273297
Replaced readings of nproc from team structure with ones from
thread structure to improve performance.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21559
llvm-svn: 273293
The removal of legacy code to support long-deprecated debugger support library
resulted in some whitespace changes. Comments from that legacy code were made
public as they may be useful for other debuggers.
Patch by Olga Malysheva.
Differential Revision: http://reviews.llvm.org/D21391
llvm-svn: 273282
A couple improvements:
1) Add ability to limit fullMask size when KMP_HW_SUBSET limits resources.
2) Make KMP_HW_SUBSET work for affinity_none, and only limit fullMask in this case.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21528
llvm-svn: 273278
There was a segfault in the stubs library in posix_memalign because
of a bad parameter. The fix is to send address of the pointer as a
parameter. Also added check of result of posix_memalign.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21529
llvm-svn: 273276
This change appends the process id to the KMP_STATS_FILE (if specified) which
enables MPI processes to output their stats to separate files.
Differential Revision: http://reviews.llvm.org/D21386
llvm-svn: 273273
Change hwloc discovery algorithm to print topology for only accessible
resources, and report uniformity correspondingly, similar to what other topology
discovery algorithms do. Fixes minor inconsistency in total topology reported
and resources used for threads binding in case hwloc used.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21389
llvm-svn: 272952
This patch allows a user to enable Hwloc on windows. There are three main
changes in here:
1.kmp.h - Move definitions/declarations out of KMP_OS_WINDOWS guard (our windows
implementation of affinity) because they need to be defined when
KMP_USE_HWLOC is on as well.
2.teach __kmp_set_system_affinity, __kmp_get_system_affinity,
__kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
3.teach CMake how to include hwloc when building Windows
Another minor change in here is to make sure that anything under KMP_USE_HWLOC
is also guarded by KMP_AFFINITY_SUPPORTED as well. This is to prevent Mac
builds from requiring anything from Hwloc.
Differential Revision: http://reviews.llvm.org/D21441
llvm-svn: 272951
With single thread using __kmpc_omp_wait_deps segfaults in OpenMP runtime.
Offloading with depend also encounters this problem when we generate
kmpc_omp_wait_deps instead of kmpc_omp_task_with_deps.
Patch by Alex Duran
Differential Revision: http://reviews.llvm.org/D21384
llvm-svn: 272949
Cleanup: fixed missing memory cleanup in couple of corner cases. Fixes possible
memory leak in some corner cases
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21355
llvm-svn: 272946
Improved performance of ittnotify calls by request from ittnotify
owner: calls to __itt_string_handle_create made unique (it was
called multiple times).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21353
llvm-svn: 272945
Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion
about its purpose and function among users. KMP_HW_SUBSET is an environment
variable which allows users to easily pick a subset of the hardware topology to
use. e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21340
llvm-svn: 272937
Added argv array check/allocation for parallel directly nested inside the teams
construct, as new coming Fortran codegen passes parameters directly into
kmpc_fork_call missing same parameters in kmpc_fork_teams (earlier codegen
passed to parallel the subset of parameter passed to teams, and thus
no check/allocation needed).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21336
llvm-svn: 272935
Currently, there is a big overhead in reporting of loop metadata through
ittnotify. The pair of functions: __kmp_str_loc_init/__kmp_str_loc_free are
replaced with strchr/atoi calls. Thus, a lot of time consuming actions are
skipped - many memory allocations/deallocations, heavy string duplication, etc.
The loop metadata only needs line and column info from the source string, so no
allocations and string splitting actually needed.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21309
llvm-svn: 272698
Cleanup - unused code removal.
TODO: consider to remove (replace with flag class methods)
also kmp_wait_64 and kmp_release_64 routines.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21332
llvm-svn: 272697
OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with
45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.
llvm-svn: 272687
Fix for bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602. Removed functions
body consisted of the only KMP_ASSERT(0) statement. Thus possible runtime crash
converted to compile-time error, which looks preferable (faster possible error
detection).
TODO: consider C++11 static assert as an alternative, that could
make the diagnostics better.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21304
llvm-svn: 272590
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).
Patch by Brian Bliss.
Differential Revision: http://reviews.llvm.org/D21300
llvm-svn: 272589
If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.
Differential Revision: http://reviews.llvm.org/D21196
llvm-svn: 272570
The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates parallel region
and calls some omp functions). An application has a loop where it dynamically
loads libfoo.so, calls the function from it, unloads libfoo.so. After several
loop iterations application crashes with the message about lack of resources
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
The problem is that pthread_kill() was not followed by pthread_join() in case
of terminated thread. This patch fixes this problem for both worker and monitor
threads.
Differential Revision: http://reviews.llvm.org/D21200
llvm-svn: 272567
These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three level topology, and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. i.e., the three level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.
For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.
There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.
Differential Revision: http://reviews.llvm.org/D21195
llvm-svn: 272565
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
Differential Revision: http://reviews.llvm.org/D21245
llvm-svn: 272561
Refactored __kmp_execute_tasks_template to shorten and remove code redundancy.
The original code for __kmp_execute_tasks_template was very redundant with
large sections of repeated code that needed to be kept consistent, and goto
statements that made the control flow difficult to discern. This refactoring
removes all gotos and redundancy.
Patch by Terry Wilmarth
Differential Revision: http://reviews.llvm.org/D20879
llvm-svn: 272286
MSVC doesn't allow std::atomic<>s in a union since they don't have trivial
copy constructor. Replacing them with e.g. std::atomic_int works, but that
breaks the GCC build on Linux, because then calls to e.g. std::atomic_load_explicit
fail, as they expect a real std::atomic<> pointer.
Fixing this with an #ifdef to unbreak the build for now.
llvm-svn: 272271
As I replaced no-op TCR_4 with actual code, compiler complained while building debug build.
This patch moves 'cast to int' to the correct place.
Extension to Differential Revision: http://reviews.llvm.org/D19880
llvm-svn: 271377
This patch replaces use of compiler builtin atomics with
C++11 atomics for ticket locks implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.
The main reason this change was introduced is the problem
with work stealing function on ARM architecture which suffered
from nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.
Two assertions were added into kmp_tasking.c which are useful
for detecting early symptoms of something wrong going on with
work stealing, which were among the possible outcomes of the
race condition.
Differential Revision: http://reviews.llvm.org/D19878
llvm-svn: 271324
This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case __kmp_for_static_init.
Patch by Alex Duran.
Differential Revision: http://reviews.llvm.org/D20699
llvm-svn: 271320
When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task". When the task deques are full, a dead-lock situation
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.
This patch implements the solution where the task deques doubled in size when
a task is being queued from a COI thread.
Differential Revision: http://reviews.llvm.org/D20733
llvm-svn: 271319
The problem is the lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
built-in 7 dispatch buffers, but there is a need in dozens or hundreds of
buffers.
The problem can be fixed by setting KMP_MAX_DISP_BUF to bigger value. In order
to give users same possibility I changed build-time control into run-time one,
adding API just in case.
This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).
The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
because during the serial initialization we already allocate buffers for the hot
team, so it is too late to change the number of buffers later (or we need to
reallocate buffers for all teams which sounds too complicated). The
kmp_set_defaults() routine does not work for this envirable, because it calls
serial initialization before reading the parameter string. So a new routine,
kmp_set_disp_num_buffers(), is created so that it can set our internal global
variable before the library initialization. If both the envirable and API used
the envirable wins.
Differential Revision: http://reviews.llvm.org/D20697
llvm-svn: 271318
The OMP_PROC_BIND=spread strategy fails to assign the master thread the
correct place partition after the first parallel region. Other threads in the
hot team will remember their place_partition, but the master's place partition
is restored to what it was before entering the parallel region. So when the hot
team is used for subsequent parallel regions, the master has lost this info.
This fix calls __kmp_partition_places to update only the master thread's place
partition in the spread case when there are no other changes to the hot team.
Patch by Terry Wilmarth
Differential Revision: http://reviews.llvm.org/D20539
llvm-svn: 270890
On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
statically-linked binary causes a failure at runtime because dlopen fails.
This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
that can be disabled.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D20517
llvm-svn: 270884
Clang no longer restricts itself to generating microtasks with a small number
of arguments, and so an assembly implementation is required to prevent hitting
the parameter limit present in the C implementation. This adds an
implementation for ppc64[le].
llvm-svn: 270821
Most of this is modifications to check for differences before updating data
fields in team struct. There is also some rearrangement of the team struct.
Patch by Diego Caballero
Differential Revision: http://reviews.llvm.org/D20487
llvm-svn: 270468
These changes allow testing on Windows using clang.exe.
There are two main changes:
1. Only link to -lm when it actually exists on the system
2. Create basic versions of pthread_create() and pthread_join() for windows.
They are not POSIX compliant by any stretch but will allow any existing
and future tests to use pthread_create() and pthread_join() for testing
interactions of libomp with os threads.
Differential Revision: http://reviews.llvm.org/D20391
llvm-svn: 270464
KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used
inconsequently throughout LLVM libomp code.
* some .c files that use this define do not include kmp_lock.h file,
in effect guarded part of code are never compiled
* some places in code use architecture-depending preprocessor
logic expressions which effectively disable use of Futex for
AArch64 architecture, all these places should use
'#if KMP_USE_FUTEX' instead to avoid any further confusions
* some places use KMP_HAS_FUTEX which is nowhere defined,
KMP_USE_FUTEX should be used instead
Differential Revision: http://reviews.llvm.org/D19629
llvm-svn: 269642
This patch solves 'Too many args to microtask' problem which occurs
while executing lulesh2.0.3 benchmark on AArch64.
To solve this I had to wrtite AArch64 assembly version of
__kmp_invoke_microtask() function, similar to x86 and x86_64
implementations.
Differential Revision: http://reviews.llvm.org/D19879
llvm-svn: 269399
This change adds a new entry point,
kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding
to kmp_malloc() but with the capability to return aligned memory as well.
Other allocator routines have been adjusted so that kmp_free() can be used for
freeing memory blocks allocated by any kmp_*alloc() routine, including the new
kmp_aligned_malloc() routine.
Differential Revision: http://reviews.llvm.org/D19814
llvm-svn: 269365
After hot teams were enabled by default, the library started using levels kept
in the team structure. The levels are broken in case foreign thread exits and
puts its team into the pool which is then re-used by another foreign thread.
The broken behavior observed is when printing the levels for each new team, one
gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every other
team is nested which is incorrect. What is wanted is for the levels to be
1, 1, 1, etc.
Differential Revision: http://reviews.llvm.org/D19980
llvm-svn: 269363
This reverts a presumaby-unintentional change in:
r268640 - [STATS] Use partitioned timer scheme
and fixes segfaults in an x86_64 debug build of the runtime library.
llvm-svn: 269259
This patch introduces following:
* TCI_* and TCD_* macros for incrementation and decrementation
* Fix for invalid use of TCR_8 in one expression
Differential Revision: http://reviews.llvm.org/D19880
llvm-svn: 268826
This change removes the current timers with ones that partition time properly.
The current timers are nested, so that if a new timer, B, starts when the
current timer, A, is already timing, A's time will include B's. To eliminate
this problem, the partitioned timers are designed to stop the current timer (A),
let the new timer run (B), and when the new timer is finished, restart the
previously running timer (A). With this partitioning of time, a threads' timers
all sum up to the OMP_worker_thread_life time and can now easily show the
percentage of time a thread is spending in different parts of the runtime or
user code.
There is also a new state variable associated with each thread which tells where
it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
time is spent in OMP_task_taskwait, then that thread executed tasks inside a
#pragma omp taskwait construct.
The changes are mostly changing the MACROs to use the new PARITIONED_* macros,
the new partitionedTimers class and its methods, and new state logic.
Differential Revision: http://reviews.llvm.org/D19229
llvm-svn: 268640
This debug sections's functionality can be replicated using the environment
variable KMP_TOPOLOGY_METHOD with different values and KMP_AFFINITY=verbose
llvm-svn: 267472
This change has the hwloc_bitmap_list_snprintf() function use the entire buffer
to print the mask. There is no need to shorten the buffer length by 7. It only
needs to be shortened by one byte.
llvm-svn: 267470
The trip count calculation was incorrect for loops with large bounds. For example,
for(int i=-2,000,000,000; i < 2,000,000,000; i+=50000000), the trip count
calculation had overflow (trying to calculate 2,000,000,000 + 2,000,000,000 with
signed integers) and wasn't giving the right value. This patch fixes this error
in the runtime by using unsigned integers instead. There is still a bug in the
clang compiler component because it warns that there is overflow in the
test case file when there isn't. This error isn't there for the Intel Compiler.
So for now, the test case is designated as XFAIL.
Differential Revision: http://reviews.llvm.org/D19078
llvm-svn: 266677
Introduced a counter of parts of an untied task submitted for execution. The
counter controls whether all parts of the task are already finished. The
compiler should generate re-submission of partially executed untied task by
itself before exiting of each task part except for the lexical last part.
Differential Revision: http://reviews.llvm.org/D19026
llvm-svn: 266675
Some codes that use TLS fail intermittently because one thread tries to write
TLS values after the TLS key has been destroyed by another thread. This happens
when one thread executes library shutdown (and destroys TLS keys), while another
thread starts to execute the TLS key destructor routine. Before this change, the
kmp_init_runtime flag was checked before calling pthread_* TLS functions, but
this flag is set to FALSE later than the destruction of the TLS keys, which
leads to failure. The fix is to check kmp_init_gtid instead, as this flag is
unset *before* the destruction of TLS keys.
Differential Revision: http://reviews.llvm.org/D19022
llvm-svn: 266674
ittnotify fix for barrier imbalance time in case tasks exist. In the current
implementation, task execution time is included into aggregated time on a
barrier. This fix calculates task execution time and corrects the arrive time
by subtracting the task execution time.
Since __kmp_invoke_task() can not only be called on a barrier, the field
th.th_bar_arrive_time is used to check if the function was called at the
barrier (th.th_bar_arrive_time != 0). So for this check, th_bar_arrive_time
is set to zero right after the value is used on the barrier.
Differential Revision: http://reviews.llvm.org/D19030
llvm-svn: 266332
This change adds back off logic in the test and set lock for better contended
lock performance. It uses a simple truncated binary exponential back off
function. The default back off parameters are tuned for x86.
The main back off logic has a two loop structure where each is controlled by a
user-level parameter:
max_backoff - limits the outer loop number of iterations.
This parameter should be a power of 2.
min_ticks - the inner spin wait loop number of "ticks" which is system
dependent and should be tuned for your system if you so choose.
The "ticks" on x86 correspond to the time stamp counter,
but on other architectures ticks is a timestamp derived
from gettimeofday().
The user can modify these via the environment variable:
KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
Currently, since the default user lock is a queuing lock,
one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
Differential Revision: http://reviews.llvm.org/D19020
llvm-svn: 266329
This change has OMP_WAIT_POLICY=active to mean that threads will busy-wait in
spin loops and virtually never go to sleep. OMP_WAIT_POLICY=passive now means
that threads will immediately go to sleep inside a spin loop. KMP_BLOCKTIME was
the previous mechanism to specify this behavior via KMP_BLOCKTIME=0 or
KMP_BLOCKTIME=infinite, but the standard OpenMP environment variable should
also be able to specify this behavior.
Differential Revision: http://reviews.llvm.org/D18577
llvm-svn: 265339
#endif was one line too low. If KMP_USE_ADAPTIVE_LOCKS is 0,
then queuing locks would incorrectly use drdpa lock mechanism.
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=26649
llvm-svn: 264934
Removed reference to "ref ct" in a comment, as ref_ct no longer exists. Also
moved the comment to where the task_team is about to be tested if NULL.
llvm-svn: 264786
The problem is that the definition of kmp_cpuinfo_t contains:
char name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)
and kmp_cpuid_t is only defined when compiling for x86.
Differential Revision: http://reviews.llvm.org/D18245
llvm-svn: 264535
For serialized parallel regions, wrong ids were reported. Now the same code is
used as in kmp_dispatch.cpp which emits the correct ids.
Differential Revision: http://reviews.llvm.org/D18348
llvm-svn: 264266
For non-serialized parallel regions the master thread issued two callbacks:
The first one in kmp_gsupport.c and the second in __kmp_join_call. Therefore
only trigger the callback in kmp_gsupport.c for serialized parallel regions.
Differential Revision: http://reviews.llvm.org/D16716
llvm-svn: 264264
Have Visual Studio use MemoryBarrier() instead of _mm_mfence() and remove
__declspec align attribute from function parameters in kmp_atomic.h
llvm-svn: 264166
This change logically separates the stats_flags_e::noTotal bit flag from the
stats_flags_e::onlyInMaster and stats_flags_e::noUnits bit flags. If no
TOTAL_foo output is wanted for a particular statistic, the flag must be
explicitly included in that statistic's flags.
Differential Revision: http://reviews.llvm.org/D18198
llvm-svn: 263954
Building libomp using CMake versions < 3.3 caused a link time error. These
errors occurred because when assembling z_Windows_NT-586_asm.asm, the
definitions: OMPT_SUPPORT, _M_AMD64|_M_IA32 weren't defined on the command line.
To fix the problem, the COMPILE_FLAGS property for the assembly file is appended
to instead of the COMPILE_DEFINITIONS property being set. For whatever reason, the
COMPILE_DEFINITIONS property doesn't pick up the definitions for assembly files
for the older CMake versions.
llvm-svn: 263651
This change adds a header to the printout of the statistics which includes the
time, machine name, and processor info if available. This change also includes
some cosmetic changes like using enum casting for timer and counter iteration.
Differential Revision: http://reviews.llvm.org/D18153
llvm-svn: 263580
This change removes synthesized stats and instead has all timers print out a
total which is the aggregate statistics across threads. This is displayed as
"Total_foo" at the end of program. The stats_flags_e::synthesized flag is
removed and the printStats() function is split into two separate functions:
printTimerStats() which can display the aggregate total and printCounterStats().
Differential Revision: http://reviews.llvm.org/D17869
llvm-svn: 263290
From the standard: The taskloop construct specifies that the iterations of one
or more associated loops will be executed in parallel using OpenMP tasks. The
iterations are distributed across tasks created by the construct and scheduled
to be executed.
This initial implementation uses a simple linear tasks distribution algorithm.
Later we can add other algorithms to speedup generation of huge number of tasks
(i.e., tree-like tasks generation should be faster).
This needs to be put into the OpenMP runtime library in order for the
compiler team to develop the compiler side of the implementation.
Differential Revision: http://reviews.llvm.org/D17404
llvm-svn: 262535
From the standard: A doacross loop nest is a loop nest that has cross-iteration
dependence. An iteration is dependent on one or more lexicographically earlier
iterations. The ordered clause parameter on a loop directive identifies the
loop(s) associated with the doacross loop nest.
The init/fini routines allocate/free doacross buffer(s) for each loop for each
thread. The wait routine waits for a flag designated by the dependence vector.
The post routine sets the flag designated by current iteration vector. We use
a similar technique of shared buffer indices that covers up to 7 nowait loops
executed simultaneously by different threads (number 7 has no real meaning,
just heuristic value). Also, the size of structures are kept intact via
reducing dummy arrays.
This needs to be put into the OpenMP runtime library in order for the compiler
team to develop the compiler side of the implementation.
Differential Revision: http://reviews.llvm.org/D17399
llvm-svn: 262532
This change introduces the new OpenMP 4.5 affinity api surrounding
OpenMP Places. There are six new entry points:
Typically called in serial region:
* omp_get_num_places - returns the number of places available to the execution
environment in the place list.
* omp_get_place_num_procs - returns the number of processors available to the
execution environment in the specified place.
* omp_get_place_proc_ids - returns the numerical identifiers of the processors
available to the execution environment in the specified place.
Typically called inside parallel region:
* omp_get_place_num - returns the place number of the place to which the
encountering thread is bound.
* omp_get_partition_num_places - returns the number of places in the place
partition of the innermost implicit task.
* omp_get_partition_place_nums - returns the list of place numbers
corresponding to the places in the place-var ICV of the innermost
implicit task.
Differential Revision: http://reviews.llvm.org/D17417
llvm-svn: 261915
The maximum task priority value is read from envirable: OMP_MAX_TASK_PRIORITY.
But as of now, nothing is done with it. We just handle the environment variable
and add the new api: omp_get_max_task_priority() which returns that value or
zero if it is not set.
Differential Revision: http://reviews.llvm.org/D17411
llvm-svn: 261908
The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
Macros are added to probe if monotonic or non-monotonic is specified
(SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
and also to to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS)
Currently, nothing is done with the modifiers.
Also, this patch adds some comments on the use of the enumerations in at least
one place where it is subtle.
Differential Revision: http://reviews.llvm.org/D17406
llvm-svn: 261906
For pragma omp taskwait the runtime is called from the task context.
Therefore, the reentry frame information should be updated.
The information should be available for both taskwait event calls; therefore,
set before the first event and reset after the last event.
Patch by Joachim Protze
Differential Revision: http://reviews.llvm.org/D17145
llvm-svn: 260674
When a target task finishes and it tries to access the th_task_team from the
threads in the team where it was created, th_task_team can be NULL or point to
a different place when that thread started a nested region that is still
running. Finding the exact task_team that the threads were using is difficult
as it would require to unwind the task_state_memo_stack. So a new field was added
in the taskdata structure to point to the active task_team when the task was
created.
llvm-svn: 260615
The problem is that the master's thread state was not saved before entering a
parallel region so it does not remember tasks when it returns.
llvm-svn: 260306
(libgomp has bool as well)
This was causing a test failure in omp_test_if.c when building with GCC in
Debug mode. I have verified that GCC versions 4.9.2 and 5.3.0 now work and
compile-tested this change with clang 3.7.1 and Intel Compiler 16.0.
Differential Revision: http://reviews.llvm.org/D16921
llvm-svn: 260204
When building executables for Cray supercomputers, statically-linked executables
are preferred. This patch makes it possible to build the OpenMP runtime as an
archive for building statically-linked executables. The patch adds the flag
LIBOMP_ENABLE_SHARED, which defaults to true. When true, a build of the OpenMP
runtime yields dynamic libraries. When false, a build of the OpenMP runtime
yields static libraries. There is no setting that allows both kinds of libraries
to be built.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D16525
llvm-svn: 259817
In: http://lists.llvm.org/pipermail/openmp-dev/2015-August/000858.html, a
performance issue was found with libomp's task dependencies. The task
dependencies hash table has an issue with collisions. The current table size is
a power of two. This combined with the current hash function causes a large
number of collisions to occurr. Also, the current size (64) is too small for
larger applications so the table size is increased.
This patch creates a two level hash table approach for task dependencies. The
implicit task is considered the "master" or "top-level" task which has a large
static sized hash table (997), and nested tasks will have smaller hash
tables (97). Prime numbers were chosen to help reduce collisions.
Differential Revision: http://reviews.llvm.org/D16640
llvm-svn: 259113
The attached patch adds support for ompt_event_task_dependences and
ompt_event_task_dependence_pair events from the OMPT specification [1]. These
events only apply to OpenMP 4.0 and 4.1 (aka 4.5) because task dependencies
were introduced in 4.0.
With respect to the changes:
ompt_event_task_dependences
According to the specification, this event is raised after the task has been
created, thefore this event needs to be raised after ompt_event_task_begin
(in __kmp_task_start). However, the dependencies are known at
__kmpc_omp_task_with_deps which occurs before __kmp_task_start. My modifications
extend the ompt_task_info_t struct in order to store the dependencies of the
task when _kmpc_omp_task_with_deps occurs and then they are emitted in
__kmp_task_start just after raising the ompt_event_task_begin. The deps field
is allocated and valid until the event is raised and it is freed and set
to null afterwards.
ompt_event_task_dependence_pair
The processing of the dependences (i.e. checking whenever a dependence is
already satisfied) is done within __kmp_process_deps. That function checks
every dependence and calls the __kmp_track_dependence routine which gives some
support for graphical output. I used that routine to emit the dependence pair
but I also needed to know the sink_task. Despite the fact that the code within
KMP_SUPPORT_GRAPH_OUTPUT refers to task_sink it may be null because
sink->dn.task (there's a comment regarding this) and in fact it does not point
to a proper pointer value because the value is set in node->dn.task = task;
after the __kmp_process_deps calls in __kmp_check_deps. I have extended the
__kmp_process_deps and __kmp_track_dependence parameter list to receive the
sink_task.
[1] https://github.com/OpenMPToolsInterface/OMPT-Technical-Report/blob/target/ompt-tr.pdf
Patch by Harald Servat
Differential Revision: http://reviews.llvm.org/D14746
llvm-svn: 259038
When the code behind the barrier is executed, the master thread may have
already resumed execution. That's why we cannot safely assume that *pteam
is not yet freed.
This has been introduced by r258866.
llvm-svn: 259037
Current clang trunk reports _OPENMP to be 201307 = OpenMP 4.0. It doesn't
recognize '#pragma omp declare target' though (patch still pending) and
therefore fails compilation.
Differential Revision: http://reviews.llvm.org/D16631
llvm-svn: 259026
For implcit barriers in simple parallel for loops, the order of the OMPT events
was wrong. The barrier_{begin,end} events came after the implcit_task_end
event for the implcit barrier at the end of the parallel region. This is wrong
because the implicit task executes the barrier before ending. This patch fixes
the order of the event: It will be triggerd now just before
__kmp_pop_current_task_from_thread() is called.
Patch by Tim Cramer
Differential Revision: http://reviews.llvm.org/D16347
llvm-svn: 258866
This change fixes the bug: https://llvm.org/bugs/show_bug.cgi?id=25975
by bypassing the perl module files which try to deduce system information.
These perl modules files don't offer useful information and are from the
original build system. They can be removed after this change.
llvm-svn: 258843
This change fixes one issue reported at https://llvm.org/bugs/show_bug.cgi?id=26184
There was missing cleanup code for the cached indirect lock pool. The change
will fix the reported case where it tries to initialize a lock after runtime
cleanup/reinitialization, but it is still possible that the user program runs
into another problem because most test programs have a call to __kmpc_set_lock
after cleanup/reinitialization without calling __kmpc_init_lock causing a crash/hang.
llvm-svn: 258528
The release builds are configured to be reproducible, so that the
binaries compare equal between bootstrap iterations. The OpenMP
run-time build was failing like this:
runtime/src/kmp_version.c:108:79: error: expansion of date or time macro is not reproducible [-Werror,-Wdate-time]
char const __kmp_version_build_time[] = KMP_VERSION_PREFIX "build time: " __DATE__ " " __TIME__;
Figuring as the build currently doesn't set LIBOMP_DATE, it's probably
OK to skip setting the build time here too.
llvm-svn: 257833
This new API, int kmp_set_thread_affinity_mask_initial(), is available for use
by other parallel runtime libraries inside a possibly OpenMP-registered thread.
This entry point restores the current thread's affinity mask to the affinity
mask of the application when it first began. If -1 is returned it can be assumed
that either the thread hasn't called affinity initialization or that the thread
isn't registered with the OpenMP library. If 0 is returned then, then the call
was successful. Any return value greater than zero indicates an error occurred
when setting affinity.
Differential Revision: http://reviews.llvm.org/D15867
llvm-svn: 257489
Recent changes to support dynamic locks didn't consider the code compiled when
OMPT_SUPPORT=true. As a result, the OMPT support was broken by recent changes
to nested locks to support dynamic locks. For OMPT to work with dynamic locks,
they need to provide a return code indicating whether a nested lock acquisition
was the first or not.
This patch moves the OMPT support for nested locks into the #else case when
DYNAMIC locks were not used. New support is needed for dynamic locks. This patch
fixes the build and leaves a placeholder where the missing OMPT callbacks can be
added either the author of the OMPT support for locks, or the dynamic
locking support.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D15656
llvm-svn: 256314
When users sets envirable KMP_BLOCKTIME to "infinite" (the time one busy-waits
at barrieres, etc.), the monitor thread is not useful and can be ignored. This
change prevents the creation of the monitor thread when the users sets
KMP_BLOCKTIME to "infinite".
Differential Revision: http://reviews.llvm.org/D15628
llvm-svn: 256061
This change allows clang to build the stats library for every architecture
which supports __builtin_readcyclecounter(). CMake also checks for all
necessary features for stats and will error out if the platform does not
support it.
Patch by Hal Finkel and Johnny Peyton
llvm-svn: 256002
This change set includes all changes to make the code conform to the OMP 4.5 specification:
* Removed hint / hinted_init definitions from include/40 files
* Hint values are powers of 2 to enable composition (4.5 spec)
* Hinted lock initialization functions were renamed (4.5 spec)
kmp_init_lock_hinted -> omp_init_lock_with_hint
kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
* __kmpc_critical_section_with_hint was added to support a critical section with
a hint (4.5 spec)
* __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
an internal lock type
* kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
internal entries for the hinted lock initializers. The preivous internal
functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
places
* Added the two init functions to dllexports
* KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
Differential Revision: http://reviews.llvm.org/D15205
llvm-svn: 255376
* Added a new user TSX lock implementation, RTM, This implementation is a
light-weight version of the adaptive lock implementation, omitting the
back-off logic for deciding when to specualte (or not). The fall-back lock is
still the queuing lock.
* Changed indirect lock table management. The data for indirect lock management
was encapsulated in the "kmp_indirect_lock_table_t" type. Also, the lock table
dimension was changed to 2D (was linear), and each entry is a
kmp_indirect_lock_t object now (was a pointer to an object).
* Some clean up in the critical section code
* Removed the limits of the tuning parameters read from KMP_ADAPTIVE_LOCK_PROPS
* KMP_USE_DYNAMIC_LOCK=1 also turns on these two switches:
KMP_USE_TSX, KMP_USE_ADAPTIVE_LOCKS
Differential Revision: http://reviews.llvm.org/D15204
llvm-svn: 255375
There are going to be two more patches which bring this feature up to date and in line with OpenMP 4.5.
* Renamed jump tables for the lock functions (and some clean up).
* Renamed some macros to be in KMP_ namespace.
* Return type of unset functions changed from void to int.
* Enabled use of _xebgin() et al. intrinsics for accessing TSX instructions.
Differential Revision: http://reviews.llvm.org/D15199
llvm-svn: 255373
Fix for crash in the teams construct in case user sets OMP_THREAD_LIMIT to a
number less than the number of processors. Now the number of threads will be
silently reduced if the user didn't specify teams parameters or with a
warning if the user specified teams parameters conflicting with
OMP_THREAD_LIMIT.
Differential Revision: http://reviews.llvm.org/D14732
llvm-svn: 254322
The task_team pointer is dereferenced unconditionally which causes a SEGFAULT
when it is NULL (e.g. for serialized parallel, that can happen for "teams"
construct or for "target nowait"). The solution is to skip second task team
setup for single thread team.
Differential Revision: http://reviews.llvm.org/D14729
llvm-svn: 254321
These changes allow libhwloc to be used as the topology discovery/affinity
mechanism for libomp. It is supported on Unices. The code additions:
* Canonicalize KMP_CPU_* interface macros so bitmask operations are
implementation independent and work with both hwloc bitmaps and libomp
bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
the like. These are all in kmp.h and appropriately placed.
* Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
interface to create a libomp address2os object which the rest of libomp knows
how to handle already.
* To build, use -DLIBOMP_USE_HWLOC=on and
-DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
can't find the library or hwloc.h, then it will tell you and exit.
Differential Revision: http://reviews.llvm.org/D13991
llvm-svn: 254320
Fix ittnotify loop metadata reporting for schedule(runtime) and
chunked schedule set via OMP_SCHEDULE. The bug was that chunk=1
reported always.
llvm-svn: 252952
The patch adds support for ompt_event_task_switch into LLVM/OpenMP. Note that
the patch has also updated the signature of ompt_event_task_switch to
ompt_task_pair_callback_t (rather than the previous ompt_task_switch_callback_t).
Patch by Harald Servat
Differential Revision: http://reviews.llvm.org/D14566
llvm-svn: 252761
1) Add get_ptr_type() method to all wait flag types.
2) Flag in sleep_loc may change type by the time the resume is called from
__kmp_null_resume_wrapper. We use get_ptr_type to obtain the real type
and compare it to the casted object received. If they don't match, we know
the flag has changed (already resumed and replaced by another flag). If they
match, it doesn't hurt to go ahead and resume it.
Differential Revision: http://reviews.llvm.org/D14458
llvm-svn: 252487
1) When the number of threads in a team increases, new threads need to have all
their barrier struct fields initialized. We were missing the parent_bar and
team fields.
2) For non-forkjoin barriers, we now do the __kmp_task_team_setup before the
gather. The setup now sets up the task_team that all the threads will switch
to after the barrier, but it needs to be done before other threads do the
switch.
3) Remove an unneeded assignment of tt_found_tasks in task team free function.
Differential Revision: http://reviews.llvm.org/D14456
llvm-svn: 252486
These changes include:
1) Machine hierarchy now uses the base_num_threads field to indicate the
maximum number of threads the current hierarchy can handle without a resize.
2) In __kmp_get_hierarchy, we need to get depth after any potential resize
is done.
3) Cleanup of hierarchy resize code to support 1 above.
Differential Revision: http://reviews.llvm.org/D14455
llvm-svn: 252475
Setting dynamic schedule with chunk size 0 via omp_set_schedule(dynamic,0)
and then using "schedule (runtime)" causes infinite loop because for the
chunked dynamic schedule we didn't correct zero chunk to the default (1).
llvm-svn: 252338
Use of #ifdef OMPT_DEBUG was causing messages to be generated under normal
operation when the OpenMP library was compiled with KMP_DEBUG enabled.
Elsewhere, KMP_DEBUG evaluates assertions, but never produces messages during
normal operation. To avoid this inconsistency, set OMPT_DEBUG using a cmake
variable LIBOMP_OMPT_DEBUG.
While I was editing the associated ompt-specific.h and ompt-general.c files,
make the spacing and comments consistent.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D14355
llvm-svn: 252173
This is a refactoring of the task_team code that more elegantly handles the two
task_team case. Two task_teams per team are kept in use for the lifetime of the
team. Thus no reference counting is needed.
Differential Revision: http://reviews.llvm.org/D13993
llvm-svn: 252082
The problem is that the ompt_tool() function (which must be implemented by a
performance tool) should be defined in the RTL as well to cover the case when
the tool is not present in the address space of the process. This functionality
is accomplished with weak symbols in Unices. Unfortunately, Windows does not
support weak symbols.
The solution in these changes is to grab the list of all modules loaded by the
process and then search for symbol "ompt_tool()" within them. The function
ompt_tool_windows() performs the search of the ompt_tool symbol. If ompt_tool is
found, then its return value is used to initialize the tool. If ompt_tool is not
found, then ompt_tool_windows() returns NULL and OMPT is thus, disabled.
While doing these changes, the OMPT_SUPPORT detection in CMake was changed to
test for the required featuers for OMPT_SUPPORT, namely: builtin_frame_address()
existence, weak attribute existence and psapi.dll existence. For
LIBOMP_HAVE_OMPT_SUPPORT to be true, it must be that the builtin_frame_address()
intrinsic exists AND one of: either weak attributes exist or psapi.dll exists.
Also, since Process Status API is used I had to add new dependency -- psapi.dll
to the library dependency micro test.
Differential Revision: http://reviews.llvm.org/D14027
llvm-svn: 251654
The th.th_task_state for the master thread at the start of a nested parallel
should not be zeroed in __kmp_allocate_team() because it is later put in the
stack of states in __kmp_fork_call() for further re-use after exiting the
nested region. It is zeroed after being put in the stack.
Differential Revision: http://reviews.llvm.org/D13702
llvm-svn: 250847
Moved '@' from delimiters to offset designators for the KMP_PLACE_THREADS
environment variable. Only one of: postfix "o" or prefix @, should be used
in the value of KMP_PLACE_THREADS. For example, '2s@2,4c@2,1t'. This is also
the format of KMP_SETTINGS=1 output now (removed "o" from there).
e.g., 2s,2o,4c,2o,1t.
Differential Revision: http://reviews.llvm.org/D13701
llvm-svn: 250846
Without this fix, cancellation requests in one parallel region cause
cancellation of the second region even though the second one was
not intended to be cancelled.
llvm-svn: 250727
warnings similar to the following:
runtime/src/kmp_global.c:117:35: warning: implicit conversion from
'unsigned long' to 'int' changes value from 18446744073709551615 to -1
[-Wconstant-conversion]
int __kmp_sys_max_nth = KMP_MAX_NTH;
~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~
runtime/src/kmp.h:849:34: note: expanded from macro 'KMP_MAX_NTH'
# define KMP_MAX_NTH PTHREAD_THREADS_MAX
^~~~~~~~~~~~~~~~~~~
Clamp KMP_MAX_NTH to INT_MAX to avoid these warnings. Also use INT_MAX
whenever PTHREAD_THREADS_MAX is not defined at all.
Differential Revision: http://reviews.llvm.org/D13827
llvm-svn: 250708
This fix implements the following OMPT events for the API locking routines:
* ompt_event_acquired_lock
* ompt_event_acquired_nest_lock_first
* ompt_event_acquired_nest_lock_next
* ompt_event_init_lock
* ompt_event_init_nest_lock
* ompt_event_destroy_lock
* ompt_event_destroy_nest_lock
For the acquired events the depths of the locks ist required, so a return value
was added similiar to the return values we already have for the release lock
routines.
Patch by Tim Cramer
Differential Revision: http://reviews.llvm.org/D13689
llvm-svn: 250526
* Avoid computing state needed only by OMPT unless the ompt_enabled flag is set.
* Properly handle a corner case in OMPT where team == NULL.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D13502
llvm-svn: 249857