__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.
Consider the following program that contains parallel region R1 executed
by two threads. Let the worker thread T of region R1 executes serialized
parallel regions R2 that encloses another serialized parallel region R3.
Note that the thread T is the master thread of both R2 and R3 regions.
Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since the thread T is the master thread of
region R2, one should expected that "thread_num" takes a value 0.
After the while loop finishes, the following stands: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of the thread T inside
the region R1 which encloses R2. Since the thread T is the worker thread
of the region R1, "the thread_num" takes value 1, which is a contradiction.
This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.
Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since the thread is the worker inside the region R1,
one should expected that "thread_num" takes value 1. After the loop finishes,
the following stands: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.
Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.
Note that the "prev_lwt" variable is marked as unnecessary and thus removed.
This patch introduces the test case which represents the OpenMP program
described earlier in the summary.
Differential Revision: https://reviews.llvm.org/D110699
__kmp_fork_call sets the enter_frame of the active task (th_curren_task)
before new parallel region begins. After the region is finished, the
enter_frame is cleared.
The old implementation of __kmpc_fork_call didn’t clear the enter_frame of
active task.
Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.
The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that thread that executes R2 is
going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible to set enter_frame of R2's
implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.
The second program: Consider the OpenMP program that contains parallel region
R1 which encloses an explicit task T. Assume that thread should create another
parallel region R2 during the execution of the T. The __kmpc_fork_call is
responsible to create R2 and set enter frame of T whose information is present
inside the master_th->th.th_current_task.
Old implementation tries to set the frame of
parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit
task of the R1, instead of T.
Differential Revision: https://reviews.llvm.org/D112419
When including <ostream>, the register_callback macro of the OMPT callback.h
clashes with a function defined in ostream. This patch renames the macro
and includes ompt into the macro name.
This is an alternative approach to address inconsistencies pointed out in: D90078
This patch makes sure that the return address is reset, when leaving the scope.
In some cases, I had to move the macro out of an if-statement to have it in the
right scope, in some cases I added an additional block to restrict the scope.
This patch does not handle inconsistencies, which might occur if the return
address is still set when we call into the application.
Test case (repeated_calls.c) provided by @hbae
Differential Revision: https://reviews.llvm.org/D91692
This is done at call-site and does not need to be handled in
__kmp_invoke_microtask. It was already absent from the x86
and x86_64 assembly, this patch removes it from the generic
implementation in z_Linux_util.cpp and adds documentation for
AArch64 and PPC64 that it's actually not needed. I can't test
on these architectures, so I don't want to change the code just
because it looks right :)
While at it, rename some variables for consistency and add a
check in test/ompt/parallel/normal.c that the pointer was reset
before entering the barrier.
Differential Revision: https://reviews.llvm.org/D64442
llvm-svn: 366721
OpenMP 5.0 says that the callback for the events initial-task-begin and
initial-task-end has to be ompt_callback_implicit_task.
Patch by Tim Cramer
Differential Revision: https://reviews.llvm.org/D58776
llvm-svn: 361157
The omp-tools.h file is generated from the OpenMP spec to ensure that the interface
is implemented as specified.
The other changes are necessary to update the interface implementation to the
final version as published in 5.0.
The omp-tools.h header was previously called ompt.h, currently a copy under this name
is installed for legacy tools.
Patch partially perpared by @sconvent
Reviewers: AndreyChurbanov, hbae, Hahnfeld
Reviewed By: hbae
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D55579
llvm-svn: 351197
This patch updates the implementation of the ompt_frame_t, ompt_wait_id_t
and ompt_state_t. The final version of the OpenMP 5.0 spec added the "t"
for these types.
Furthermore the structure for ompt_frame_t changed and allows to specify
that the reenter frame belongs to the runtime.
Patch partially prepared by Simon Convent
Reviewers: hbae
llvm-svn: 349458
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.
Differential Revision: https://reviews.llvm.org/D46533
llvm-svn: 336085
Upcoming changes to FileCheck will modify CHECK-DAG to not match
overlapping regions of the input. This test was found to be affected
because it expects to find four threads to invoke events of type
ompt_event_implicit_task_begin. It turns out this is wrong because
OMP_THREAD_LIMIT is set to 2, so there are only two threads. The
rest of the test got it right so it went unnoticed until now.
(Rewrite test and apply clang-format to it as discussed in the past.)
Differential Revision: https://reviews.llvm.org/D47119
llvm-svn: 333361
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.
Differential Revision: https://reviews.llvm.org/D43308
llvm-svn: 325922
Clang 5 or higher adds an intermediate function call in certain cases when
compiling with debug flag. This revision updates the testcases to work
correctly.
Differential Revision: https://reviews.llvm.org/D40595
llvm-svn: 321263
Reasons for expected failures are mainly bugs when using lables in OpenMP regions
or missing support of some OpenMP features.
For some worksharing clauses, support to distinguish the kind of workshare was
added just recently.
If an issue was fixed in a minor release version of a compiler, we flag the
test as unsupported for this compiler version to avoid false positives.
Same for fixes that where backported to older compiler versions.
Differential Revision: https://reviews.llvm.org/D40384
llvm-svn: 321262
The code is tested to work with latest clang, GNU and Intel compiler. The implementation
is optimized for low overhead when no tool is attached shifting the cost to execution with
tool attached.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D38185
llvm-svn: 317085
Previous differencials D23305-D23310 changed task frame information management only for the kmp interface, but not for the whole gomp interface. This broke some testcases when building with gcc.
This patch fixes the broken task frame information for the gomp interface.
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D24502
llvm-svn: 281468
In case, the current team is a serialized team (lwt), the frame information should be written to this data structure.
Before, nested serialized teams would overwrite the same task information.
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23310
llvm-svn: 281467
The comment already states, that this function should work similarly as __ompt_get_taskinfo.
The function only looked for lwt entries of the current team, but not when unrolling the parents. This fix aligns the implementation to __ompt_get_taskinfo.
The new test case creates a single theaded team (->lwt) and then a nested active team.
Before the innermost print_id(1) would deliver a different team then the outer print_id(0).
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23309
llvm-svn: 281466
The exit address is set when execution of a task is started and should be reset as soon as the execution is finished.
Especially for the asm implementation of __kmp_invoke_microtask, resetting in this call would be painfull, so reset just after the invokation.
The testcase shows the effect of this patch:
Before, the implicit barriers at the end of an implicit task would see an exit address for the implicit task.
This barrier is a task scheduling point. Thus, any explicit task scheduled there would see an exit, but no reenter address for the implicit task.
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23307
llvm-svn: 281465
The latest OMPT spec changed the semantic of a tasks reenter frame to be the application frame, that will be entered, when the runtime frame drops.
Before it was the last frame in the runtime. This doesn't work for some gcc execution pathes or even clang generated code for :
Since there is no runtime frame between the executed task and the encountering task.
The test case compares exit and reenter addresses against addresses captured in application code
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23305
llvm-svn: 281464
OMPT tests can check for right frame information of tasks:
* parent_task_frame was directly printed as a pointer, but actually points to a struct ompt_frame {void*, void*}
* NULL is printed in the beginning of execution and loaded to FileChecker variable [[NULL]]
* implicit tasks now also print their frame information
* macro to print frame address from application
* print task info for barrier begin
Patch by Joachim Protze!
Differential Revision: https://reviews.llvm.org/D23304
llvm-svn: 281463
For non-serialized parallel regions the master thread issued two callbacks:
The first one in kmp_gsupport.c and the second in __kmp_join_call. Therefore
only trigger the callback in kmp_gsupport.c for serialized parallel regions.
Differential Revision: http://reviews.llvm.org/D16716
llvm-svn: 264264
Some basic checks next to the implementation should futher lower the
possibility to introduce regressions. (Note that this would have catched
the ordering issue fixed in rL258866 and pointed to rL263940.)
The tests are implementation dependent in one point because they assume that
thread ids are assigned in ascending order. This is not defined by the standard
but currently ensured in libomp. We have to think about another way of ordering
the threads should this ever be subject to change...
Note that this isn't aiming at replacing the implementation independent
test-suite at https://github.com/OpenMPToolsInterface/ompt-test-suite!
Differential Revision: http://reviews.llvm.org/D16715
llvm-svn: 264027