29 Commits

Author SHA1 Message Date
Vladimir Inđić f41d08540b [OpenMP][OMPT] thread_num determination during execution of nested serialized parallel regions
__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.

Consider the following program that contains parallel region R1 executed
by two threads. Let the worker thread T of region R1 execute a serialized
parallel region R2 that encloses another serialized parallel region R3.
Note that the thread T is the master thread of both R2 and R3 regions.

Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since the thread T is the master thread of
region R2, one should expect that "thread_num" takes the value 0.
After the while loop finishes, the following holds: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of the thread T inside
the region R1 which encloses R2. Since the thread T is the worker thread
of the region R1, "thread_num" takes the value 1, which is a contradiction.

This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.

Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since the thread is the worker inside the region R1,
one should expect that "thread_num" takes the value 1. After the loop finishes,
the following holds: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.

Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.

Note that the "prev_lwt" variable is marked as unnecessary and thus removed.

This patch introduces the test case which represents the OpenMP program
described earlier in the summary.

Differential Revision: https://reviews.llvm.org/D110699
2021-10-25 18:21:20 +02:00
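The two cases described in the f41d08540b message can be replayed with a small model. This is only a sketch: `TeamModel`, `thread_num_old` and `thread_num_new` are hypothetical names, not libomp code, and the shape of the old selection (keyed on "prev_lwt") is inferred from the description above rather than copied from the runtime.

```cpp
// Hypothetical stand-in for the team data consulted after the loop;
// master_tid models "prev_team->t.t_master_tid".
struct TeamModel {
  int master_tid;
};

// Assumed shape of the old selection: keyed on prev_lwt.
int thread_num_old(bool prev_lwt_exists, const TeamModel &prev_team) {
  return prev_lwt_exists ? 0 : prev_team.master_tid;
}

// Patched selection as described in the message: keyed on lwt.
// A serialized region is executed by one thread, so its thread_num is 0.
int thread_num_new(bool lwt_exists, const TeamModel &prev_team) {
  return lwt_exists ? 0 : prev_team.master_tid;
}
```

With `master_tid == 1` (thread T is a worker in R1), the "ancestor_level == 1" case (lwt set, prev_lwt unset) gives 1 under the old rule and 0 under the new one. The "ancestor_level == 2" case (lwt unset, prev_lwt set) gives 0 under the old rule and 1 under the new one.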
Vladimir Inđić f2410bfb1c [OpenMP][OMPT][clang] task frame support fixed in __kmpc_fork_call
__kmp_fork_call sets the enter_frame of the active task (th_current_task)
before a new parallel region begins. After the region is finished, the
enter_frame is cleared.

The old implementation of __kmpc_fork_call didn’t clear the enter_frame of
the active task.

Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.

The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that the thread that executes R2
is going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible for setting the enter_frame of
R2's implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.

The second program: Consider the OpenMP program that contains parallel region
R1 which encloses an explicit task T. Assume that the thread should create
another parallel region R2 during the execution of T. The __kmpc_fork_call is
responsible for creating R2 and setting the enter_frame of T, whose information
is present inside the master_th->th.th_current_task.
The old implementation tries to set the frame of
parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit
task of the R1, instead of T.

Differential Revision: https://reviews.llvm.org/D112419
2021-10-25 18:21:19 +02:00
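The behaviour described in the f2410bfb1c message (set the enter_frame of the task that encounters the parallel construct, then clear it once the region ends) can be modelled as follows. All names here (`FrameModel`, `TaskModel`, `ThreadModel`, `fork_call_model`) are hypothetical; this is a sketch of the idea, not the runtime code.

```cpp
// Hypothetical, reduced stand-ins for the runtime structures.
struct FrameModel {
  void *enter_frame = nullptr;
};
struct TaskModel {
  FrameModel frame;
};
struct ThreadModel {
  TaskModel *current_task; // models master_th->th.th_current_task
};

// Runs "region" as if it were a new parallel region created by the thread.
template <typename Region>
void fork_call_model(ThreadModel &th, void *frame_addr, Region region) {
  // Use the currently active task, which may be an explicit task or the
  // implicit task of a nested serialized region.
  TaskModel *task = th.current_task;
  task->frame.enter_frame = frame_addr; // set before the region begins
  region();
  task->frame.enter_frame = nullptr; // cleared after the region finishes
}
```

In the second program of the message, `current_task` would point at the explicit task T, so the implicit task of R1 is left untouched.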
Joachim Protze 35ab6d6390 [OpenMP][Tests][NFC] rename macro to avoid naming clash
When including <ostream>, the register_callback macro of the OMPT callback.h
clashes with a function defined in ostream. This patch renames the macro
and includes ompt into the macro name.
2021-02-24 18:03:54 +01:00
Joachim Protze 6d3b81664a [OpenMP][OMPT] Introduce a guard to handle OMPT return address
This is an alternative approach to address inconsistencies pointed out in: D90078
This patch makes sure that the return address is reset, when leaving the scope.
In some cases, I had to move the macro out of an if-statement to have it in the
right scope, in some cases I added an additional block to restrict the scope.

This patch does not handle inconsistencies, which might occur if the return
address is still set when we call into the application.

Test case (repeated_calls.c) provided by @hbae

Differential Revision: https://reviews.llvm.org/D91692
2020-11-25 18:17:44 +01:00
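The scope-based reset described in the 6d3b81664a message is the usual RAII guard pattern. The sketch below is illustrative only: `ThreadInfoModel` and `ReturnAddressGuard` are hypothetical names, and the rule that only the guard which stored the address resets it is an assumption about how nesting would be handled, not a statement about the patch.

```cpp
// Hypothetical stand-in for the per-thread slot holding the return address.
struct ThreadInfoModel {
  void *return_address = nullptr;
};

// Stores a return address on construction and resets it when the scope ends.
class ReturnAddressGuard {
public:
  ReturnAddressGuard(ThreadInfoModel &info, void *addr)
      : info_(info), owns_(info.return_address == nullptr) {
    if (owns_)
      info_.return_address = addr; // keep an address set by an outer scope
  }
  ~ReturnAddressGuard() {
    if (owns_)
      info_.return_address = nullptr; // reset only what this guard stored
  }
  ReturnAddressGuard(const ReturnAddressGuard &) = delete;
  ReturnAddressGuard &operator=(const ReturnAddressGuard &) = delete;

private:
  ThreadInfoModel &info_;
  bool owns_;
};
```

Because the reset happens in the destructor, the guard has to live in the scope that should own the address, which matches the message's remark about moving the macro out of an if-statement or adding an extra block.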
Jonas Hahnfeld a2748c74d6 [OMPT] Cleanup reset of exit_frame pointer
This is done at call-site and does not need to be handled in
__kmp_invoke_microtask. It was already absent from the x86
and x86_64 assembly, this patch removes it from the generic
implementation in z_Linux_util.cpp and adds documentation for
AArch64 and PPC64 that it's actually not needed. I can't test
on these architectures, so I don't want to change the code just
because it looks right :)

While at it, rename some variables for consistency and add a
check in test/ompt/parallel/normal.c that the pointer was reset
before entering the barrier.

Differential Revision: https://reviews.llvm.org/D64442

llvm-svn: 366721
2019-07-22 18:46:02 +00:00
Joachim Protze 48b8a4b519 [OMPT] Handling of the events of initial-task-begin and initial-task-end
OpenMP 5.0 says that the callback for the events initial-task-begin and
initial-task-end has to be ompt_callback_implicit_task.

Patch by Tim Cramer

Differential Revision: https://reviews.llvm.org/D58776

llvm-svn: 361157
2019-05-20 14:21:36 +00:00
Joachim Protze 2b46d30fc7 [OMPT] Second chunk of final OMPT 5.0 interface updates
The omp-tools.h file is generated from the OpenMP spec to ensure that the interface
is implemented as specified.
The other changes are necessary to update the interface implementation to the
final version as published in 5.0.
The omp-tools.h header was previously called ompt.h, currently a copy under this name
is installed for legacy tools.

Patch partially prepared by @sconvent

Reviewers: AndreyChurbanov, hbae, Hahnfeld

Reviewed By: hbae

Tags: #openmp, #ompt

Differential Revision: https://reviews.llvm.org/D55579

llvm-svn: 351197
2019-01-15 15:36:53 +00:00
Joachim Protze cf80e72e30 [Tests] fix non-determinism failure in testcase
llvm-svn: 349460
2018-12-18 08:57:23 +00:00
Joachim Protze 0e0d6cdd58 [OMPT] First chunk of final OMPT 5.0 interface updates
This patch updates the implementation of the ompt_frame_t, ompt_wait_id_t
and ompt_state_t. The final version of the OpenMP 5.0 spec added the "t"
for these types.
Furthermore, the structure of ompt_frame_t changed and now allows specifying
that the reenter frame belongs to the runtime.

Patch partially prepared by Simon Convent

Reviewers: hbae
llvm-svn: 349458
2018-12-18 08:52:30 +00:00
Joachim Protze 1f7d4aca8d [OMPT] Add testcase for thread_num provided by implicit task events
llvm-svn: 349457
2018-12-18 08:52:12 +00:00
Jonas Hahnfeld ba5ec9c684 [OMPT] Fix typo in test parallel/nested_thread_num.c
This caused test failures with GCC since its initial commit in
r336085 (https://reviews.llvm.org/D46533).

llvm-svn: 337911
2018-07-25 12:34:31 +00:00
Joachim Protze 4a73ae167e [OMPT] Provide the right thread_num for ancestor levels
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.

Differential Revision: https://reviews.llvm.org/D46533

llvm-svn: 336085
2018-07-02 09:13:24 +00:00
Jonas Hahnfeld 3c6595d65d [OMPT] Fix test parallel/not_enough_threads.c
Upcoming changes to FileCheck will modify CHECK-DAG to not match
overlapping regions of the input. This test was found to be affected
because it expects to find four threads to invoke events of type
ompt_event_implicit_task_begin. It turns out this is wrong because
OMP_THREAD_LIMIT is set to 2, so there are only two threads. The
rest of the test got it right so it went unnoticed until now.

(Rewrite test and apply clang-format to it as discussed in the past.)

Differential Revision: https://reviews.llvm.org/D47119

llvm-svn: 333361
2018-05-27 17:07:38 +00:00
Jonas Hahnfeld 82768d0ba1 [OMPT] Fix parallel_data in implicit barrier-end
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.

Differential Revision: https://reviews.llvm.org/D43308

llvm-svn: 325922
2018-02-23 16:46:25 +00:00
Paul Osmialowski 6b8141acdd [OMPT] Add missing initialization in nested_lwt.c test case
Without this initialization this test case tends to fail.

Differential Revision: https://reviews.llvm.org/D41542

llvm-svn: 321379
2017-12-22 19:24:06 +00:00
Joachim Protze 0e2a2571ca [OMPT] Use frames at different level when using clang version 5 or higher with debug flag
Clang 5 or higher adds an intermediate function call in certain cases when
compiling with debug flag. This revision updates the testcases to work
correctly.

Differential Revision: https://reviews.llvm.org/D40595

llvm-svn: 321263
2017-12-21 13:55:29 +00:00
Joachim Protze 633bc4ca99 [OMPT] Add annotations to testcases that are expected to fail when using certain compilers
Reasons for expected failures are mainly bugs when using labels in OpenMP regions
or missing support of some OpenMP features.
For some worksharing clauses, support to distinguish the kind of workshare was
added just recently.

If an issue was fixed in a minor release version of a compiler, we flag the
test as unsupported for this compiler version to avoid false positives.
Same for fixes that were backported to older compiler versions.

Differential Revision: https://reviews.llvm.org/D40384

llvm-svn: 321262
2017-12-21 13:55:16 +00:00
Jonas Hahnfeld ba84ca9efb [OMPT] Fix null pointer in parallel/no_thread_num_clause.c
Looks like the implementation of printf on Darwin uses "0x0"
instead of "(nil)" like glibc does.

llvm-svn: 317515
2017-11-06 22:06:14 +00:00
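The ba84ca9efb entry turns on the fact that the text produced by printf's "%p" is implementation-defined. One way to get the same text on every platform is to format the pointer value as an integer. This is a sketch of that alternative, not what the commit itself did; `format_ptr` is a hypothetical helper.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a pointer as "0x" followed by lowercase hex digits, so a null
// pointer is rendered the same way regardless of how the C library prints "%p".
std::string format_ptr(const void *p) {
  char buf[2 + 2 * sizeof(std::uintptr_t) + 1]; // "0x" + hex digits + NUL
  std::snprintf(buf, sizeof buf, "0x%" PRIxPTR,
                reinterpret_cast<std::uintptr_t>(p));
  return buf;
}
```

A test that prints pointers through such a helper would not need to distinguish "(nil)" from "0x0" in its expected output.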
Joachim Protze 82e94a5934 Update implementation of OMPT to the specification OpenMP 5.0 Preview 1 (TR4).
The code is tested to work with latest clang, GNU and Intel compiler. The implementation
is optimized for low overhead when no tool is attached shifting the cost to execution with
tool attached.

This patch does not implement OMPT for libomptarget.

Patch by Simon Convent and Joachim Protze

Differential Revision: https://reviews.llvm.org/D38185

llvm-svn: 317085
2017-11-01 10:08:30 +00:00
Jonas Hahnfeld 848d690697 [OMPT] fix task frame information for gomp interface
Previous differentials D23305-D23310 changed task frame information management only for the kmp interface, but not for the whole gomp interface. This broke some testcases when building with gcc.
This patch fixes the broken task frame information for the gomp interface.

Patch by Joachim Protze!

Differential Revision: https://reviews.llvm.org/D24502

llvm-svn: 281468
2016-09-14 13:59:39 +00:00
Jonas Hahnfeld dd9a05d5d8 [OMPT] save exit address to lwt if available
If the current team is a serialized team (lwt), the frame information should be written to this data structure.
Before, nested serialized teams would overwrite the same task information.

Patch by Joachim Protze!

Differential Revision: https://reviews.llvm.org/D23310

llvm-svn: 281467
2016-09-14 13:59:31 +00:00
Jonas Hahnfeld 28ea24bba7 [OMPT] fix __ompt_get_teaminfo to consult lwt entries of parent teams
The comment already states that this function should work similarly to __ompt_get_taskinfo.

The function only looked for lwt entries of the current team, but not when unrolling the parents. This fix aligns the implementation to __ompt_get_taskinfo.

The new test case creates a single-threaded team (->lwt) and then a nested active team.
Before, the innermost print_id(1) would deliver a different team than the outer print_id(0).

Patch by Joachim Protze!

Differential Revision: https://reviews.llvm.org/D23309

llvm-svn: 281466
2016-09-14 13:59:24 +00:00
Jonas Hahnfeld 8a27064e05 [OMPT] Reset task exit frame when execution is finished
The exit address is set when execution of a task is started and should be reset as soon as the execution is finished.
Especially for the asm implementation of __kmp_invoke_microtask, resetting in this call would be painful, so reset just after the invocation.

The testcase shows the effect of this patch:
Before, the implicit barriers at the end of an implicit task would see an exit address for the implicit task.

This barrier is a task scheduling point. Thus, any explicit task scheduled there would see an exit, but no reenter address for the implicit task.

Patch by Joachim Protze!

Differential Revision: https://reviews.llvm.org/D23307

llvm-svn: 281465
2016-09-14 13:59:19 +00:00
Jonas Hahnfeld fd0614d830 [OMPT] Align implementation of reenter frame address to latest (frozen) version of OMPT spec
The latest OMPT spec changed the semantics of a task's reenter frame to be the application frame that will be entered when the runtime frame drops.
Before, it was the last frame in the runtime. This doesn't work for some gcc execution paths or even clang generated code for :
Since there is no runtime frame between the executed task and the encountering task.

The test case compares exit and reenter addresses against addresses captured in application code

Patch by Joachim Protze!

Differential Revision: https://reviews.llvm.org/D23305

llvm-svn: 281464
2016-09-14 13:59:13 +00:00
Jonas Hahnfeld 464cdca9d3 [OMPT] extend ompt tests by checks for frame pointers
OMPT tests can check for right frame information of tasks:
 * parent_task_frame was directly printed as a pointer, but actually points to a struct ompt_frame {void*, void*}
 * NULL is printed in the beginning of execution and loaded to FileCheck variable [[NULL]]
 * implicit tasks now also print their frame information
 * macro to print frame address from application
 * print task info for barrier begin

Patch by Joachim Protze!

Differential Revision: https://reviews.llvm.org/D23304

llvm-svn: 281463
2016-09-14 13:59:05 +00:00
Jonas Hahnfeld 801fe9bbe2 [OMPT] Test ids reported by ompt_get_{parallel,task}_id
llvm-svn: 264265
2016-03-24 12:52:11 +00:00
Jonas Hahnfeld 1c1c71776a [OMPT] Fix duplicate implicit_task_end events for master thread with GCC
For non-serialized parallel regions the master thread issued two callbacks:
The first one in kmp_gsupport.c and the second in __kmp_join_call. Therefore
only trigger the callback in kmp_gsupport.c for serialized parallel regions.

Differential Revision: http://reviews.llvm.org/D16716

llvm-svn: 264264
2016-03-24 12:52:04 +00:00
Jonas Hahnfeld b1cad2954b [OMPT] Make tests require OMPT_BLAME
ompt_event_barrier_{begin,end} are optional blame events.
In total it doesn't make any sense to test partially built OMPT support.

llvm-svn: 264031
2016-03-22 08:23:24 +00:00
Jonas Hahnfeld c804301113 [OMPT] Create infrastructure and add first tests for OMPT
Some basic checks next to the implementation should further lower the
possibility to introduce regressions. (Note that this would have caught
the ordering issue fixed in rL258866 and pointed to rL263940.)

The tests are implementation dependent in one point because they assume that
thread ids are assigned in ascending order. This is not defined by the standard
but currently ensured in libomp. We have to think about another way of ordering
the threads should this ever be subject to change...

Note that this isn't aiming at replacing the implementation independent
test-suite at https://github.com/OpenMPToolsInterface/ompt-test-suite!

Differential Revision: http://reviews.llvm.org/D16715

llvm-svn: 264027
2016-03-22 07:22:49 +00:00