Our conditional writes in the runtime look like this:
```
if (active)
*ptr = value;
```
In the RAII we need to assign `ptr` which comes from a lookup call.
If a thread that is not the main thread calls lookup with the intention
to write the pointer, we'll create a new thread state. As such, we need
to avoid calling lookup for inactive threads. We used to use `nullptr`
as their `ptr` value but that can cause pessimistic reasoning. We now
use `undef` instead.
Differential Revision: https://reviews.llvm.org/D130114
If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not fool proof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D112156
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only build for nvptx and has "-new" in its
name.
The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime does also avoid users to pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly to make the more likely case use less resources.
The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.
Documentation and debug features are still mostly missing and will be
added over time.
All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.
Differential Revision: https://reviews.llvm.org/D106803