A declare variant template is only compatible with a base when the
number of template arguments is equal, otherwise our instantiations will
produce nonsensical results.
Exposes as part of D109344.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D109770
Adds initial parsing and sema for the 'append_args' clause.
Note that an AST clause is not created as it instead adds its values
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D111854
Adds initial parsing and sema for the 'adjust_args' clause.
Note that an AST clause is not created as it instead adds its expressions
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D99905
Clang would reject
#pragma omp for
#pragma omp tile sizes(P)
for (int i = 0; i < 128; ++i) {}
where P is a template parameter, but the loop itself is not
template-dependent. Because P context-dependent, the TransformedStmt
cannot be generated and therefore is nullptr (until the template is
instantiated by TreeTransform). The OMPForDirective would still expect
the a loop is the dependent context and trigger an error.
Fix by introducing a NumGeneratedLoops field to OMPLoopTransformation.
This is used to distinguish the case where no TransformedStmt will be
generated at all (e.g. #pragma omp unroll full) and template
instantiation is needed. In the latter case, delay resolving the
iteration space like when the for-loop itself is template-dependent
until the template instatiation.
A more radical solution would always delay the iteration space analysis
until template instantiation, but would also break many test cases.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111124
Insert OMPLoopTransformationDirective between OMPLoopBasedDirective and the loop transformations OMPTileDirective and OMPUnrollDirective. This simplifies handling of loop transformations not requiring distinguishing between OMPTileDirective and OMPUnrollDirective anymore.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111119
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.
Differential Revision: https://reviews.llvm.org/D110808
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h
Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h
Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h
Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
This patch supports construct trait set selector by using the existed
declare variant infrastructure inside `OMPContext` and simd selector is
currently not supported. The goal of this patch is to pass the declare variant
test inside sollve test suite.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109635
Since these assumptions are coming from OpenMP it makes sense to mark
them as such in the generic IR encoding. Standardized assumptions will
be named
omp_ASSUMPTION_NAME
and extensions will be named
ompx_ASSUMPTION_NAME
which is the OpenMP 5.2 syntax for "extensions" of any kind.
This also matches what the OpenMP-Opt pass expects.
Summarized,
#pragma omp [...] assume[s] no_parallelism
now generates the same IR assumption annotation as
__attribute__((assume("omp_no_parallelism")))
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D105937
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.
With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
Add support for ordered directive in the OpenMPIRBuilder.
This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.
Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").
Reviewed By: Meinersbur, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107430
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
"LLVMFrontendOpenMP" of type SHARED_LIBRARY
depends on "LLVMPasses" (weak)
"LLVMipo" of type SHARED_LIBRARY
depends on "LLVMFrontendOpenMP" (weak)
"LLVMCoroutines" of type SHARED_LIBRARY
depends on "LLVMipo" (weak)
"LLVMPasses" of type SHARED_LIBRARY
depends on "LLVMCoroutines" (weak)
depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed. Build files cannot be regenerated correctly.
```
This reverts commit 707ce34b06.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.
With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier. The next patch in this series, D106510, implements OpenMP
runtime support.
Consider the following example:
```
#pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
{
foo(); // might have map(delete: x)
#pragma omp target map(present, alloc: x) // x is guaranteed to be present
printf("%d\n", x);
}
```
The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`. Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail. (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)
Justification for inclusion in Clang and LLVM's OpenMP runtime:
* The `ompx_hold` modifier supports OpenACC functionality (structured
reference count) that cannot be achieved in standard OpenMP, as of
5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
runtime implementation are required for the Clang OpenACC support
being developed as part of the ECP Clacc project, which translates
OpenACC to OpenMP at the directive AST level. These patches are the
first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
in the runtime implementation. That syntactic support makes the
tests more readable than low-level runtime calls can. Moreover,
upstream Flang and Clang do not yet support OpenACC syntax
sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
of concerns between OpenACC and OpenMP development in LLVM. That
is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
extended OpenMP implementation and test suite without directly
considering OpenACC's language and execution model, which can be
handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
above example.
See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC. For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.
Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.
Reviewed By: ABataev, jdoerfert, protze.joachim, grokos
Differential Revision: https://reviews.llvm.org/D106509
A new rule is added in 5.0:
If a list item appears in a reduction, lastprivate or linear clause
on a combined target construct then it is treated as if it also appears
in a map clause with a map-type of tofrom.
Currently map clauses for all capture variables are added implicitly.
But missing for list item of expression for array elements or array
sections.
The change is to add implicit map clause for array of elements used in
reduction clause. Skip adding map clause if the expression is not
mappable.
Noted: For linear and lastprivate, since only variable name is
accepted, the map has been added though capture variables.
To do so:
During the mappable checking, if error, ignore diagnose and skip
adding implicit map clause.
The changes:
1> Add code to generate implicit map in ActOnOpenMPExecutableDirective,
for omp 5.0 and up.
2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause:
Use that to skip error as well as skip adding implicit map during the
mappable checking.
Note: there are only tow places need to be check for NoDiagnose. Rest
of them either the check is for < omp 5.0 or the error already generated for
reduction clause.
Differential Revision: https://reviews.llvm.org/D108132
The root problem is a null pointer is accessed during the call to
checkOpenMPLoop, because loop up bound expr is an error expression
due to error diagnostic was emit early.
To fix this, in setLCDeclAndLB, setUB and setStep instead return false,
return true when LB, UB or Step contains Error, so that the checking is
stopped in checkOpenMPLoop.
Differential Revision: https://reviews.llvm.org/D107385
where should not.
Currently we are using QTy->isIncompleteType(&ND) to check incomplete
type. But before doing that, need to instantiate for a class template
specialization or a class member of a class template specialization,
or an array with known size of such..., so that we know it is really
incomplete type.
To fix this using RequireCompleteType instead.
The new test is added into "test/OpenMP/target_update_messages.cpp"
The different of using RequireCompleteType is when emit incomplete type,
an additional note is also emitted to point to where incomplete type
is declared. Because this change, many tests are needed to be fixed
by adding additional note.
This is to fix https://bugs.llvm.org/show_bug.cgi?id=50508
Differential Revision: https://reviews.llvm.org/D107200
No need to try to create the default constructor for private copy, it
will be called automatically in the initializer of the declare
reduction. Fixes balance between constructors/destructors calls.
Differential Revision: https://reviews.llvm.org/D105143
Need to capture locals in aligned clauses for the combined directives to
be fix the crash in the codegen.
Differential Revision: https://reviews.llvm.org/D104258
Added support for CXXRewrittenBinaryOperator as a condition in ranged
for loops. This is a new kind of expression, need to extend support for
C++20 constructs.
It fixes PR49970: range-based for compilation fails for libstdc++ vector
with -std=c++20.
Differential Revision: https://reviews.llvm.org/D104240
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
Differential Revision: https://reviews.llvm.org/D103888
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D99459
This renames the expression value categories from rvalue to prvalue,
keeping nomenclature consistent with C++11 onwards.
C++ has the most complicated taxonomy here, and every other language
only uses a subset of it, so it's less confusing to use the C++ names
consistently, and mentally remap to the C names when working on that
context (prvalue -> rvalue, no xvalues, etc).
Renames:
* VK_RValue -> VK_PRValue
* Expr::isRValue -> Expr::isPRValue
* SK_QualificationConversionRValue -> SK_QualificationConversionPRValue
* JSON AST Dumper Expression nodes value category: "rvalue" -> "prvalue"
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D103720
Multiple clauses are mutually exclusive. This patch refactors the functions that check for pairs of mutually exclusive clauses into a generalized function which also also accepts a list of clause types if which at most one can appear.
NFC patch extracted out of D99459 by request.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103666
Patch allows using of constexpr vars evaluatable to constant calue to be
used in declare mapper construct.
Differential Revision: https://reviews.llvm.org/D103642
The PreInits of a loop transformation (atm moment only tile) include the computation of the trip count. The trip count is needed by any loop-associated directives that consumes the transformation-generated loop. Hence, we must ensure that the PreInits of consumed loop transformations are emitted with the consuming directive.
This is done by addinging the inner loop transformation's PreInits to the outer loop-directive's PreInits. The outer loop-directive will consume the de-sugared AST such that the inner PreInits are not emitted twice. The PreInits of a loop transformation are still emitted directly if its generated loop(s) are not associated with another loop-associated directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D102180
The OpenMP spec seems to require the compound operators be used for
+, *, &, |, and ^ reduction. So use these if a class has those operators.
If not try the simple operators as we did previously to limit the impact
to existing code.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48584
Differential Revision: https://reviews.llvm.org/D101941
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.
I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.
Highlights:
- new diagnosis for restrictions specificed in the standard,
- delayed emission of globals not mentioned in an explicit
list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the
device,
- no explicit parsing of declarations in-between `omp [begin] declare
variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over
implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted
global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.
After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
A user reported an assertion (below) but without a reproducer. I failed to
create a test myself but from the assertion one can derive the problem.
I set the DefaultMapperId location now to make sure this doesn't cause
trouble.
```
clang-13: .../DeclTemplate.h:1940:
void clang::ClassTemplateSpecializationDecl::setPointOfInstantiation(clang::SourceLocation):
Assertion `Loc.isValid() && "point of instantiation must be valid!"' failed.
```
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100621
as used.
The problem only happens with constexpr variable, for constexpr variable,
variable is not marked during parser variable. This is because compiler
might find some var's associate expressions may not actully an odr-used
later, the variables get kept in MaybeODRUseExprs, in normal case, at
end of process fullExpr, the variable will be marked during the call to
CleanupVarDeclMarking(). Since we are processing expression of OpenMP
clauses, and the ActOnFinishFullExpr is not getting called that casue
variable is not get marked.
One way to fix this is to call CleanupVarDeclMarking() in EndOpenMPClause
for each omp directive.
This to fix https://bugs.llvm.org/show_bug.cgi?id=50206
Differential Revision: https://reviews.llvm.org/D101781
Need to respect mapping/privatization of declare target variables in the
target regions if explicitly specified by the user.
Differential Revision: https://reviews.llvm.org/D99530
For combined worksharing directives need to emit the temp arrays outside
of the parallel region and update them in the master thread only.
Differential Revision: https://reviews.llvm.org/D100187
If no initializer-clause is specified, the private variables will be
initialized following the rules for initialization of objects with static
storage duration.
Need to adjust the implementation to the current version of the
standard.
Differential Revision: https://reviews.llvm.org/D99539
Added basic parsing/sema/serialization support to extend the
existing 'destroy' clause for use with the 'interop' directive.
Differential Revision: https://reviews.llvm.org/D98834
Added basic parsing/sema/serialization support for interop directive.
Support for the 'init' clause.
Differential Revision: https://reviews.llvm.org/D98558
emission
Ensure that we are in a function declaration context before checking
the diagnostic emission status, to avoid dereferencing a NULL function
declaration.
Differential Revision: https://reviews.llvm.org/D97573
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
* Only the worksharing-loop directive.
* Recognizes only the nowait clause.
* No loop nests with more than one loop.
* Untested with templates, exceptions.
* Semantic checking left to the existing infrastructure.
This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
* The distance function: How many loop iterations there will be before entering the loop nest.
* The loop variable function: Conversion from a logical iteration number to the loop variable.
These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.
The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.
For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.
Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D94973
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.
Differential Revision: https://reviews.llvm.org/D92195
OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.
Differential Revision: https://reviews.llvm.org/D86119
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.
This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.
A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.
I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).
Differential Revision: https://reviews.llvm.org/D76342
Even code in target and declare target regions might not be emitted.
With this patch we delay more diagnostics and use laziness and linkage
to determine if a function is emitted (for the device). Note that we
still eagerly emit diagnostics for target regions, unfortunately, see
the TODO for the reason.
This hopefully fixes PR48933.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95928
Type errors in function declarations were not (always) diagnosed prior
to this patch. Furthermore, certain remarks did not get associated
properly which caused them to be emitted multiple times.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95912
The number of iterations calculation was failing in some cases with more
than two collpased loops. Now the LoopIterationSpace selected matches
InitDependOnLC and CondDependOnLC.
Differential Revision: https://reviews.llvm.org/D95834
The `assumes` directive is an OpenMP 5.1 feature that allows the user to
provide assumptions to the optimizer. Assumptions can refer to
directives (`absent` and `contains` clauses), expressions (`holds`
clause), or generic properties (`no_openmp_routines`, `ext_ABCD`, ...).
The `assumes` spelling is used for assumptions in the global scope while
`assume` is used for executable contexts with an associated structured
block.
This patch only implements the global spellings. While clauses with
arguments are "accepted" by the parser, they will simply be ignored for
now. The implementation lowers the assumptions directly to the
`AssumptionAttr`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D91980
The `assumes` directive is an OpenMP 5.1 feature that allows the user to
provide assumptions to the optimizer. Assumptions can refer to
directives (`absent` and `contains` clauses), expressions (`holds`
clause), or generic properties (`no_openmp_routines`, `ext_ABCD`, ...).
The `assumes` spelling is used for assumptions in the global scope while
`assume` is used for executable contexts with an associated structured
block.
This patch only implements the global spellings. While clauses with
arguments are "accepted" by the parser, they will simply be ignored for
now. The implementation lowers the assumptions directly to the
`AssumptionAttr`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D91980
Support present modifier in defaultmap by adding an extra dimension
for `ImplicitMap`. Therefore, we now create OMPMapClause in `ActOnOpenMPExecutableDirective`
based on both `maptype` and `maptype-modifier`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D92427
The variables used in atomic construct should be captured in outer
task-based regions implicitly. Otherwise, the compiler will crash trying
to find the address of the local variable.
Differential Revision: https://reviews.llvm.org/D92682
Compiler needs to convert some of the loop iteration
variables/conditions to different types for better codegen and it may
lead to spurious warning messages about implicit signed/unsigned
conversions.
Differential Revision: https://reviews.llvm.org/D92655
Reviewed by aaron.ballman, rsmith, wchilders
Highlights of review:
- avoid specifying an underlying type (unless such an enum is stored (or part of an abi?))
- avoid using enums as bit-fields, preferring unsigned bit-fields that we static_cast enumerators to. (MS's abi laysout enum bit-fields differently).
- clang-format, clang-format, clang-format.
https://reviews.llvm.org/D91035
Thank you!
If the variable is implicitly firstprivatized in the inner task-based
region, it also must be firstprivatized in outer task-based regions.
Previously firstprivates were captured in tasks but later it was
optimized to reduce the memory usage. But still need to mark such
variables as implicit firstprivate in outer tasks.
Differential Revision: https://reviews.llvm.org/D91627
In order not to modify the `tgt_target_data_update` information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload `arg` when
the maptype is set as `OMP_MAP_DESCRIPTOR`. The origin `arg` is for
passing the pointer information, however, the overloaded `arg` is an
array of descriptor_dim:
struct descriptor_dim {
int64_t offset;
int64_t count;
int64_t stride
};
and the array size is the same as dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
`arg_size` parameter by using dimension size.
For supporting `stride` in array section, we use a dummy dimension in
descriptor to store the unit size. The formula for counting the stride
in dimension D_n: `unit size * (D_0 * D_1 ... * D_n-1) * D_n.stride`.
Demonstrate how it works:
```
double arr[3][4][5];
D0: { offset = 0, count = 1, stride = 8 } // offset, count, dimension size always be 0, 1, 1 for this extra dimension, stride is the unit size
D1: { offset = 0, count = 2, stride = 8 * 1 * 2 = 16 } // stride = unit size * (product of dimension size of D0) * D1.stride = 4 * 1 * 2 = 8
D2: { offset = 2, count = 2, stride = 8 * (1 * 5) * 1 = 40 } // stride = unit size * (product of dimension size of D0, D1) * D2.stride = 4 * 5 * 1 = 20
D3: { offset = 0, count = 2, stride = 8 * (1 * 5 * 4) * 2 = 320 } // stride = unit size * (product of dimension size of D0, D1, D2) * D3.stride = 4 * 25 * 2 = 200
// X here means we need to offload this data, therefore, runtime will transfer
// data from offset 80, 96, 120, 136, 400, 416, 440, 456
// Runtime patch: https://reviews.llvm.org/D82245
// OOOOO OOOOO OOOOO
// OOOOO OOOOO OOOOO
// XOXOO OOOOO XOXOO
// XOXOO OOOOO XOXOO
```
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84192
folding to not constant folding.
Constant folding of ICEs is done as a GCC compatibility measure, but new
code was picking it up, presumably by accident, due to the bad default.
While here, also switch the flag from a bool to an enum to make it more
obvious what it means at call sites. This highlighted a couple of places
where our behavior is different between C++11 and C++14 due to switching
from checking for an ICE to checking for a converted constant
expression (where there is no 'fold' codepath).
This reapplies D88384 with the minor modification that an assertion was
changed to a regular conditional and graceful exit from
ASTContext::mergeTypes.
Especially for templates we need to check at some point if the base
function matches the specialization we might call instead. Before this
lead to the replacement of `std::sqrt(int(2))` calls with one that
converts the argument to a `std::complex<int>`, clearly not the desired
behavior.
Reported as PR47655
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D88384
Need to fix a check for the variable if it is declared in the inner
OpenMP region to be able to firstprivatize it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D88240
Need to fix a check for the variable if it is declared in the inner
OpenMP region to be able to firstprivatize it.
Differential Revision: https://reviews.llvm.org/D88240
No need to make final copy from the firsptrivate/lastprivate copy to the original item if the item is a data memeber.
Firstprivate copy creates a copy by reference and the original item gets
updated correctly when updating the lastprivate shared variable.
Differential Revision: https://reviews.llvm.org/D88179
In CUDA/HIP a function may become implicit host device function by
pragma or constexpr. A host device function is checked in both
host and device compilation. However it may be emitted only
on host or device side, therefore the diagnostics should be
deferred until it is known to be emitted.
Currently clang is only able to defer certain diagnostics. This causes
false alarms and limits the usefulness of host device functions.
This patch lets clang defer all overloading resolution diagnostics for host device functions.
An option -fgpu-defer-diag is added to control this behavior. By default
it is off.
It is NFC for other languages.
Differential Revision: https://reviews.llvm.org/D84364
With this extension the effects of `omp begin declare variant` will be
applied to template function declarations. The behavior is opt-in and
controlled by the `extension(allow_templates)` trait. While generally
useful, this will enable us to implement complex math function calls by
overloading the templates of the standard library with the ones in
libc++.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85735
This extension allows to declare variants in between `omp begin/end
declare variant` that do not match the type of the existing function
with that name. Without this extension we would not find a base function
(with a compatible type), therefore create a new one, which would
cause conflicting declarations. With this extension we will not create
"missing" base functions, which basically renders these specializations
harmless. They will be generated but never called.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85878
Due to `omp begin/end declare variant`, OpenMP context selectors can be
nested. This patch adds initial support for this so we can use it for
target math variants. We should improve the detection of "equivalent"
scores and user conditions, we should also revisit the data structures
of the OMPTraitInfo object, however, both are not pressing issues right
now.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D85877
The base must be shared between the threads, threadprivates are not
allowed to be bases for array-like reductions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85762
This is recommit of 6c8041aa0f, reverted in de044f7562 because of some
fails. Original commit message is below.
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.
Differential Revision: https://reviews.llvm.org/D85960
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.
Differential Revision: https://reviews.llvm.org/D85960
Function Sema::isOpenMPGlobalCapturedDecl() has a parameter `unsigned Level`,
but use `Level >= 0` as the condition of `while`, thus cause an infinite loop.
Fix by changing the loop condition to `Level > 0`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D86858
OpenMP 5.0 supports nested declare target regions. So, in general,it is
allow to mark a declarationas declare target with different device_type
or link type. Patch adds support for such kind of nesting.
Differential Revision: https://reviews.llvm.org/D86239
If the declaration is used in the reduction clause, it is captured by
reference by default. But if the declaration is a pointer and it is a
base for array-like reduction, this declaration can be captured by
value, since the pointee is reduced but not the original declaration.
Differential Revision: https://reviews.llvm.org/D85321
Summary:
Introduced OMPChildren class to handle all associated clauses, statement
and child expressions/statements. It allows to represent some directives
more correctly (like flush, depobj etc. with pseudo clauses, ordered
depend directives, which are standalone, and target data directives).
Also, it will make easier to avoid using of CapturedStmt in directives,
if required (atomic, tile etc. directives).
Also, it simplifies serialization/deserialization of the
executable/declarative directives.
Reduces number of allocation operations for mapper declarations.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, jfb, cfe-commits, sstefan1, aaron.ballman, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83261
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84711
It was unclear what `isa` was supposed to mean so we did not provide any
traits for this context selector. With this patch we will allow *any*
string or identifier. We use the target attribute and target info to
determine if the trait matches. In other words, we will check if the
provided value is a target feature that is available (at the call site).
Fixes PR46338
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D83281
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84711
Summary:
If the variable is constrcutible, its copy is created by calling a
constructor. Such variables are duplicated and thus, must be captured.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83909
This change allow a CallExpr to have optional FPOptionsOverride object,
stored in trailing storage. The implementaion is made similar to the way
used in BinaryOperator.
Differential Revision: https://reviews.llvm.org/D84343
This patch implements Clang front end support for the OpenMP TR8
`present` map type modifier. The next patch in this series implements
OpenMP runtime support.
This patch does not attempt to implement TR8 sec. 2.22.7.1 "map
Clause", p. 319, L14-16:
> If a map clause with a present map-type-modifier is present in a map
> clause, then the effect of the clause is ordered before all other
> map clauses that do not have the present modifier.
Compare to L10-11, which Clang does not appear to implement yet:
> For a given construct, the effect of a map clause with the to, from,
> or tofrom map-type is ordered before the effect of a map clause with
> the alloc, release, or delete map-type.
This patch also does not implement the `present` implicit-behavior for
`defaultmap` or the `present` motion-modifier for `target update`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D83061
Reapply 49e5f603d4
which had been reverted in c94332919b.
Originally reverted because I hadn't updated it in quite a while when I
got around to committing it, so there were a bunch of missing changes to
new code since I'd written the patch.
Reviewers: aaron.ballman
Differential Revision: https://reviews.llvm.org/D76646
Summary:
According to OpenMP 5.0, the restrictions for mapping of overlapped data
apply only for explicitly mapped data, there is no restriction for
implicitly mapped data just like in OpenMP 4.5.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83398
Summary:
If user-defined reductions with the initializer are used with classes,
the compiler misses the constructor call when trying to create a private
copy of the reduction variable.
Reviewers: jdoerfert
Subscribers: cfe-commits, yaxunl, guansong, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83334
This implements the default(firstprivate) clause as defined in OpenMP
Technical Report 8 (2.22.4).
Reviewed By: jdoerfert, ABataev
Differential Revision: https://reviews.llvm.org/D75591
There is a version that just tests (also called
isIntegerConstantExpression) & whereas this version is specifically used
when the value is of interest (a few call sites were actually refactored
to calling the test-only version) so let's make the API look more like
it.
Reviewers: aaron.ballman
Differential Revision: https://reviews.llvm.org/D76646
Summary:
This patch is removing the custom enumeration for OpenMP Directives and Clauses and replace them
with the newly tablegen generated one from llvm/Frontend. This is a first patch and some will follow to share the same
infrastructure where possible. The next patch should use the clauses allowance defined in the tablegen file.
Reviewers: jdoerfert, DavidTruby, sscalpone, kiranchandramohan, ichoyjx
Reviewed By: DavidTruby, ichoyjx
Subscribers: jholewinski, cfe-commits, dblaikie, MaskRay, ymandel, ichoyjx, mgorny, yaxunl, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #flang, #clang
Differential Revision: https://reviews.llvm.org/D82906
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.
Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp
Reviewed By: jdoerfert, jdenny
Subscribers: arphaman, martong, cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #openmp, #clang
Differential Revision: https://reviews.llvm.org/D81736
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.
Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp
Reviewed By: jdoerfert, jdenny
Subscribers: cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #openmp, #clang
Differential Revision: https://reviews.llvm.org/D81736
Summary:
According to OpenMP 5.0, nonmonotonic modifier can be used with all
schedule kinds, not only dynamic and guided as in OpenMP 4.5.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82026
Summary:
The OpenMP loops are normalized and transformed into the loops from 0 to
max number of iterations. In some cases, original scheme may lead to
overflow during calculation of number of iterations. If it is unknown,
if we can end up with overflow or not (the bounds are not constant and
we cannot define if there is an overflow), cast original type to the
unsigned.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, openmp-commits, cfe-commits, caomhin
Tags: #clang, #openmp
Differential Revision: https://reviews.llvm.org/D81881
Summary:
According to OpenMP, During execution of an iteration of a worksharing-loop or a loop nest within a worksharing-loop, simd, or worksharing-loop SIMD region, a thread must not execute more than one ordered region corresponding to an ordered construct without a depend clause.
Need to report an error in this case.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81951
Summary:
Added codegen for use_device_addr clause. The components of the list
items are mapped as a kind of RETURN components and then the returned
base address is used instead of the real address of the base declaration
used in the use_device_addr expressions.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80730
Summary:
If the array subscript expression is type depent, its analysis must be
delayed before its instantiation.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, caomhin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78637
Summary:
Diagnostic is emitted if some declaration of unsupported type
declaration is used inside device code.
Memcpy operations for structs containing member with unsupported type
are allowed. Fixed crash on attempt to emit diagnostic outside of the
functions.
The approach is generalized between SYCL and OpenMP.
CUDA/OMP deferred diagnostic interface is going to be used for SYCL device.
Reviewers: rsmith, rjmccall, ABataev, erichkeane, bader, jdoerfert, aaron.ballman
Reviewed By: jdoerfert
Subscribers: guansong, sstefan1, yaxunl, mgorny, bader, ebevhan, Anastasia, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74387
Summary:
With recovery expr, it is possible that we have a value-dependent expr
within non-dependent context.
Reviewers: sammccall, jdoerfert
Subscribers: yaxunl, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80200
Summary:
The BFloat IR type is introduced to provide support for, initially, the BFloat16
datatype introduced with the Armv8.6 architecture (optional from Armv8.2
onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE
754 floating point IR type.
This is part of a patch series upstreaming Armv8.6 features. Subsequent patches
will upstream intrinsics support and C-lang support for BFloat.
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau
Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78190
Summary:
Predefined allocators should not be mapped at all (they are just enumeric
constants). FOr user-defined allocators need to map the traits only as
firstprivates, the allocator itself is private.
At the beginning of the target region the user-defined allocatores must
be created and then destroyed at the end of the target region:
```
omp_allocator_handle_t my_allocator = __kmpc_init_allocator(<gtid>,
/*default memhandle*/ 0, <number_of_traits>, &<traits>);
...
call void @__kmpc_destroy_allocator(<gtid>, my_allocator);
```
Reviewers: jdoerfert, aaron.ballman
Subscribers: jholewinski, yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79257
Summary:
omp.h header file defines omp_null_allocator as a predefined allocator,
need to consider it also as a predefined allocator.
Reviewers: jdoerfert
Subscribers: jholewinski, yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79186