Fixes PR45753
When a program that contains a loop to which both `omp parallel for`
pragma and `clang loop` pragma are associated is compiled with the
-fopenmp option, `clang loop` pragma did not take effect. The example
below should not be vectorized by the `clang loop` pragma but it was
actually vectorized. The cause is that `llvm.loop.vectorize.width`
was not output to the IR when -fopenmp is specified.
The fix attaches attributes if they exist for the loop.
[example.c]
```
int a[100], b[100];
void foo() {
#pragma omp parallel for
#pragma clang loop vectorize(disable)
for (int i = 0; i < 100; i++)
a[i] += b[i] * i;
}
```
[compile]
```
$ clang -O2 -fopenmp example.c -c -Rpass=vect
example.c:3:11: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
#pragma omp parallel for
^
```
[IR with -fopenmp]
```
$ clang -O2 exmaple.c -S -emit-llvm -mllvm -disable-llvm-optzns -o - -fopenmp | grep 'vectorize\.width'
```
[IR with -fno-openmp]
```
$ clang -O2 example.c -S -emit-llvm -mllvm -disable-llvm-optzns -o - -fno-openmp | grep 'vectorize\.width'
!7 = !{!"llvm.loop.vectorize.width", i32 1}
```
Differential Revision: https://reviews.llvm.org/D79921
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.
See also D80072.
Differential Revision: https://reviews.llvm.org/D80166
test cases
Add support for #pragma float_control
Reviewers: rjmccall, erichkeane, sepavloff
Differential Revision: https://reviews.llvm.org/D72841
This reverts commit 85dc033cac, and makes
corrections to the test cases that failed on buildbots.
Summary:
Patch forces codegen to use the new runtime functions for task reductions where
the issue with passing the address of the original variables to the UDR
initializers is fixed. Also, this patch is required for upcoming
support of task modifier inreduction clause.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78733
Implemented codegen for the iterator expression in the depend clauses.
Iterator construct is emitted the following way:
iterator(cnt1, cnt2, ...), in : <dep>
<TotalNumDeps> = <cnt1_size> * <cnt2_size> * ...;
kmp_depend_t deps[<TotalNumDeps>];
deps_counter = 0;
for (cnt1) {
for (cnt2) {
...
deps[deps_counter].base_addr = &<dep>;
deps[deps_counter].size = sizeof(<dep>);
deps[deps_counter].flags = in;
deps_counter += 1;
...
}
}
For depobj construct the codegen is very similar, but the memory is
allocated dynamically and added extra first item reserved for internal use.
If the ancestor device modifier is used and the value of the device
clause is evaluated to 1, the ancestor device shall be used for the
execution.
Since the reverse offloading is not supported yet, the target construct
execution is always initiated from the host, not from the device. So, if
the ancestor modifier is specified, just execute target region on the
host.
Avoid copying of the orignal variable if it is going to be marked as
firstprivate in task regions. For taskloops, still need to copy the
non-trvially copyable variables to correctly construct them upon task
creation.
Added codegen for update clause in depobj. Reads the number of the
elements from the first element and updates flags for each element in
the loop.
```
omp_depend_t x;
kmp_depend_info *base = (kmp_depend_info *)x;
intptr_t num = x[-1].base_addr;
kmp_depend_info *end = x + num;
kmp_depend_info *el = base;
do {
el.flags = new_flag;
el = &el[1];
} while (el != end);
```
Added codegen for 'depend' clause in depobj directive. The depend clause
is emitted as kmp_depend_info <deps>[<number_of_items_in_clause> + 1]. The
first element in this array is reserved for storing the number of
elements in this array: <deps>[0].base_addr =
<number_of_items_in_clause>;
This extra element is required to implement 'update' and 'destroy'
clauses. It is required to know the size of array to destroy it
correctly and to update depency kind.
This patch introduces a new helper class `OMPBuilderCBHelpers`,
which will contain all reusable C/C++ language specific function-
alities required by the `OMPIRBuilder`.
Initially, this helper class contains the body and finalization
codegen functionalities implemented using callbacks which were
moved here for reusability among the different directives
implemented in the `OMPIRBuilder`, along with RAIIs for preserving
state prior to emitting outlined and/or inlined OpenMP regions.
In the future this helper class will also contain all the different
call backs required by OpenMP clauses/variable privatization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D74562
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
According to OpenMP 5.0, cancel and cancellation point constructs are
supported in taskloop directive. Added support for cancellation in
taskloop, master taskloop and parallel master taskloop.
directive.
According to OpenMP 5.0, The atomic_default_mem_order clause specifies the default memory ordering behavior for atomic constructs that must be provided by an implementation. If the default memory ordering is specified as seq_cst, all atomic constructs on which memory-order-clause is not specified behave as if the seq_cst clause appears. If the default memory ordering is specified as relaxed, all atomic constructs on which memory-order-clause is not specified behave as if the relaxed clause appears.
If the default memory ordering is specified as acq_rel, atomic constructs on which memory-order-clause is not specified behave as if the release clause appears if the atomic write or atomic update operation is specified, as if the acquire clause appears if the atomic read operation is specified, and as if the acq_rel clause appears if the atomic captured update operation is specified.
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
regions.
If the lastprivate conditional is passed as shared in inner region, we
shall check if it was ever changed and use this updated value after exit
from the inner region as an update value.
regions with reductions, lastprivates or linears clauses.
If the lastprivate conditional variable is updated in inner parallel
region with reduction, lastprivate or linear clause, the value must be
considred as a candidate for lastprivate conditional. Also, tracking in
inner parallel regions is not required.
The option will limit debug info by only emitting complete class
type information when its constructor is emitted.
This patch changes comparisons with LimitedDebugInfo to use the new
level instead.
Differential Revision: https://reviews.llvm.org/D72427
Added codegen support for lastprivate conditional. According to the
standard, if when the conditional modifier appears on the clause, if an
assignment to a list item is encountered in the construct then the
original list item is assigned the value that is assigned to the new
list item in the sequentially last iteration or lexically last section
in which such an assignment is encountered.
We look for the assignment operations and check if the left side
references lastprivate conditional variable. Then the next code is
emitted:
if (last_iv_a <= iv) {
last_iv_a = iv;
last_a = lp_a;
}
At the end the implicit barrier is generated to wait for the end of all
threads and then in the check for the last iteration the private copy is
assigned the last value.
if (last_iter) {
lp_a = last_a; // <--- new code
a = lp_a; // <--- store of private value to the original variable.
}
This allows to use the OpenMPIRBuilder for parallel regions. Code was
extracted from D61953 and adapted to work with the new version (D70109).
All but one feature should be supported. An update of this patch will
provide test coverage and privatization other than shared.
Reviewed By: fghanim
Differential Revision: https://reviews.llvm.org/D70290
Summary:
Basic codegen for the declarations marked as nontemporal. Also, if the
base declaration in the member expression is marked as nontemporal,
lvalue for member decl access inherits nonteporal flag from the base
lvalue.
Reviewers: rjmccall, hfinkel, jdoerfert
Subscribers: guansong, arphaman, caomhin, kkwli0, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71708
According to OpenMP 5.0, if clause can be used in for simd directive. If
condition in the if clause if false, the non-vectorized version of the
loop must be executed.
Summary:
The new OpenMPConstants.h is a location for all OpenMP related constants
(and helpers) to live.
This patch moves the directives there (the enum OpenMPDirectiveKind) and
rewires Clang to use the new location.
Initially part of D69785.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: jholewinski, ppenzin, penzn, llvm-commits, cfe-commits, jfb, guansong, bollu, hiraditya, mgorny
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69853
According to OpenMP 5.0, if clause can be used in for simd directive. If
condition in the if clause if false, the non-vectorized version of the
loop must be executed.
AggValueSlot
This reapplies 8a5b7c3570 after a null
dereference bug in CGOpenMPRuntime::emitUserDefinedMapper.
Original commit message:
This is needed for the pointer authentication work we plan to do in the
near future.
a63a81bd99/clang/docs/PointerAuthentication.rst
According to OpenMP 5.0, if clause can be used in for simd directive. If
condition in the if clause if false, the non-vectorized version of the
loop must be executed.
According to OpenMP 5.0, if clause can be used in simd directive. If
condition in the if clause if false, the non-vectorized version of the
loop must be executed.
Added parsing/sema/codegen support for 'parallel master taskloop'
constructs. Some of the clauses, like 'grainsize', 'num_tasks', 'final'
and 'priority' are not supported in full, only constant expressions can
be used currently in these clauses.
llvm-svn: 374791
The behavior from the original patch has changed, since we're no longer
allowing LLVM to just ignore the alignment. Instead, we're just
assuming the maximum possible alignment.
Differential Revision: https://reviews.llvm.org/D68824
llvm-svn: 374562
The test fails on Windows, with
error: 'warning' diagnostics expected but not seen:
File builtin-assume-aligned.c Line 62: requested alignment
must be 268435456 bytes or smaller; assumption ignored
error: 'warning' diagnostics seen but not expected:
File builtin-assume-aligned.c Line 62: requested alignment
must be 8192 bytes or smaller; assumption ignored
llvm-svn: 374456
Code to handle __builtin_assume_aligned was allowing larger values, but
would convert this to unsigned along the way. This patch removes the
EmitAssumeAligned overloads that take unsigned to do away with this
problem.
Additionally, it adds a warning that values greater than 1 <<29 are
ignored by LLVM.
Differential Revision: https://reviews.llvm.org/D68824
llvm-svn: 374450
We previously failed to treat an array with an instantiation-dependent
but not value-dependent bound as being an instantiation-dependent type.
We now track the array bound expression as part of a constant array type
if it's an instantiation-dependent expression.
llvm-svn: 373685
Runtime function __kmpc_push_tripcount better to call inside of the task
context for target regions. Otherwise, the libomptarget is unable to
link the provided tripcount value for nowait target regions and
completely looses this information.
llvm-svn: 372609
If the variable, used in the loop boundaries, is not captured in the
construct, this variable must be considered as undefined if it was
privatized.
llvm-svn: 372252
construct.
OpenMP 5.0 introduced new clause for declare target directive, device_type clause, which may accept values host, nohost, and any. Host means
that the function must be emitted only for the host, nohost - only for
the device, and any - for both, device and the host.
llvm-svn: 369775
Added basic support for non-rectangular loops. It requires an additional
analysis of min/max boundaries for non-rectangular loops. Since only
linear dependency is allowed, we can do this analysis.
llvm-svn: 368903
Target-based runtime functions use int64_t type for sizes, while the
compiler uses size_t type. It leads to miscompilation in 32 bit mode.
llvm-svn: 364327
If the variable is a firstprivate variable and it was not emitted beause
this a constant variable with the constant initializer, we can use the
initial value instead of the variable itself. It also fixes the problem
with the compiler crash in this case.
llvm-svn: 361564
If the doacross lop construct is used and the loop counter is declare
outside of the loop, the compiler might crash trying to get the address
of the loop counter. Patch fixes this problem.
llvm-svn: 356198
memory.
If the variable with the constant non-scalar type is firstprivatized in
the target region, the local copy is created with the data copying.
Instead, we allocate the copy in the constant memory and avoid extra
copying in the outlined target regions. This global copy is used in the
target regions without loss of the performance.
llvm-svn: 355418
The various EltSize, Offset, DataLayout, and StructLayout arguments
are all computable from the Address's element type and the DataLayout
which the CGBuilder already has access to.
After having previously asserted that the computed values are the same
as those passed in, now remove the redundant arguments from
CGBuilder's Create*GEP functions.
Differential Revision: https://reviews.llvm.org/D57767
llvm-svn: 353629
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.
Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.
Differential Revision: https://reviews.llvm.org/D57668
llvm-svn: 353184