Calling runtime TRANSPOSE requires a temporary array for the result,
and, sometimes, a temporary array for the argument. Lowering it inline
should provide faster code.
I added -opt-transpose control just for debugging purposes temporary.
I am going to make driver changes that will disable inline lowering
for -O0. For the time being I would like to enable it by default
to expose the code to more tests.
Differential Revision: https://reviews.llvm.org/D129497
In case where the bound(s) of a workshare loop use(s) firstprivate var(s), currently, that use is not updated with the created clone. It still uses the shared variable. This patch fixes that.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D127137
Flang C++ Style Guide tells us to avoid .has_value() in the predicate
expressions of control flow statements. I am treating ternary
expressions as control flow statements for the purpose of this patch.
Differential Revision: https://reviews.llvm.org/D128622
Array-value-copy fails to generate a temporary array for case like this:
subroutine bug(b)
real, allocatable :: b(:)
b = b(2:1:-1)
end subroutine
Since LHS may need to be reallocated, lowering produces the following FIR:
%rhs_load = fir.array_load %b %slice
%lhs_mem = fir.if %b_is_allocated_with_right_shape {
fir.result %b
} else {
%new_storage = fir.allocmem %rhs_shape
fir.result %new_storage
}
%lhs = fir.array_load %lhs_mem
%loop = fir.do_loop {
....
}
fir.array_merge_store %lhs, %loop to %lhs_mem
// deallocate old storage if reallocation occured,
// and update b descriptor if needed.
Since %b in array_load and %lhs_mem in array_merge_store are not the same SSA
values, array-value-copy does not detect the conflict and does not produce
a temporary array. This causes incorrect result in runtime.
The suggested change in lowering is to generate this:
%rhs_load = fir.array_load %b %slice
%lhs_mem = fir.if %b_is_allocated_with_right_shape {
%lhs = fir.array_load %b
%loop = fir.do_loop {
....
}
fir.array_merge_store %lhs, %loop to %b
fir.result %b
} else {
%new_storage = fir.allocmem %rhs_shape
%lhs = fir.array_load %new_storage
%loop = fir.do_loop {
....
}
fir.array_merge_store %lhs, %loop to %new_storage
fir.result %new_storage
}
// deallocate old storage if reallocation occured,
// and update b descriptor if needed.
Note that there are actually 3 branches in FIR, so the assignment loops
are currently produced in three copies, which is a code-size issue.
It is possible to generate just two branches with two copies of the loops,
but it is not addressed in this change-set.
Differential Revision: https://reviews.llvm.org/D129314
Move the device_type parser to a separate parser AccDeviceTypeExprList. Preparatory work for D106968.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D106967
Set the isOptional flag for the self clause. Move the optional and parenthesis part of the parser. Update the rest of the code to deal with the optional value.
Preparatory work for D106968.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D106965
This patch is part of the upstreaming effort from fir-dev branch.
This is the last patch for the upstreaming effort.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D129187
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Create a TargetCharacteristics class to centralize the few items of
target specific information that are relevant to semantics. Use the
new class for all target queries, including derived type component layout
modeling.
Future work will initialize this class with target information
provided or forwarded by the drivers, and use it to fold layout-dependent
intrinsic functions like TRANSFER().
Differential Revision: https://reviews.llvm.org/D129018
Updates: Attempts to work around build issues on Windows.
Finalization is F2003 and although the runtime supports it already,
lowering is not ensuring all the derived type are finalized properly
when they should. This will require surveying the places where lowering
needs to call it. Add a hard TODO for now.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D129069
Co-authored-by: Jean Perier <jperier@nvidia.com>
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128976
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Fix for broken/degenerate forall case where there is no assignment to an
array under the explicit iteration space. While this is a multiple
assignment, semantics only raises a warning.
The fix is to add a test that the explicit space has any sort of array
to be updated, and if not then the do_loop nest will not require a
terminator to forward array values to the next iteration.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128973
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Add source coordinates to BeginWait and BeginWaitAll calls
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128970
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128935
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Peter Steinfeld <psteinfeld@nvidia.com>
The check to see if the arguments for the MIN/MAX intrinsics were of CHARACTER
type was not handling assumed length characters. In this case, the FIR type is
"!fir.ref<!fir.char<1,?>>".
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128922
Co-authored-by: Peter Steinfeld <psteinfeld@nvidia.com>
As Fortran 2018 16.9.163, the reshape is the only intrinsic which
requires the shape argument to be rank-one integer array and the SIZE
of it to be one constant expression. The current expression lowering
converts the shape expression with slice in intrinsic into one box value
with the box element type of unknown extent. However, the genReshape
requires the box element type to be constant size. So, convert the box
value into one with box element type of sequence of 1 x constant. This
corner case is found in cam4 in SPEC 2017
https://github.com/llvm/llvm-project/issues/56140.
Reviewed By: Jean Perier
Differential Revision: https://reviews.llvm.org/D128597
The original assertion is not necessarily correct since the shape
argument may involve a slice of an array (an expression) and not a whole
vector with constant length. In the presence of a slice operation, the
size must be computed (left as a TODO for now).
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128894
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Even though the array is declared with '*' upper bounds, it has an
initial value that has a statically known shape. Use the shape from
the type of the initializer when the declared size is '*'.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128889
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
The names of CHARACTER strings were being truncated leading to invalid
collisions and other failures. This change makes sure to use the entire
string as the seed for the unique name.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128884
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Here is a character SELECT CASE construct that requires a temp to hold the
result of the TRIM intrinsic call:
```
module m
character(len=6) :: s
contains
subroutine sc
n = 0
if (lge(s,'00')) then
select case(trim(s))
case('11')
n = 1
case default
continue
case('22')
n = 2
case('33')
n = 3
case('44':'55','66':'77','88':)
n = 4
end select
end if
print*, n
end subroutine
end module m
```
This SELECT CASE construct is implemented as an IF/ELSE-IF/ELSE comparison
sequence. The temp must be retained until some comparison is successful.
At that point the temp may be freed. Generalize statement context processing
to allow multiple finalize calls to do this, such that the program always
executes exactly one freemem call.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler, vdonaldson
Differential Revision: https://reviews.llvm.org/D128852
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
- Add verifiers that determine if an Op requires type parameters or
not and checks that the correct number of parameters is specified.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128828
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Shared is the default behaviour in the IR, so no handling is required.
Default clause with shared or none do not require any handling since
Shared is the default behaviour in the IR and None is only required
for semantic checks.
This patch is carved out from D123930 to remove couple of false TODOs.
Reviewed By: peixin, shraiysh
Differential Revision: https://reviews.llvm.org/D128797
Co-authored-by: Nimish Mishra <neelam.nimish@gmail.com>
1. Remove the redundant collapse clause in MLIR OpenMP worksharing-loop
operation.
2. Fix several typos.
3. Refactor the chunk size type conversion since CreateSExtOrTrunc has
both type check and type conversion.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D128338
Added new -lower-math-early option that defaults to 'true' that matches
the current math lowering scheme. If set to 'false', the intrinsic math
operations will be lowered to MLIR operations, which should potentially
enable more MLIR optimizations, or libm calls, if there is no corresponding
MLIR operation exists or if "precise" mode is requested.
The generated math MLIR operations are then converted to LLVM dialect
during codegen phase.
The -lower-math-early option is not exposed to users currently. I plan to
get rid of the "early" lowering completely, when "late" lowering
is robust enough to support all math intrinsics that are currently
supported via pgmath. So "late" mode will become default and -lower-math-early
option will not be needed. This will effectively eliminate the mandatory
dependency on pgmath in Fortran lowering, but this is WIP.
Differential Revision: https://reviews.llvm.org/D128385
This patch fixes a couple of issues with the lowering of user defined assignment.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D128730
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
For the rapid triage push, just add a TODO for the degenerate POINTER
assignment case. The LHD ought to be a variable of type !fir.box, but it
is currently returning a shadow variable for the raw data pointer. More
investigation is needed there.
Make sure that conversions are applied in FORALL degenerate contexts.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128724
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This patch restores several calls to Optional::value() in preference
to Optional::operator*.
The Flang C++ Style Guide tells us to use x.value() where no presence
test is obviously protecting a *x reference to the contents.
Differential Revision: https://reviews.llvm.org/D128590
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld, vdonaldson
Differential Revision: https://reviews.llvm.org/D128502
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Lower the `parallel loop` contrsuct and refactor some of the code
of parallel and loop lowering to be reused.
Also add tests for loop and parallel since they were not upstreamed.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128510
In merge FSOURCE and TSOURCE must have the same Fortran dynamic types,
but this does not imply that FSOURCE and TSOURCE will be lowered to the
same MLIR types. For instance, TSOURCE may be a character expression
with a compile type constant length (!fir.char<1,4>) while FSOURCE may
have dynamic length (!fir.char<1,?>).
Cast FSOURCE to TSOURCE MLIR types to handle these cases.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128507
Co-authored-by: Jean Perier <jperier@nvidia.com>
Explicitly map host associated symbols in DoConcurrent with shared
locality-spec, clauses in OpenMP/OpenACC. The mapping of host-assoc
symbols is set to their parent SymbolBox. This is achieved through
a new interface function in the AbstractConverter.
This was already upstream for OpenMP.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128518
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
LBOUND with a non constant DIM argument use the runtime to allow runtime
verification of DIM <= RANK. The interface uses a descriptor. This caused
undefined behavior because the runtime believed it was seeing an explicit
shape arrays with zero extent and returned `1` (the runtime descriptor
does not allow making a difference between an explicit shape and an
assumed size. Assumed size are not meant to be described by runtime
descriptors).
Fix the issue by setting the last extent of assumed size to `1` when
creating the descriptor to inquire about the LBOUND with the runtime.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128509
Co-authored-by: Jean Perier <jperier@nvidia.com>
This supports the lowering of copyin clause initially. The pointer,
allocatable, common block, polymorphic varaibles will be supported
later.
This also includes the following changes:
1. Resolve the COPYIN clause and make the entity as host associated.
2. Fix collectSymbolSet by adding one option to control collecting the
symbol itself or ultimate symbol of it so that it can be used
explicitly differentiate the host and associated variables in
host-association.
3. Add one helper function `lookupOneLevelUpSymbol` to differentiate the
usage of host and associated variables explicitly. The previous
lowering of firstprivate depends on the order of
`createHostAssociateVarClone` and `lookupSymbol` of host symbol. With
this fix, this dependence is removed.
4. Reuse `copyHostAssociateVar` for copying operation of COPYIN clause.
Reviewed By: kiranchandramohan, NimishMishra
Differential Revision: https://reviews.llvm.org/D127468
When there is a substring operation on a scalar assignment in a FORALL
context, we have to lower the entire substring and not the entire
CHARACTER variable.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld, klausler
Differential Revision: https://reviews.llvm.org/D128459
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Character conversion requires memory storage as it operates on a
sequence of code points.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128438
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
* Make Semantics test doconcurrent01.f90 an expected failure pending a fix
for a problem in recognizing a PURE prefix specifier for a specific procedure
that occurs in new intrinsic module source code,
* review update
* review update
* Increase support for intrinsic module procedures
The f18 standard defines 5 intrinsic modules that define varying numbers
of procedures, including several operators:
2 iso_fortran_env
55 ieee_arithmetic
10 ieee_exceptions
0 ieee_features
6 iso_c_binding
There are existing fortran source files for each of these intrinsic modules.
This PR adds generic procedure declarations to these files for procedures
that do not already have them, together with associated specific procedure
declarations. It also adds the capability of recognizing intrinsic module
procedures in lowering code, making it possible to use existing language
intrinsic code generation for intrinsic module procedures for both scalar
and elemental calls. Code can then be generated for intrinsic module
procedures using existing options, including front end folding, direct
inlining, and calls to runtime support routines. Detailed code generation
is provided for several procedures in this PR, with others left to future PRs.
Procedure calls that reach lowering and don't have detailed implementation
support will generate a "not yet implemented" message with a recognizable name.
The generic procedures in these modules may each have as many as 36 specific
procedures. Most specific procedures are generated via macros that generate
type specific interface declarations. These specific declarations provide
detailed argument information for each individual procedure call, similar
to what is done via other means for standard language intrinsics. The
modules only provide interface declarations. There are no procedure
definitions, again in keeping with how language intrinsics are processed.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128431
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
MODULE FUNCTION and MODULE SUBROUTINE currently cause lowering crash:
"symbol is not mapped to any IR value" because special care is needed
to handle their interface.
Add a TODO for now.
Example of program that crashed and will hit the TODO:
```
module mod
interface
module subroutine sub
end subroutine
end interface
contains
module subroutine sub
x = 42
end subroutine
end module
```
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128412
Co-authored-by: Jean Perier <jperier@nvidia.com>
The case where the dummy argument is OPTIONAL was missing in the
handling of VALUE numerical and logical dummies (passBy::BaseAddressValueAttribute).
This caused segfaults while unconditionally copying actual arguments that were legally
absent at runtime.
Takes this bug as an opportunity to share the code that lowers arguments
that must be passed by BaseAddress, BaseAddressValueAttribute, BoxChar,
and CharBoxValueAttribute.
It has to deal with the exact same issues (being able to make contiguous
copies of the actual argument, potentially conditionally at runtime,
and potentially requiring a copy-back).
The VALUE case is the same as the non value case, except there is never
a copy-back and there is always a copy-in for variables. This two
differences are easily controlled by a byValue flag.
This as the benefit of implementing CHARACTER, VALUE for free that was
previously a hard TODO.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D128418
Co-authored-by: Jean Perier <jperier@nvidia.com>
Fixes several bugs relating to initialization expressions. An
initialization expression has no access to dynamic resources like the
stack or the heap. It must reduce to a relocatable expression that the
loader can complete at runtime.
Adds regression test.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128380
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Character and array results are allocated on the caller side. This
require evaluating the result interface on the call site. When calling
such functions inside an internal procedure, it is possible that the
interface is defined in the host, in which case the lengths/bounds of
the function results must be captured so that they are available in
the internal function to emit the call.
To handle this case, extend the PFT symbol visit to visit the bounds and length
parameters of functions called in the internal procedure parse tree.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D128371
Co-authored-by: Jean Perier <jperier@nvidia.com>
This patch replaces some `auto` with proper type. This was done in fir-dev
but not upstreamed yet.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D128350
- BIND(C) was ignored in lowering for objects (it can be used on
module and common blocks): use the bind name as the fir.global name.
- When an procedure is declared BIND(C) indirectly via an interface,
it should have a BIND(C) name. This was not the case because
GetBindName()/bindingName() return nothing in this case: detect this
case in mangler.cpp and use the symbol name.
Add TODOs for corner cases:
- BIND(C) module variables may be initialized on the C side. This does
not fit well with the current linkage strategy. Add a TODO until this
is revisited.
- BIND(C) internal procedures should not have a binding label (see
Fortran 2018 section 18.10.2 point 2), yet we currently lower them as
if they were BIND(C) external procedure.
I think this and the indirect interface case should really be
handled by symbol.GetBindName instead of adding more logic in
lowering to deal with this case: add a TODO.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D128340
Co-authored-by: Jean Perier <jperier@nvidia.com>
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128186
Co-authored-by: Peter Steinfeld <psteinfeld@nvidia.com>
Lowering was crashing with "fatal internal error: node has not been analyzed"
if a PDT with initial component value was defined inside an internal
procedure. This is because the related expression cannot be analyzed
without the component values (which happens at the instatiation).
These expression do not need to be visited (the instantiations, if any
will be). Use the form of GetExpr that tolerates the parse tree expression
to not be analyzed into an evaluate::Expr when looking through the
symbols used in an internal procedure.
Note that the PDTs TODO will then fire (it happens after the PFT
analysis) as expected if the derived type is used.
Differential Revision: https://reviews.llvm.org/D127735
Codegen does not support fir.addressof of functions returning derived
types, arrays are descriptors inside GlobalOp region.
This is because the abstract-result-opt is required to rewrite such
functions (a hidden argument must be added), but this pass is meant to
run in GlobalOp currently.
Such fir.address_of may be created when lowering procedure pointers
initial value (TODO), or when creating derived type descriptors for
types with bindings.
Add a TODO in lowering until abstract-result-opt is modified to run
on GlobalOp too.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D127722
Loop variables of a worksharing loop and sequential loops in parallel
region are privatised by default. These variables are marked with
OmpPreDetermined. Skip explicit privatisation of these variables.
Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: Leporacanthicus
Differential Revision: https://reviews.llvm.org/D127249
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
system_clock intrinsic calls with dynamically optional arguments
Modify intrinsic system_clock calls to allow for an argument that is optional
or a disassociated pointer or an unallocated allocatable. A call with such an
argument is the same as a call that does not specify that argument.
Rename (genIsNotNull -> genIsNotNullAddr) and (genIsNull -> genIsNullAddr)
and add a use of genIsNotNullAddr.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D127616
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
This patch adds code so that using bbc we are able to see an end-to-end lowering of simd construct in action.
Reviewed By: kiranchandramohan, peixin, shraiysh
Differential Revision: https://reviews.llvm.org/D125282
[flang]Add support for do concurrent
Upstreaming from fir-dev on https://github.com/flang-compiler/f18-llvm-project
Support for concurrent execution in do-loops.
A selection of tests are also added.
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127240
Remove a backwards dependence from Optimizer -> Lower by moving Todo.h
to the optimizer and out of lowering.
This patch is part of the upstreaming effort from fir-dev branch.
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D127292
Add support for lowering the schedule modifiers (simd, monotonic,
non-monotonic) in worksharing loops.
Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D127311
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Getting ompobject symbol is needed in multiple places and will be
needed later for the lowering of other constructs/clauses such as
copyin clause. Extract them into one function.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127280
Move tthe function to allow its usage in the Optimizer/Builder functions.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D127295
Given the flag `--always-execute-loop-body` the compiler emits code
to execute the body of the loop atleast once.
Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: awarzynski, schweitz
Differential Revision: https://reviews.llvm.org/D127128
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: Sameeran Joshi <sameeranjayant.joshi@amd.com>
This supports lowering parse-tree to MLIR for threadprivate directive
following the OpenMP 5.1 [2.21.2] standard. Take the following as an
example:
```
program m
integer, save :: i
!$omp threadprivate(i)
call sub(i)
!$omp parallel
call sub(i)
!$omp end parallel
end
```
```
func.func @_QQmain() {
%0 = fir.address_of(@_QFEi) : !fir.ref<i32>
%1 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
fir.call @_QPsub(%1) : (!fir.ref<i32>) -> ()
omp.parallel {
%2 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
fir.call @_QPsub(%2) : (!fir.ref<i32>) -> ()
omp.terminator
}
return
}
```
A threadprivate operation (omp.threadprivate) is created for all
references to a threadprivate variable. The runtime will appropriately
return a threadprivate var (%1 as above) or its copy (%2 as above)
depending on whether it is outside or inside a parallel region. For
threadprivate access outside the parallel region, the threadprivate
operation is created in instantiateVar. Inside the parallel region, it
is created in createBodyOfOp.
One new utility function collectSymbolSet is created for collecting
all the variables with a property within a evaluation, which may be one
Fortran, or OpenMP, or OpenACC construct.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D124226
As per issue #1196, the loop induction variable, which is an argument
in the omp.wsloop operation, does not have a memory location, so when
passed to a function or subroutine, the reference to the value is not
a memory location, but the value of the induction variable. The callee
function/subroutine is then trying to dereference memory at address 1
or some other "not a good memory location".
This is fixed by creating a temporary memory location and storing the
value of the induction variable in that.
Test fixes as a consequence of the changed code generated.
Add checking for some of the omp-unstructured.f90 to check for alloca,
store and load operations, to ensure the correct flow. Add a test
for CYCLE inside a omp-do loop.
Also convert to use -emit-fir in the omp-unstructrued, and make
the symbol matching consistent in the omp-wsloop-variable test.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D126711
The basic infinite loop is lowered to a branch to the body of the
loop, and the body containing a back edge as its terminator.
Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D126697
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Upstream the code for handling loops with real control variables from
the fir-dev branch at
https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev/
Also add a test.
Loops with real-valued control variables are always lowered to
unstructured loops. The real-valued control variables are handled the
same as integer ones, the only difference is that they need to use
floating point instructions instead of the integer equivalents.
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
The following changes are made for OpenMP operations with unstructured region,
1. For combined constructs the outer operation is considered a structured
region and the inner one as the unstructured.
2. Added a condition to ensure that we create new blocks only once for nested
unstructured OpenMP constructs.
Tests are added for checking the structure of the CFG.
Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project. Code originally reviewed
at https://github.com/flang-compiler/f18-llvm-project/pull/1394.
Reviewed By: vdonaldson, shraiysh, peixin
Differential Revision: https://reviews.llvm.org/D126375
Upstream the code for handling while loops from the fir-dev branch at
https://github.com/flang-compiler/f18-llvm-project/tree/fir-dev/
Also add tests.
The while loop is lowered to a header block that checks the loop
condition and branches either to the exit block or to the body of the
loop. The body of the loop will unconditionally branch back to the
header.
Differential Revision: https://reviews.llvm.org/D126636
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Test that chunk size is passed to the static init function.
Using three different variations:
1. Single constant.
2. Expression with constants.
3. Variable value.
Reviewed By: peixin, shraiysh
Differential Revision: https://reviews.llvm.org/D126383
For pointer variables, using getSymbolAddress cannot get the coorect
address for atomic read/write operands. Use genExprAddr to fix it.
Reviewed By: shraiysh, NimishMishra
Differential Revision: https://reviews.llvm.org/D125793
Since the FIR operations are mostly structured, it is only the functions
that could contain multiple blocks inside an operation. This changes
with OpenMP since OpenMP regions can contain multiple blocks. For
unstructured code, the blocks are created in advance and belong to the
top-level function. This caused code in OpenMP region to be placed under
the function level.
In this fix, if the OpenMP region is unstructured then new blocks are
created inside it.
Note1: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project. The code in this patch is a
subset of the changes in https://github.com/flang-compiler/f18-llvm-project/pull/1178.
Reviewed By: vdonaldson
Differential Revision: https://reviews.llvm.org/D126293
Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
A dummy argument in an entry point of a subprogram with multiple
entry points need not be defined in other entry points. It is only
legal to reference such an argument when calling an entry point that
does have a definition. An entry point without such a definition
needs a local "substitute" definition sufficient to generate code.
It is nonconformant to reference such a definition at runtime.
Most such definitions and associated code will be deleted as dead
code at compile time. However, that is not always possible, as in
the following code. This code is conformant if all calls to entry
point ss set m=3, and all calls to entry point ee set n=3.
subroutine ss(a, b, m, d, k) ! no x, y, n
integer :: a(m), b(a(m)), m, d(k)
integer :: x(n), y(x(n)), n
integer :: k
1 print*, m, k
print*, a
print*, b
print*, d
if (m == 3) return
entry ee(x, y, n, d, k) ! no a, b, m
print*, n, k
print*, x
print*, y
print*, d
if (n /= 3) goto 1
end
integer :: xx(3), yy(5), zz(3)
xx = 5
yy = 7
zz = 9
call ss(xx, yy, 3, zz, 3)
call ss(xx, yy, 3, zz, 3)
end
Lowering currently generates fir::UndefOp's for all unused arguments.
This is usually ok, but cases such as the one here incorrectly access
unused UndefOp arguments for m and n from an entry point that doesn't
have a proper definition.
The problem is addressed by creating a more complete definition of an
unused argument in most cases. This is implemented in large part by
moving the definition of an unused argument from mapDummiesAndResults
to mapSymbolAttributes. The code in mapSymbolAttributes then chooses
one of three code generation options, depending on information
available there.
This patch deals with dummy procedures in alternate entries, and adds
a TODO for procedure pointers (the PFTBuilder is modified to analyze
procedure pointer symbol so that they are not silently ignored, and
instead hits proper TODOs).
BoxAnalyzer is also changed because assumed-sized arrays were wrongfully
categorized as constant shape arrays. This had no impact, except when
there were unused entry points.
Co-authored-by: jeanPerier <jperier@nvidia.com>
Differential Revision: https://reviews.llvm.org/D125867
Now that the requirements and implementation of asynchronous I/O are
better understood, adjust their I/O runtime APIs. In particular:
1) Remove the BeginAsynchronousOutput/Input APIs; they're not needed,
since any data transfer statement might have ASYNCHRONOUS= and
(if ASYNCHRONOUS='YES') ID= control list specifiers that need to
at least be checked.
2) Add implementations for BeginWait(All) to check for the error
case of a bad unit number and nonzero ID=.
3) Rearrange and comment SetAsynchronous so that it's clear that
it can be called for READ/WRITE as well as for OPEN.
The implementation remains completely synchronous, but should be conforming.
Where opportunities make sense for true asynchronous implementations of
some big block transfers without SIZE= in the future, we'll need to add
a GetAsynchronousId API to capture ID= on a READ or WRITE; add sourceFile
and sourceLine arguments to BeginWait(All) for good error reporting;
track pending operations in unit.h; and add code to force synchronization
to non-asynchronous I/O operations.
Lowering should call SetAsynchronous when ASYNCHRONOUS= appears as
a control list specifier. It should also set ID=x variables to 0
until such time as we support asynchronous operations, if ever.
This patch only removes the removed APIs from lowering.
Differential Revision: https://reviews.llvm.org/D126143
The types of lower bound, upper bound, and step are converted into the
type of the loop variable if necessary. OpenMP runtime requires 32-bit
or 64-bit loop variables. OpenMP loop iteration variable cannot have
more than 64 bits size and will be narrowed.
This patch is part of upstreaming code from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project. (#1256)
Co-authored-by: kiranchandramohan <kiranchandramohan@gmail.com>
Reviewed By: kiranchandramohan, shraiysh
Differential Revision: https://reviews.llvm.org/D125740
When parallel is used in a combined construct, then use a separate
function to create the parallel operation. It handles the parallel
specific clauses and leaves the rest for handling at the inner
operations.
Reviewed By: peixin, shraiysh
Differential Revision: https://reviews.llvm.org/D125465
Co-authored-by: Sourabh Singh Tomar <SourabhSingh.Tomar@amd.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: Nimish Mishra <neelam.nimish@gmail.com>
Convert Fortran parse-tree into MLIR for collapse-clause.
Includes simple Fortran to LLVM-IR test, with auto-generated
check-lines (some of which have been edited by hand).
Reviewed By: kiranchandramohan, shraiysh, peixin
Differential Revision: https://reviews.llvm.org/D125302
This supports the lowering parse-tree to MLIR for ordered clause in
worksharing-loop directive. Also add the test case for operation
conversion.
Part of this patch is from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Co-authored-by: Sourabh Singh Tomar <SourabhSingh.Tomar@amd.com>
Reviewed By: kiranchandramohan, NimishMishra
Differential Revision: https://reviews.llvm.org/D125456
This patch adds lowering for task construct from Fortran to
`omp.task` operation in OpenMPDialect Dialect (mlir). Also added tests
for the same.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D124138
As is already supported for dummy procedures, we need to also accept
declarations of procedure pointers that consist of a POINTER attribute
statement followed by an INTERFACE block. (The case of an INTERFACE
block followed by a POINTER statement already works.)
While cleaning this case up, adjust the utility predicate IsProcedurePointer()
to recognize it (namely a SubprogramDetails symbol with Attr::POINTER)
and delete IsProcName(). Extend tests, and add better comments to
symbol.h to document the two ways in which procedure pointers are
represented.
Differential Revision: https://reviews.llvm.org/D125139
The OpenMP worksharing loop operation in the dialect is a proper loop
operation and not a container of a loop. So we have to lower the
parse-tree OpenMP loop construct and the do-loop inside the construct
to a omp.wsloop operation and there should not be a fir.do_loop inside
it. This is achieved by skipping fir.do_loop creation and calling genFIR
for the nested evaluations in the lowering of the do construct.
Note: Handling of more clauses, parallel do, storage of loop index variable etc will come in separate patches.
Part of the upstreaming effort to move LLVM Flang from fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project to the LLVM Project.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D125024
Co-authored-by: Sourabh Singh Tomar <SourabhSingh.Tomar@amd.com>
Co-authored-by: Shraiysh Vaishay <Shraiysh.Vaishay@amd.com>
The FIR `do_loop` is designed as a structured operation with a single
block inside it. Presence of unstructured constructs like jumps, exits
inside the loop will cause the loop to be marked as unstructured. These
loops are lowered using the `control-flow` dialect branch operations.
Fortran semantics do not allow the loop variable to be modified inside
the loop. To prevent accidental modification, the iteration of the
loop is modeled by two variables, trip-count and loop-variable.
-> The trip-count and loop-variable are initialized in the pre-header.
The trip-count is set as (end-start+step)/step where end, start and
step have the usual meanings. The loop-variable is initialized to start.
-> The header block contains a conditional branch instruction which
selects between branching to the body of the loop or the exit block
depending on the value of the trip-count.
-> Inside the body, the trip-count is decremented and the loop-variable
incremented by the step value. Finally it branches to the header of the
loop.
Part of the upstreaming effort to move LLVM Flang from fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project to the LLVM Project.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D124837
Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Peter Klausler <pklausler@nvidia.com>
This patch restricts the value of `if` clause expression to an I1 value.
It also restricts the value of `num_threads` clause expression to an I32
value.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D124142
A recent change is eliciting a valid warning from the out-of-tree
flang build bot; fix by using a reference in a range-based for().
Differential Revision: https://reviews.llvm.org/D124682
Semantics is not preventing a named common block to appear with
different size in a same file (named common block should always have
the same storage size (see Fortran 2018 8.10.2.5), but it is a common
extension to accept different sizes).
Lowering was not coping with this well, since it just use the first
common block appearance, starting with BLOCK DATAs to define common
blocks (this also was an issue with the blank common block, which can
legally appear with different size in different scoping units).
Semantics is also not preventing named common from being initialized
outside of a BLOCK DATA, and lowering was dealing badly with this,
since it only gave an initial value to common blocks Globals if the
first common block appearance, starting with BLOCK DATAs had an initial
value.
Semantics is also allowing blank common to be initialized, while
lowering was assuming this would never happen, and was never creating
an initial value for it.
Lastly, semantics was not complaining if a COMMON block was initialized
in several scoping unit in a same file, while lowering can only generate
one of these initial value.
To fix this, add a structure to keep track of COMMON block properties
(biggest size, and initial value if any) at the Program level. Once the
size of a common block appearance is know, the common block appearance
is checked against this information. It allows semantics to emit an error
in case of multiple initialization in different scopes of a same common
block, and to warn in case named common blocks appears with different
sizes. Lastly, this allows lowering to use the Program level info about
common blocks to emit the right GlobalOp for a Common Block, regardless
of the COMMON Block appearances order: It emits a GlobalOp with the
biggest size, whose lowest bytes are initialized with the initial value
if any is given in a scope where the common block appears.
Lowering is updated to go emit the common blocks before anything else so
that the related GlobalOps are available when lowering the scopes where
common block appear. It is also updated to not assume that blank common
are never initialized.
Differential Revision: https://reviews.llvm.org/D124622
This patch adds code to lower simple Fortran Do loops with loop control.
Lowering is performed by the the `genFIR` function when called with a
`Fortran::parser::DoConstruct`. `genFIR` function calls `genFIRIncrementLoopBegin`
then calls functions to lower the body of the loop and finally calls
the function `genFIRIncrementLoopEnd`. `genFIRIncrementLoopBegin` is
responsible for creating the FIR `do_loop` as well as storing the value of
the loop index to the loop variable. `genFIRIncrementLoopEnd` returns
the incremented value of the loop index and also stores the index value
outside the loop. This is important since the loop variable can be used
outside the loop. Information about a loop is collected in a structure
`IncrementLoopInfo`.
Note 1: Future patches will bring in lowering for unstructured,
infinite, while loops
Note 2: This patch is part of upstreaming code from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D124277
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Peter Klausler <pklausler@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
This patch provides the basic infrastructure for lowering declarative
constructs for OpenMP and OpenACC.
This is part of the upstreaming effort from the fir-dev branch in [1].
[1] https://github.com/flang-compiler/f18-llvm-project
Reviewed By: kiranchandramohan, shraiysh, clementval
Differential Revision: https://reviews.llvm.org/D124225
Lowering of FailImage statement generates a runtime call and the
unreachable operation. The unreachable operation cannot terminate
a structured operation like the IF operation, hence mark as
unstructured.
Note: This patch is part of upstreaming code from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D124520
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
The lowering code was mistakenly assuming that the second argument
in the signature provided by semantics is the DIM argument. This
caused calls with a KIND argument but no DIM to be lowered as if the
KIND argument was DIM.
Differential Revision: https://reviews.llvm.org/D124243
In some case the lowering of `ichar` is generating an `arith.extui` operation
with the same from/to type. This operation do not accept from/to types to be
the same. If the from/to types are identical, we do not generate the extra
operation.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D124107
This patch adds lowering support for atomic read and write constructs.
Also added is pointer modelling code to allow FIR pointer like types to
be inferred and converted while lowering.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D122725
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
A missing "!" in the call interface lowering caused all derived type
arguments without length parameters that require and explicit interface
to be passed via fir.box (runtime descriptor).
This was not the intent: there is no point passing a simple derived type
scalars or explicit shapes by descriptor just because they have an attribute
like TARGET. This would actually be problematic with existing code that is
not always 100% compliant: some code implicitly calls procedures with
TARGET dummy attributes (this is not something a compiler can enforce
if the call and procedure definition are not in the same file).
Add a Scope::IsDerivedTypeWithLengthParameter to avoid passing derived
types with only kind parameters by descriptor. There is no point, the
callee knows about the kind parameter values.
Differential Revision: https://reviews.llvm.org/D123990