Early return is accepted in an OpenACC loop that is not directly nested
in a compute construct. Since the `acc.loop` operation has a region, the
`func.return` operation cannot be used directly inside the region.
An early return is materialized by an `acc.yield` operation returning a
`true` value. The standard end of the `acc.loop` region yields a `false`
value in this case.
A conditional branch operation on the `acc.loop` result then branches to
the `finalBlock` or to the continue block, depending on whether an early
exit was produced in the `acc.loop`.
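The control flow described above can be modeled outside of MLIR. The following C++ sketch is only an analogy: `runLoopRegion` stands in for the `acc.loop` region and its `bool` result for the value carried by `acc.yield`. Both function names are hypothetical and do not exist in Flang.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Stands in for the acc.loop region: the body cannot return from the
// enclosing function directly, so it reports an early exit via its result.
bool runLoopRegion(int tripCount,
                   const std::function<bool(int)> &wantsEarlyReturn) {
  for (int i = 0; i < tripCount; ++i) {
    if (wantsEarlyReturn(i))
      return true; // models acc.yield of `true` (early return requested)
  }
  return false; // models the normal end of the region yielding `false`
}

// Stands in for the code after the loop: a conditional branch on the result.
std::string lowerFunctionBody(int tripCount, int exitAt) {
  assert(tripCount >= 0);
  bool earlyExit =
      runLoopRegion(tripCount, [exitAt](int i) { return i == exitAt; });
  if (earlyExit)
    return "finalBlock"; // branch to the function's final block
  return "continueBlock"; // continue with the code after the loop
}
```

With a trip count of 10, an exit requested at iteration 3 selects the final block, while an exit index that is never reached selects the continue block.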
This got "lost" in the HLFIR transformation. This patch applies the old
attribute to the AssociateOp that needs it, and forwards it to the
AllocaOp that is generated when lowering to FIR.
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statements
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
can be an external, module or internal procedure.
3. Lowering references to procedure pointers so that it works end to end.
**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.
**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.
Note: this commit is actually resubmitting the original commit from
PR #70461 that was reverted. See PR #73221.
Using an op with a region causes issues with unstructured code. This
patch makes use of acc.declare_enter and acc.declare_exit to represent
the implicit declare region.
Before emitting a warning message, code should check that the usage in
question should be diagnosed by calling ShouldWarn(). A fair number of
sites in the code do not, and can emit portability warnings
unconditionally, which can confuse a user that hasn't asked for them
(-pedantic) and isn't terribly concerned about portability *to* other
compilers.
Add calls to ShouldWarn() or IsEnabled() around messages that need them,
and add -pedantic to tests that now require it to test their portability
messages, and add more expected message lines to those tests when
-pedantic causes other diagnostics to fire.
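The gating pattern can be sketched as follows. This is a minimal model under stated assumptions: `WarningControl` and `checkUsage` are hypothetical stand-ins written for illustration, not Flang's actual classes, and only the method name `ShouldWarn` is borrowed from the description above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiler's warning control.
struct WarningControl {
  std::set<std::string> enabledWarnings;
  bool ShouldWarn(const std::string &feature) const {
    return enabledWarnings.count(feature) != 0;
  }
};

// Emits the portability message only when the user asked for it.
void checkUsage(const WarningControl &control, const std::string &feature,
                std::vector<std::string> &messages) {
  assert(!feature.empty());
  if (control.ShouldWarn(feature))
    messages.push_back("portability: " + feature);
}
```

A call site that skips the `ShouldWarn` check would push the message unconditionally, which is the behavior this change removes.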
The front-end is making implicit conversions explicit in assignment and
structure constructors.
While this generally helps and is needed by semantics to fold structure
constructors correctly, this is incorrect when the LHS or component is
an allocatable. The RHS may have non default lower bounds that should be
propagated to the LHS, and making the conversion explicit changes the
semantics. In the structure constructor, the situation is even worse
since Fortran 2018 7.5.10 point 7 allows the value to be a reference to
an unallocated allocatable, and adding an explicit conversion in
semantics will cause a segfault.
This patch removes the explicit convert in semantics when the
LHS/component is a whole allocatable, and updates lowering to insert
the conversion itself while preserving the lower bounds and handling
the tricky structure constructor case.
OpenACC/OpenMP atomic lowering needs a finer control over expression
lowering. This patch allows mapping evaluate::Expr<T> to mlir::Value so
that any subsequent expression lowering will use these values when an
operand is a mapped Expr<T>.
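The idea of overriding sub-expressions during lowering can be illustrated with a toy evaluator. Everything here is an assumption made for illustration: `Expr` is a tiny stand-in for `evaluate::Expr<T>`, `Value` is an `int` standing in for `mlir::Value`, and `lower` is a hypothetical function.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy expression: either a leaf constant or the sum of two sub-expressions.
struct Expr {
  int constant = 0;
  std::shared_ptr<Expr> lhs, rhs;
};

using Value = int; // stands in for mlir::Value
using ExprOverrides = std::map<const Expr *, Value>;

// Lowers an expression, but reuses the mapped value whenever the expression
// or one of its operands has an entry in the override map.
Value lower(const Expr &e, const ExprOverrides &overrides) {
  auto it = overrides.find(&e);
  if (it != overrides.end())
    return it->second;
  if (e.lhs && e.rhs)
    return lower(*e.lhs, overrides) + lower(*e.rhs, overrides);
  return e.constant;
}
```

For an atomic update, the sub-expression that was already evaluated outside the atomic region would be registered in the map, so the remaining lowering picks up that value instead of re-evaluating it.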
This is an alternative to
https://github.com/llvm/llvm-project/pull/69866, from which I took the
test and some of the logic to extract the non-atomic sub-expression.
---------
Co-authored-by: Nimish Mishra <neelam.nimish@gmail.com>
Some compilers allow the `$acc routine(<name>)` to be placed at the
program unit level. To be compatible, this patch enables the use of acc
routine at this level. These acc routine directives must have a name.
The code in `copyHostAssociateVar` is using `createSomeArrayAssignment`
for arrays, which relies on the soon-to-be legacy expression lowering.
Update the copy to use hlfir.assign instead.
I used the temporary_lhs flag to mimic the current behavior, but maybe
user defined assignment should be called when needed. This flag also
prevents any finalizers from being called on the LHS if the LHS type has
finalizers (which would occur otherwise in normal intrinsic assignment).
Again, I am not sure what the OpenMP spec wants here.
Also, I added special handling for ALLOCATABLE. The current code seems
broken to me since it is basically copying the descriptor, which would
lead to a memory leak given that the TEMP was previously allocated with
the shape of the variable in createHostAssociateVarClone. So copying the
DATA instead seemed like the right thing to do.
When a variable is used in a specification expression in a scope, it is
added to the list of variables that must be instantiated when lowering
the scope. When lowering a BLOCK, this caused instantiateVar to be
called again on all the host block variables appearing in block variable
specification expressions. This caused an extra declare to be emitted
for dummy inside block (for non dummy, instantiateVar is a no-op if the
symbol is already mapped).
Only call instantiateVar if the symbol is not mapped when lowering BLOCK
variables.
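A minimal sketch of the guard, assuming a simplified symbol map: `SymbolMap` and `lowerBlockVariables` are hypothetical, and the per-symbol counter only models how many declares would be emitted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol map: symbol name -> number of declares emitted for it.
struct SymbolMap {
  std::map<std::string, int> declares;
  bool isMapped(const std::string &sym) const {
    return declares.count(sym) != 0;
  }
  void instantiateVar(const std::string &sym) { ++declares[sym]; }
};

// Instantiates the variables needed by a BLOCK, skipping host variables that
// were already mapped when the enclosing scope was lowered.
void lowerBlockVariables(SymbolMap &map, const std::vector<std::string> &vars) {
  for (const std::string &sym : vars) {
    assert(!sym.empty());
    if (!map.isMapped(sym))
      map.instantiateVar(sym);
  }
}
```

Without the `isMapped` check, a host dummy argument that appears in a block variable's specification expression would be declared twice.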
The goal is to progressively propagate all the derived type info that is
currently in the runtime type info globals into a FIR operation that can
be easily queried and used by FIR/HLFIR passes.
When this is complete, the last step will be to stop generating the
runtime info globals in lowering, and to do that later in or just before
codegen to keep the FIR files readable (on the added type-info.f90
tests, the lowered runtime info globals take a whopping 2.6 million
characters on 1600 lines of the FIR textual output. The fir.type_info that
contains all the info required to generate those globals for such
"trivial" types takes 1721 characters on 9 lines).
So far this patch simply starts by replacing the fir.dispatch_table
operation with the fir.type_info operation and adding the noinit/
nofinal/nodestroy flags to it. These flags will soon be used in HLFIR to
better rewrite hlfir.assign with derived types.
Currently flang-new -g is failing when compiling code containing a call
in a macro to a function defined in the same file.
The verification added in https://reviews.llvm.org/D157447 is valid.
Flang lowering was failing to propagate location information in code
from macro expansion because GetSourcePositionRange does not work with
it (it fails to come up with an end location), but we do not need a
range for the MLIR location, only the start.
Use GetSourcePosition instead, which works with code from macro
expansion.
Note that the source location is that of the statement where the
macro appeared. If needed, a FusedLocation could later be built to
keep a link to the macro location in the debug info.
There are currently several places that automatically deallocate
allocatables if they are allocated:
- INTENT(OUT) allocatable are deallocated on entry in the callee
- INTENT(OUT) allocatable are also deallocated on the caller side of
BIND(C) function in case the implementation is in C.
- Results of function returning allocatable are deallocated after usage.
- OPENMP privatized allocatable are deallocated at the end of OPENMP
region.
Introduce genDeallocateIfAllocated, which centralizes all this code,
except for the function return, which uses genFreememIfAllocated since
finalization is currently done separately.
`fir::factory::genFinalization` and
`fir::factory::genInlinedDeallocation` are removed and replaced by
genFreemem since their names were misleading: finalization was not
called.
There is a fallout in the tests because previous generated code did not
check the allocated status when doing inline deallocation. This was OK
since free(null) is guaranteed to be a no-op, but this makes compiler
code more complex, is a bit surprising in the generated IR IMHO, and it
relied on knowing when genDeallocateBox inserts runtime calls or uses
inlined code.
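The "check the allocation status, then free" shape that the generated code now follows can be sketched in plain C++. This is an analogy only: `Descriptor` and `deallocateIfAllocated` are hypothetical and much simpler than a real Fortran descriptor or the FIR that genDeallocateIfAllocated emits.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical, simplified descriptor for an allocatable.
struct Descriptor {
  void *baseAddr = nullptr;
};

// Frees the storage only when the allocatable is currently allocated and
// resets the descriptor to the unallocated state. Returns whether it freed.
bool deallocateIfAllocated(Descriptor &desc) {
  if (desc.baseAddr == nullptr)
    return false; // not allocated: nothing to do
  std::free(desc.baseAddr);
  desc.baseAddr = nullptr;
  return true;
}
```

Making the check explicit means the result does not depend on whether the deallocation itself is an inlined free or a runtime call.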
Flang was generating invalid IR when there was a GOTO to the body
of a DO loop. This happened because the value of step, computed at
the beginning of the loop, was being reused at the end of the loop,
that, for unstructured loops, is in another basic block. Because of
this, a GOTO could skip the beginning of the loop, that defined
step, and yet try to use it at the end of the loop, which is
invalid.
Instead of reusing the step value, it can be recomputed if it is a
constant, or stored and loaded to/from a temporary variable, for
non-constant step expressions.
Note that, while this change prevents the generation of invalid IR
in the presence of jumps to DO loop bodies, what happens if the
program reaches the end of a DO loop without ever passing through
its beginning is undefined behavior, as some control variables,
such as trip, will be uninitialized. It doesn't seem worth the
effort and overhead to ensure this legacy extension will behave
correctly in this case. This is consistent with at least gfortran,
that doesn't behave correctly if step is not equal to one.
Fixes: https://github.com/llvm/llvm-project/issues/65036
This patch builds on top of a prior patch in review which adds a new map
and bounds operation by modifying the OpenMP PFT lowering to support
these operations and generate them from the PFT.
A significant amount of the support for the Bounds operation is borrowed
from OpenACC's own current implementation and lowering, just ported
over to OpenMP.
The patch also adds very preliminary/initial support for lowering to
a new Capture attribute, which is stored on the new Map Operation,
which helps the later lowering from OpenMP -> LLVM IR by indicating
how a map argument should be handled. This capture type will
influence how a map argument is accessed on device and passed by
the host (different load/store handling etc.). It is reflective of a
similar piece of information stored in the Clang AST which performs a
similar role.
The patch also makes some minor adjustments to how the map type (the
map bitshift which dictates to the runtime how it should handle an
argument) is generated, to support more use-cases for future patches
that build on this work.
Finally, it adds the creation of the map entry operation and ties it to
the relevant target operations, adds some new tests, and alters
previous tests to support the new changes.
Depends on D158732
reviewers: kiranchandramohan, TIFitis, clementval, razvanlupusoru
Differential Revision: https://reviews.llvm.org/D158734
This patch adds `host_assoc` attribute for operations that implement
FortranVariableInterface (e.g. `hlfir.declare`). The attribute is used
by the alias analysis to make better conclusions about memory overlap.
For example, a dummy argument of an inner subroutine and a host's
variable used inside the inner subroutine cannot refer to the same
object (if the dummy argument does not satisfy exceptions in F2018
15.5.2.13).
This closes a performance gap between HLFIR optimization pipeline
and FIR ArrayValueCopy for Polyhedron/nf.
It is possible for a derived type extending a type with private
components to define components with the same name as the private
components.
This was not properly handled by lowering, where several fir.record type
component names could end up being the same, leading to bad generated
code (only the first component was accessed via fir.field_index).
This patch handles the situation by adding the derived type mangled name
to private component names.
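The disambiguation can be illustrated with a small helper. The function `recordComponentName`, the `.` separator, and the sample mangled type names used in the note below are assumptions for illustration and are not the exact names Flang emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical naming helper: private components are qualified with the
// mangled name of the type that declares them, public ones keep their name.
std::string recordComponentName(const std::string &mangledTypeName,
                                const std::string &component, bool isPrivate) {
  assert(!component.empty());
  if (isPrivate)
    return mangledTypeName + "." + component;
  return component;
}
```

With this scheme, a private component `x` of a parent type (say `_QMmTbase`) and a component `x` added by an extension no longer collide inside the same record type.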
This patch implements the lowering of the OpenMP 'requires' directive
from Flang parse tree to MLIR attributes attached to the top-level
module.
Target-related 'requires' clauses are gathered and combined for each top-level
unit during semantics. Lastly, a single module-level `omp.requires` attribute
is attached to the MLIR module with that information at the end of the process.
The `atomic_default_mem_order` clause is not addressed by this patch, but
rather it will come as a separate patch and follow a different approach.
Depends on D147214, D150328, D150329 and D157983.
Differential Revision: https://reviews.llvm.org/D147218
Since the OpenACC atomics specification is a subset of OpenMP atomics,
the same lowering implementation can be used. This change extracts out
the necessary pieces from the OpenMP lowering and puts them in a shared
spot. The shared spot is a header file so that each implementation can
template specialize directly.
After putting the OpenMP implementation in a common spot, the following
changes were needed to make it work for OpenACC:
* Ensure parsing works correctly by avoiding hardcoded offsets.
* Templatize based on atomic type.
* Whether the lowering is for OpenMP or OpenACC is determined by checking
for OmpAtomicClauseList (OpenACC does not implement this, so we just
templatize with void). It was preferable to check this instead of the
atomic type because in some cases, like atomic capture, the
read/write/update implementations are called, and we want compile time
evaluation of these conditional parts.
* The memory order and hint are used only for OpenMP.
* Generate acc dialect operations instead of omp dialect operations.
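The "templatize with void" approach can be sketched with `if constexpr`. This is a simplified illustration, not the shared header itself: the `OmpAtomicClauseList` struct below is a hypothetical stand-in with a single field, `lowerAtomicRead` is a hypothetical function, and the returned strings merely name the dialect operation that would be built.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the OpenMP clause list type.
struct OmpAtomicClauseList {
  std::string memoryOrder;
};

// Shared lowering routine. OpenACC instantiates it with `void` because it has
// no clause list; the branch below is resolved at compile time.
template <typename ClauseList>
std::string lowerAtomicRead(const ClauseList *clauses) {
  if constexpr (std::is_same_v<ClauseList, void>) {
    (void)clauses; // unused for OpenACC
    return "acc.atomic.read";
  } else {
    assert(clauses != nullptr);
    // Memory order and hint exist only for OpenMP.
    return "omp.atomic.read memory_order(" + clauses->memoryOrder + ")";
  }
}
```

Because the condition depends on the template parameter, the OpenMP-only branch is never instantiated for `void`, so it may freely use members that OpenACC does not provide. The sketch requires C++17.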
This patch changes how common blocks are aggregated and named in
lowering in order to:
* fix one obvious issue where BIND(C) and non BIND(C) with the same
Fortran name were "merged"
* go further and deal with a derivative where the BIND(C) C name matches
the assembly name of a Fortran common block. This is a bit unspecified
IMHO, but gfortran, ifort, and nvfortran "merge" the common block
without complaints as a linker would have done. This required getting
rid of all the common block mangling early in FIR (\_QC) instead of
leaving that to the phase that emits LLVM from FIR because BIND(C)
common blocks did not have mangled names. Care has to be taken to deal
with the underscoring option of flang-new.
See added flang/test/Lower/HLFIR/common-block-bindc-conflicts.f90 for an
illustration.
A Cray pointee reference must be done using the characteristics
(bounds, type params) of the original pointee declaration, but
using the actual address value of the associated Cray pointer.
There might be multiple Cray pointees associated with the same
Cray pointer.
The proposed solution is to lower each Cray pointee into a POINTER
variable with a descriptor. The descriptor is initialized at the point
of declaration of the pointee, though its base_addr is set to null.
Before each reference of the Cray pointee its descriptor's base_addr
is updated to the current value of the Cray pointer.
The update of the base_addr is done using PointerAssociateScalar
runtime call, which just updates the base_addr of the descriptor.
This is a temporary solution just to make Cray pointers work
to the same extent they work with FIR lowering.
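The "refresh base_addr before each reference" scheme can be modeled as follows. This is a sketch under stated assumptions: `PointeeDescriptor` and `readPointee` are hypothetical, the descriptor is reduced to a rank-1 `float` array, and the direct assignment stands in for the PointerAssociateScalar runtime call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified descriptor for a rank-1 REAL pointee.
struct PointeeDescriptor {
  float *baseAddr = nullptr; // null at the point of declaration
  std::size_t extent = 0;    // bounds come from the pointee declaration
};

// Refreshes base_addr from the current Cray pointer value, then reads element
// `index` (zero-based here) using the pointee's own bounds.
float readPointee(PointeeDescriptor &desc, std::uintptr_t crayPointer,
                  std::size_t index) {
  desc.baseAddr = reinterpret_cast<float *>(crayPointer);
  assert(index < desc.extent);
  return desc.baseAddr[index];
}
```

Since the address is re-read on every reference, changing the Cray pointer between two references is reflected immediately, and several pointees can share one pointer while keeping their own bounds.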
When a module variable is referenced inside an internal procedure, but
the use statement for the module is inside the host, semantics may not
create any symbols with HostAssocDetails directly under the internal
procedure scope.
So pft::getScopeVariableList, which is called in the bridge when lowering
the internal procedure scope, failed to instantiate the module
variables. This led to "symbol is not mapped to any IR value" compile
time errors.
This patch fixes the issue by adding the variables to the list of
"captured" global variables from the host program, so that they are
instantiated as part of the `internalProcedureBindings` in the bridge.
The rationale for doing it that way instead of changing
`getScopeVariableList` is that `getScopeVariableList` would have to
import all the module variables used inside the host since it cannot
know which ones are referenced inside the internal procedure from the
semantics::Scope information. The fix in this patch only instantiates
the module variables from the host that are actually referenced inside
the internal procedure.
With HLFIR the lbounds for the ALLOCATABLE result are taken from the
mutable box created for the result, so the non-default lbounds might be
propagated further, causing incorrect results, e.g.:
```
program p
real, allocatable :: p5(:)
allocate(p5, source=real_init())
print *, lbound(p5, 1) ! must print 1, but prints 7
contains
function real_init()
real, allocatable :: real_init(:)
allocate(real_init(7:8))
end function real_init
end program p
```
With FIR lowering the box passed for `source` has explicit lower bound 1
at the call site, but the runtime box initialized by `real_init` call
still has lower bound 7. I am not sure if the runtime box initialized by
`real_init` will ever be accessed in a debugger via Fortran variable
names, but I think that having the right runtime bounds that can be
accessible via examining registers/stack might be good in general. So I
decided to update the runtime bounds at the point of return.
This change fixes the test above for HLFIR.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D156187
HLFIR lowering always adds hlfir.declare when symbols are bound to their
address allocated on the stack. Ensure that the declare is placed along
with the alloca if it is hoisted, and always return the mlir value that
is bound to the symbol (i.e. the alloca in FIR lowering and the declare
in HLFIR lowering).
Context: Loop index variables in OpenMP parallel regions should be
privatised to work correctly.
Reviewed By: tblah
Differential Revision: https://reviews.llvm.org/D158594
Unlike other executable constructs with associating selectors, the
selector of a SELECT RANK construct can have the ALLOCATABLE or POINTER
attribute, and will work as an allocatable or object pointer within
each rank case, so long as there is no RANK(*) case.
Getting this right exposed a correctness risk with the popular
predicate IsAllocatableOrPointer() -- it will be true for procedure
pointers as well as object pointers, and in many contexts, a procedure
pointer should not be acceptable. So this patch adds the new predicate
IsAllocatableOrObjectPointer(), and updates some call sites of the original
function to use the new one.
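The difference between the two predicates can be shown on a toy attribute record. The predicate names come from the description above, but `SymbolAttrs` and both function bodies are simplified assumptions, not the actual implementations in semantics.

```cpp
#include <cassert>

// Hypothetical, simplified symbol attributes.
struct SymbolAttrs {
  bool allocatable = false;
  bool pointer = false;
  bool procedure = false;
};

// True for allocatables and for any pointer, including procedure pointers.
bool IsAllocatableOrPointer(const SymbolAttrs &s) {
  return s.allocatable || s.pointer;
}

// True for allocatables and object pointers only.
bool IsAllocatableOrObjectPointer(const SymbolAttrs &s) {
  return s.allocatable || (s.pointer && !s.procedure);
}
```

A call site that must reject procedure pointers gets the wrong answer from the first predicate and the right one from the second.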
Differential Revision: https://reviews.llvm.org/D159043
Commonblock names are not variables, but they can be marked as
threadprivate in OpenMP. This requires the commonblock name to
be bound to the address of the Commonblock. hlfir.declares are
not required for these, but we should be able to retrieve the
mlir Value corresponding to the Commonblock. This patch enables
this by special casing the Commonblocks like procedures.
Reviewed By: tblah, vzakhari
Differential Revision: https://reviews.llvm.org/D158070
Lower the acc declare directive in function/subroutine
to the newly introduced acc.declare operation. Only a single
acc.declare operation is produced in a function or subroutine
so they don't end up nested.
Depends on D158314
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D158315
The routine directive can appear in the specification part of
a subroutine, function or module and therefore appear before the
function or subroutine is lowered. We keep track of the created
routine info attributes and attach them to the function at the end
of the lowering if the directive appeared before the function was
lowered.
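The deferred attachment can be sketched with two maps. This is a minimal model, assuming attributes are plain strings: `RoutineInfoTracker` and its members are hypothetical names, not the data structures used in the lowering code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping for routine directives seen before their function.
struct RoutineInfoTracker {
  std::map<std::string, std::vector<std::string>> pending;  // not lowered yet
  std::map<std::string, std::vector<std::string>> attached; // lowered functions

  void onRoutineDirective(const std::string &func, const std::string &attr) {
    assert(!func.empty());
    auto it = attached.find(func);
    if (it != attached.end())
      it->second.push_back(attr); // function already lowered: attach now
    else
      pending[func].push_back(attr); // remember it for later
  }

  // Registers the function as lowered (creates an empty attribute list).
  void onFunctionLowered(const std::string &func) { attached[func]; }

  // Called once at the end of lowering. Entries for functions that were
  // never lowered are dropped in this sketch.
  void finalize() {
    for (auto &[func, attrs] : pending) {
      auto it = attached.find(func);
      if (it != attached.end())
        it->second.insert(it->second.end(), attrs.begin(), attrs.end());
    }
    pending.clear();
  }
};
```

A directive seen in a module's specification part is parked in `pending` and only reaches the function once `finalize` runs, which mirrors the "attach at the end of the lowering" step.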
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D158204
This patch adds lowering for the exit part of the OpenACC declare construct
in function/subroutine.
Depends on D156560
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D156568
This patch provides support for usage of common block
in private/firstprivate and lastprivate clauses.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D156120
This supports common blocks in the OpenMP private clause by
privatizing each common block member as a host-associated variable,
and adds a test case.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127215
This patch adds the skeleton and the basic lowering for the OpenACC declare
construct when located in the module declaration. This patch only lowers the
create clause with or without modifier. Other clauses and global destructor
lowering will come in follow up patches to keep this one small enough for
review.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D156266
This is an attempt at mimicking the method in which
threadprivate handles the following type of variables:
program main
integer :: i
!$omp declare target to(i)
end
Which essentially generates a GlobalOp for the variable (which
would normally only be an alloca) when it's instantiated. The
main difference is there is no operation generated within the
function, instead the declare target attribute is appended
later within handleDeclareTarget.
Reviewers: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D152037
This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.
The patch also adds an interface to be able to store and retrieve the parent
function name of the original target operation. This is needed to be able to
create correct kernel function names when lowering to LLVM-IR.
Reviewed By: kiranchandramohan, domada
Differential Revision: https://reviews.llvm.org/D154879
This patch lowers allocatables and pointers named in "private" OpenMP clause.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D148570
This patch extends the logic for lowering loop construct reductions to parallel block reductions.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D154182
This patch adds 'unordered' attribute handling to the HLFIR elementals'
builders and fixes the attribute handling in lowering and transformations.
Depends on D154031, D154032
Reviewed By: jeanPerier, tblah
Differential Revision: https://reviews.llvm.org/D154035