When dealing with overlaps in user defined assignments, some entities
with descriptors (fir.box) may be saved without descriptors. The current
code was replacing the original box entity with the "raw" copy with a
simple cast instead of creating a box for the copy. This patch ensures a
fir.embox is emitted instead.
This reverts commit 775754e32840c6b6ca64c8bc0b7ae2c778b97d1e.
Relanding with removing part of the LIT test. There seems to be operations
ordering indeterminism that is unrelated to my change. I will address this
issue separately.
This patch places the finalization code for the RHS of a user-defined
assignment after the assignment code. The change only affects
standalone RegionAssignOp operations.
When the analysis of hlfir.region_assign determined that the LHS region
evaluation may be impacted by the assignment effects, all LHS must be
fully evaluated and saved before any assignment is done.
This patch adds TemporaryStorage variants to save address, including
vector subscripted entities addresses whose shape must be saved.
It uses the DescriptorStack runtime to deal with complex cases inside
forall. For the sake of simplicity, this is also used for vector
subscripted LHS outside of foralls (each element address is saved as
a descriptor on this stack. This is a bit suboptimal, but it is a safe
start that will work with all kinds of type (polymorphic, PDTs...)
without further work). Another approach would be to saved only the
values that are conflicting in the LHS computation, but this would
require a much more complex analysis of the LHS region DAG.
Differential Revision: https://reviews.llvm.org/D154057
In WHERE and masked FORALL assignment, both the mask and the
RHS may need to be saved in some temporary storage before evaluating
the assignment.
The code was trying to "optimize" that case when evaluating the RHS
by not fetching the mask temporary that was just created, but in simple
cases of WHERE construct where the evaluated mask is an hlfir.expr,
this caused the hlfir.expr to be both used in an hlfir.associate and
later in an hlfir.apply to create the fir.if to mask the RHS evaluation.
This double usage prevents codegen from inlining the hlfir.expr at the
hlfir.apply, and from "moving" the hlfir.expr storage into the temp
during hlfir.associate bufferization. So this is pessimizing the code:
this would lead to created two mask array temporary storages
This was caught by the unexpectedly high number of "not yet implemented:
hlfir.associate of hlfir.expr with more than one use" that were firing.
Use the mask temporary instead (the hlfir.associate result) when possible.
Some temporary (the "inlined stack") do not support fetching and pushing
in the same run (a single counter is used to keep track of the fetching
and pushing position). Add a canBeFetchedAfterPush() for safety,
but this limitation is anyway not relevant for hlfir.expr since the
inlined stack is only used to save "trivial" scalars.
Also update the temporary storage name to only indicate "forall" if
the top level construct is a FORALL. This is not a very precise name,
but it should at least give a correct context to indicate in the IR
why some temporary array storage was created.
Differential Revision: https://reviews.llvm.org/D153880
This patch adds support for vector subscripted assignment left-hand
side. It does not yet add support for the cases where the LHS must be
saved because its evaluation could be impacted by the assignment.
The implementation adds an hlfir::ElementalOpInterface to share the
elemental inlining utility and some other tools between
hlfir::ElementalOp and hlfir::ElelemntalAddrOp.
It adds generateYieldedLHS() to allow retrieving the LHS value
in lowering, whether or not it is vector subscripted. If it is vector
subscripted, this utility creates a loop nest iterating over the
elements and returns the address of an element.
Differential Revision: https://reviews.llvm.org/D153759
Add codegen support for hlfir.region_assign with user defined
assignment.
It is currently a bit pessimistic, because outside of forall, it
does not use the PURE aspect, if any, of the assignment routine to
rule out that the routine can write to something else than the LHS that
could overlap with the RHS.
However, the current lowering is anyway adding parenthesis around the
RHS, so this should not cause performance regressions.
Differential Revision: https://reviews.llvm.org/D153516
Previously only a constant reference was stored in the FirOpBuilder.
However, a lot of code was merged using
FirOpBuilder builder{rewriter, getKindMapping(mod)};
This is incorrect because the KindMapping returned will go out of scope
as soon as FirOpBuilder's constructor had run. This led to an infinite
loop running some tests using HLFIR (because the stack space containing
the kind mapping was re-used and corrupted).
One solution would have just been to fix the incorrect call sites,
however, as a large number of these had already made it past review, I
decided to instead change FirOpBuilder to store its own copy of the
KindMapping. This is not costly because nearly every time we construct a
KindMapping is exclusively to construct a FirOpBuilder. To make this
common pattern simpler, I added a new constructor to FirOpBuilder which
calls getKindMapping().
Differential Revision: https://reviews.llvm.org/D151881
Generate temporary storage inside WHERE and FORALL using the temporary
stack runtime. This covers all cases outside of LHS temporary, where the
descriptor stack will have to be used.
Reviewed By: vzakhari
Differential Revision: https://reviews.llvm.org/D151251
Generate temporary storage inline inside WHERE and FORALL when possible.
A following patch will use the runtime to cover the generic cases.
Reviewed By: vzakhari
Differential Revision: https://reviews.llvm.org/D151247
When inner forall bound computations do not depend on previous
forall indices, they can be hoisted.
This is possible because:
- bound computation are required to be pure (so evaluating them only
once is possible).
- If the bound computation depends on a value previously assigned, the
forall scheduling analysis created different run for it: the
assignment impacting the bounds value is not part of the current loop
nest.
The reason this optimization is done at that point and not as part of
generic loop hoisting optimization is that having the all the loop
bound computation hoisted will allow allocating simple temporary
storages. The number of iteration can be pre-computed and used as the
extent for the temporary.
Differential Revision: https://reviews.llvm.org/D151110
Nothing special is needed, other than adding the logging code for where
masks and to plug the pattern. This tests mainly adds test.
Note that some of the justifications to create temps shows some lacks of
side effect interface on operations (like hlfir.transpose), or on some
transparent llvm intrinsic calls (llvm.stacksave/restore).
I think we should as much as possible try to improve this on the ops
generate code rather than special casing it here.
Differential Revision: https://reviews.llvm.org/D150581
The patch applies the schedule built with the utility added in the
previous D150455 patch to generate the code for an ordered assignment
tree. For now, it only supports forall that do not contain user defined
assignments or assignments to vector subscripted entities, and for
which the scheduling analysis does not require temporary storages.
Support for temporary, WHERE, and user-defined/vector subscript
assignment will be added in later patches.
This enables end-to-end support with HLFIR for forall where the schedule
analysis can prove there is no need to create temporary storage.
Differential Revision: https://reviews.llvm.org/D150564
The lowering of hlfir.forall to loops (and later hlfir.where) requires
doing a data dependency analysis to avoid creating temporary storage for
every control/mask/rhs/lhs expressions.
The added code implements a data dependency analysis for the hlfir
ordered assignment trees (it is not specific to Forall since these nodes
includes Where, user defined assignments, and assignment to vector
subscripted entities, but the added code is only plugged and tested
with hlfir.forall in this patch).
This data dependency analysis returns a "schedule", which is a list of
runs containing actions. Each runs will result in a single loop nest
evaluating all its action "at the same time" inside the loop body.
Actions may either evaluating an assignment, or saving some expression
evaluation (the value yielded inside the ordered assignment hlfir operations)
in a temporary storage before doing the assignment that requires this
expression value but may "conflict" with it.
A "conflict" is a read in an expression E to a variable that is, or may
be (analysis is conservative), written by an assignment that depends on
E.
The analysis is based on MLIR SideEffectInterface and fir AliasAnalysis
which makes it generic.
For now, the codegen that will apply the schedule and rewrite the
hlfir.forall into a set of loops is not implemented, but the scheduling
is tested on its own (from Fortran, because it allows testing many cases
in very readable fashions).
The current scheduling has limitations, for instance
"forall(i=1, 10) x(i)=2*x(i)" does not require saving the RHS values for
all "i" before doing the assignments since the RHS does not depend
on values computed during previous iterations. Any user call will also
trigger a conservative assumption that there is a conflict. Finally,
a lot of operations are missing memory effect interfaces (especially
in HLFIR). This patch adds a few so that it can be tested, but more
will be added in later patches.
Differential Revision: https://reviews.llvm.org/D150455
I plan to implement lowering from parse tree to HLFIR first for forall
and where to ease testing of the rewrite pass while writing it.
To avoid cryptic errors in ConvertToFir pass about unhandled operations,
this patch already defines the pass that will further lower these
operations and make it throw clear TODO messages.
Differential Revision: https://reviews.llvm.org/D149852