This is a heavy process, and it can cause an explosion in the number
of added block arguments. While it may reduce code size, the
resulting merged blocks with arguments hide part of the def-use
chain and can even hinder further analyses and optimizations: a merged
block does not have its own path-sensitive context; instead, the context
is merged from all the predecessors.
The previous behavior can be restored by passing:
{test-convergence region-simplify=aggressive}
to the canonicalize pass.
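
The loss of path-sensitive context described above can be illustrated
with a toy model. This is only a sketch, not MLIR code: the block names,
the constants, and the `join` helper are hypothetical, and the "analysis"
is a minimal constant lattice chosen for illustration.

```python
# Toy constant lattice: a fact is either a known int or UNKNOWN.
# All names here are hypothetical; this does not use any MLIR API.
UNKNOWN = object()


def join(incoming):
    """Merge the facts flowing into a block from all its predecessors."""
    incoming = list(incoming)
    first = incoming[0]
    # The fact survives only if every predecessor agrees on it.
    return first if all(f == first for f in incoming[1:]) else UNKNOWN


# Before merging: two structurally identical blocks, each reached from a
# single predecessor, each using its own constant directly.
separate_blocks = {"bb_then": [4], "bb_else": [8]}
facts_before = {name: join(inc) for name, inc in separate_blocks.items()}

# After merging: one block whose block argument is fed by both predecessors.
merged_block = {"bb_merged": [4, 8]}
facts_after = {name: join(inc) for name, inc in merged_block.items()}
```

In this model each separate block keeps an exact fact about its value,
while the merged block only sees the join over its predecessors.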
The dynamic type and element size of the descriptor dummy must match the
dummy's static type when the dummy is not polymorphic; otherwise,
inquiries such as IS_CONTIGUOUS and C_SIZEOF won't work properly inside
the callee. When the actual argument is polymorphic, the descriptor of the
actual may have a different dynamic type and element size. Hence, the dummy
argument cannot simply take or copy the descriptor of the actual argument.
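
As a rough illustration of why a plain copy is not enough, here is a toy
model of a descriptor. It is a sketch under stated assumptions: the
`Descriptor` fields, the type names, and `descriptor_for_dummy` are all
hypothetical and do not reflect the actual Flang descriptor layout or
runtime API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, simplified descriptor; not the Flang runtime layout."""
    base_addr: int
    dynamic_type: str
    elem_len: int
    extents: Tuple[int, ...]


def descriptor_for_dummy(actual, static_type, static_elem_len):
    """Build the descriptor seen by a non-polymorphic dummy.

    The data address and shape come from the actual argument, but the
    dynamic type and element size are overwritten with the dummy's
    static type information.
    """
    return replace(actual, dynamic_type=static_type, elem_len=static_elem_len)


# A polymorphic actual whose dynamic type is an extension of the dummy type.
actual = Descriptor(base_addr=0x1000, dynamic_type="extended_t",
                    elem_len=16, extents=(10,))
dummy = descriptor_for_dummy(actual, static_type="base_t", static_elem_len=8)
```

The actual's descriptor is left untouched; the callee works on a new
descriptor whose type information agrees with the dummy declaration.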
Add a pass to lower assumed-rank operations. The current patch adds
codegen for fir.rebox_assumed_rank. This will also be the pass that
lowers fir.select_rank.
fir.rebox_assumed_rank is lowered to a call to the
CopyAndUpdateDescriptor runtime API.
Note that the lowering ends up allocating two new descriptors at the
LLVM level (one alloca is created by the pass for the
CopyAndUpdateDescriptor result descriptor argument, the second is
created by the fir.load of the result descriptor in codegen).
LLVM is currently unable to properly optimize and merge those allocas.
The "nocapture" attribute added to the CopyAndUpdateDescriptor arguments
gives LLVM part of the information it needs, but the fir.load codegen of
descriptors must be updated to use llvm.memcpy instead of
llvm.load+store to allow LLVM to optimize it. This will be done in a
later patch.