The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
This is a heavy process, and it can trigger a massive explosion in
adding block arguments. While potentially reducing the code size, the
resulting merged blocks with arguments are hiding some of the def-use
chain and can even hinder some further analyses/optimizations: a merge
block does not have it's own path-sensitive context, instead the context
is merged from all the predecessors.
Previous behavior can be restored by passing:
{test-convergence region-simplify=aggressive}
to the canonicalize pass.
Dynamic type and element size of the descriptor dummy must match the
dummy static type when the dummy is not polymorphic, otherwise
IS_CONTIGUOUS, C_SIZEOF.... won't work properly inside the callee.
When the actual argument is polymorphic the descriptor of the actual may
have a different dynamic type/element size. Hence, the dummy argument
cannot simply take or copy the descriptor of the actual argument.
Add pass to lower assumed-rank operations. The current patch adds
codegen for fir.rebox_assumed_rank. It will be the pass lowering
fir.select_rank.
fir.rebox_assumed_rank is lowered to a call to CopyAndUpdateDescriptor
runtime API.
Note that the lowering ends-up allocating two new descriptors at the
LLVM level (one alloca created by the pass for the CopyAndUpdateDescriptor
result descriptor argument, the second one is created by the fir.load
of the result descriptor in codegen).
LLVM is currently unable to properly optimize and merge those allocas.
The "nocapture" attribute added to CopyAndUpdateDescriptor arguments
gives part of the information to LLVM, but the fir.load codegen of
descriptors must be updated to use llvm.memcpy instead of
llvm.load+store to allow LLVM to optimize it. This will be done in later patch.