Add ForLoopBoundSpecialization pass, which specializes scf.for loops into a "main loop" where `step` divides the iteration space evenly and into an scf.if that handles the last iteration.
This transformation is useful for vectorization and loop tiling. E.g., when vectorizing loads/stores, programs will spend most of their time in the main loop, in which only unmasked loads/stores are used. Only the in the last iteration (scf.if), slower masked loads/stores are used.
Subsequent commits will apply this transformation in the SparseDialect and in Linalg's loop tiling.
Differential Revision: https://reviews.llvm.org/D105804
Summary:
We already had a parallel loop specialization pass that is used to
enable unrolling and consecutive vectorization by rewriting loops
whose bound is defined as a min of a constant and a dynamic value
into a loop with static bound (the constant) and the minimum as
bound, wrapped into a conditional to dispatch between the two.
This adds the same rewriting for for loops.
Differential Revision: https://reviews.llvm.org/D82189