Following our
[discussion](https://discourse.llvm.org/t/rfc-plan-to-improve-loopinterchange-by-undoing-simple-reductions/89071),
I ported GCC’s `undo_simple_reduction` into LLVM.
**Key changes**
- Implement a Reduction2Memory step in LoopInterchange to support
reductions in the inner loop.
- The feature is behind an option `-loop-interchange-reduction-to-mem`
and is OFF by default. With the feature off, the pass behaves as before
(minimal impact).
- Add a regression test.
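
For readers unfamiliar with the idea, here is a minimal source-level sketch of what undoing a simple reduction means. It is an illustration only, not the pass's implementation: the pass works on LLVM IR, and the function names (`colSumsScalar`, `colSumsInMemory`, `colSumsInterchanged`) and the `Matrix` alias are hypothetical. The scalar accumulator in the first version forces a load before and a store after the inner loop, so the nest is not perfectly nested. Accumulating directly in memory removes that obstacle, and the loops can then be interchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch: a dense row-major matrix.
using Matrix = std::vector<std::vector<int>>;

// Before: the inner loop carries a scalar reduction. The load of out[j]
// and the store back to out[j] sit between the two loop headers, so the
// loop nest is not perfectly nested.
std::vector<int> colSumsScalar(const Matrix &a) {
  const std::size_t rows = a.size();
  const std::size_t cols = rows == 0 ? 0 : a[0].size();
  std::vector<int> out(cols, 0);
  for (std::size_t j = 0; j < cols; ++j) {
    int sum = out[j];
    for (std::size_t i = 0; i < rows; ++i)
      sum += a[i][j]; // walks down a column: poor locality
    out[j] = sum;
  }
  return out;
}

// After undoing the reduction: the accumulator lives in memory, so the
// inner loop body is the only statement in the nest.
std::vector<int> colSumsInMemory(const Matrix &a) {
  const std::size_t rows = a.size();
  const std::size_t cols = rows == 0 ? 0 : a[0].size();
  std::vector<int> out(cols, 0);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      out[j] += a[i][j];
  return out;
}

// After interchange: the inner loop now walks along a row of `a`.
// Each out[j] still accumulates its terms in increasing i order.
std::vector<int> colSumsInterchanged(const Matrix &a) {
  const std::size_t rows = a.size();
  const std::size_t cols = rows == 0 ? 0 : a[0].size();
  std::vector<int> out(cols, 0);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      out[j] += a[i][j];
  return out;
}
```

All three functions compute the same column sums. The in-memory form adds a load and a store to every inner iteration, which is only worthwhile when the interchange that follows pays for it.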
**Validation & performance**
- No compile or semantic errors observed on SPEC2006 and SPEC2017 with
the new feature enabled for validation.
- With options `-da-disable-delinearization-checks` and
`-loop-interchange-reduction-to-mem`:
  - SPEC2006 410.bwaves on x86 (Intel i9-11900K, Rocket Lake): **+6%**
  - SPEC2017 603.bwaves_s on x86 (Intel i9-11900K, Rocket Lake): **+6%**
  - SPEC2006 410.bwaves on SpacemiT Key Stone K1: **+29%**
  - SPEC2006 410.bwaves on KMH RTL: **+56%**
  - SPEC2017 603.bwaves_s on KMH RTL: **+24%**
**Note**
- Reduction2Memory only runs when the legality and profitability checks
indicate that the interchange will actually be performed. If the
interchange is illegal or not profitable, Reduction2Memory is not applied.
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Lin Wang <wanglulin@ict.ac.cn>
Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>