Currently loop cache analysis uses following formula to evaluate cost of
an RefGroup for a consecutive memory access:
`RefCost=(TripCount*Stride)/CLS`
This cost evaluates to zero when `TripCount*Stride` is smaller than
cache-line-size. This results in wrong cost value for a loop and
misleads loopInterchange decisions as shown in [this
case](https://llvm.godbolt.org/z/jTz1vn4hn).
This patch fixes the problem by rounding the cost to 1 once this problem
happens.
In this patch we change test cases from using "CHECK" to using
"CHECK-NEXT", which is to ensure the order of loops output by
loop cache analysis is correct. After D124725 we fixed the
non-deterministic output order hence we did not use "CHECK-DAG"
anymore, and now we should really use "CHECK-NEXT" to make sure
the loops in the output loop vector follow the right order.
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D124984