
Currently, loop cache analysis uses the following formula to evaluate the cost of a RefGroup for a consecutive memory access: `RefCost=(TripCount*Stride)/CLS`, where `CLS` is the cache line size. Because the division is an integer division, this cost evaluates to zero when `TripCount*Stride` is smaller than the cache line size. This results in a wrong cost value for the loop and misleads LoopInterchange decisions, as shown in [this case](https://llvm.godbolt.org/z/jTz1vn4hn). This patch fixes the problem by rounding the cost up to 1 whenever it would otherwise evaluate to zero.
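
To make the arithmetic concrete, here is a minimal standalone sketch of the formula before and after the change. It is not the patch itself: the real analysis computes these values as SCEV expressions, and the function names `refCostOld` and `refCostFixed` are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: the cost formula as described above, using plain
// unsigned integer division in place of the SCEV division in the analysis.
std::uint64_t refCostOld(std::uint64_t tripCount, std::uint64_t stride,
                         std::uint64_t cls) {
  return (tripCount * stride) / cls;
}

// Hypothetical helper: the same formula, but a result of zero is rounded
// up to 1, on the assumption that the access touches at least one cache line.
std::uint64_t refCostFixed(std::uint64_t tripCount, std::uint64_t stride,
                           std::uint64_t cls) {
  std::uint64_t cost = (tripCount * stride) / cls;
  return cost == 0 ? 1 : cost;
}
```

For example, with a trip count of 4, a stride of 4 bytes, and a 64-byte cache line, `TripCount*Stride` is 16, so `refCostOld` returns 0 while `refCostFixed` returns 1. For larger products both versions agree.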