Florian Hahn 21f439f132
[LoopRotate] Use SCEV exit counts to improve rotation profitability (#187483)
Most loop transformations, like unrolling and vectorization, expect the
latch branch to be countable. Allow rotation if it turns the latch from
uncountable to countable.
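
As a rough source-level analogy (the pass itself works on LLVM IR, and the function names below are hypothetical, not taken from the patch), a find_last-style loop before and after such a rotation might look like this:

```cpp
#include <cassert>
#include <vector>

// Unrotated shape: the loop header holds the countable exit (i == 0) and
// the latch holds the data-dependent, uncountable exit (a[i] == key).
int findLastUnrotated(const std::vector<int> &a, int key) {
  int i = static_cast<int>(a.size());
  for (;;) {
    if (i == 0) // header exit: trip count is computable from a.size()
      return -1;
    --i;
    if (a[i] == key) // latch exit: depends on the data
      return i;
  }
}

// Rotated shape: the header test is duplicated as a guard in front of the
// loop, and the countable test (i != 0) now sits in the latch.
int findLastRotated(const std::vector<int> &a, int key) {
  int i = static_cast<int>(a.size());
  if (i == 0) // guard copied from the old header
    return -1;
  do {
    --i;
    if (a[i] == key) // early exit stays inside the loop body
      return i;
  } while (i != 0); // latch exit: countable
  return -1;
}

// Sanity check that both shapes agree on a few inputs.
void selfCheck() {
  const std::vector<int> a{1, 2, 3, 2};
  for (int key = 0; key < 5; ++key)
    assert(findLastUnrotated(a, key) == findLastRotated(a, key));
  assert(findLastUnrotated({}, 1) == findLastRotated({}, 1));
}
```

Both functions return the same index. Only the position of the countable test differs, which is the property later passes are described as relying on.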

This uses SCEV to check for countable exits if CheckExitCount is set.
Currently it is not set for the LPM1 run (where SCEV is not used by
other passes), only in LPM.

With that, compile-time impact is mostly neutral:

https://llvm-compile-time-tracker.com/compare.php?from=eba342d0ba930a404a026c80aada51c43974f0db&to=2e676337b45fae63ce9498116d8e6e43772363c5&stat=instructions:u

ClamAV is consistently slower (~+0.15%) and 7zip is faster in most cases
(~-0.13%).

Across a large test set based on C/C++ workloads, this rotates ~0.8%
more loops with ~2.68M rotated loops.

For the test set, ~2.7% more loops are runtime-unrolled and 6.36% more
early-exit loops are vectorized on ARM64 macOS.

This fixes a regression where std::ranges::find_last loops stopped
being runtime-unrolled after 5f648c370e, which changed the loop
structure so we stopped rotating.

https://clang.godbolt.org/z/6baeE1av6

Based on https://github.com/llvm/llvm-project/pull/162654.

Co-authored-by: Marek Sedláček <mr.mareksedlacek@gmail.com>

PR: https://github.com/llvm/llvm-project/pull/187483
2026-03-20 10:21:15 +00:00


//===- LoopRotation.cpp - Loop Rotation Pass ------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements Loop Rotation Pass.
//
//===----------------------------------------------------------------------===//
#include "llvm/Transforms/Scalar/LoopRotation.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LazyBlockFrequencyInfo.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/LoopRotationUtils.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
#include <optional>

using namespace llvm;

#define DEBUG_TYPE "loop-rotate"

static cl::opt<unsigned> DefaultRotationThreshold(
    "rotation-max-header-size", cl::init(16), cl::Hidden,
    cl::desc("The default maximum header size for automatic loop rotation"));

static cl::opt<bool> PrepareForLTOOption(
    "rotation-prepare-for-lto", cl::init(false), cl::Hidden,
    cl::desc("Run loop-rotation in the prepare-for-lto stage. This option "
             "should be used for testing only."));

LoopRotatePass::LoopRotatePass(bool EnableHeaderDuplication, bool PrepareForLTO,
                               bool CheckExitCount)
    : EnableHeaderDuplication(EnableHeaderDuplication),
      PrepareForLTO(PrepareForLTO), CheckExitCount(CheckExitCount) {}

void LoopRotatePass::printPipeline(
    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
  static_cast<PassInfoMixin<LoopRotatePass> *>(this)->printPipeline(
      OS, MapClassName2PassName);
  OS << "<";
  if (!EnableHeaderDuplication)
    OS << "no-";
  OS << "header-duplication;";

  if (!PrepareForLTO)
    OS << "no-";
  OS << "prepare-for-lto;";

  if (!CheckExitCount)
    OS << "no-";
  OS << "check-exit-count";
  OS << ">";
}

PreservedAnalyses LoopRotatePass::run(Loop &L, LoopAnalysisManager &AM,
                                      LoopStandardAnalysisResults &AR,
                                      LPMUpdater &) {
  // Vectorization requires loop-rotation. Use default threshold for loops the
  // user explicitly marked for vectorization, even when header duplication is
  // disabled.
  int Threshold =
      (EnableHeaderDuplication && !L.getHeader()->getParent()->hasMinSize()) ||
              hasVectorizeTransformation(&L) == TM_ForcedByUser
          ? DefaultRotationThreshold
          : 0;
  const DataLayout &DL = L.getHeader()->getDataLayout();
  const SimplifyQuery SQ = getBestSimplifyQuery(AR, DL);

  std::optional<MemorySSAUpdater> MSSAU;
  if (AR.MSSA)
    MSSAU = MemorySSAUpdater(AR.MSSA);
  bool Changed =
      LoopRotation(&L, &AR.LI, &AR.TTI, &AR.AC, &AR.DT, &AR.SE,
                   MSSAU ? &*MSSAU : nullptr, SQ, false, Threshold, false,
                   PrepareForLTO || PrepareForLTOOption, CheckExitCount);

  if (!Changed)
    return PreservedAnalyses::all();

  if (AR.MSSA && VerifyMemorySSA)
    AR.MSSA->verifyMemorySSA();

  auto PA = getLoopPassPreservedAnalyses();
  if (AR.MSSA)
    PA.preserve<MemorySSAAnalysis>();
  return PA;
}