312 Commits

Author SHA1 Message Date
Florian Hahn
8ddbc01295
[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690)
Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on https://github.com/llvm/llvm-project/pull/132689

PR: https://github.com/llvm/llvm-project/pull/132690
2025-03-27 18:34:13 +00:00
Kazu Hirata
41b76119ec
[llvm] Use range constructors for *Set (NFC) (#132636) 2025-03-23 15:50:34 -07:00
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Luke Lau
01f04252b6
[LV] Get FMFs from VectorBuilder in createSimpleReduction. NFC (#132017)
The other createSimpleReduction takes the FMFs from the IRBuilder, so
this aligns the VectorBuilder variant to do the same and reduce the
possibility of there being a mismatch in flags.
2025-03-20 16:38:56 +08:00
Luke Lau
f536f71580
[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. NFC (#132014)
Split off from #131300, this splits up RecurrenceDescriptor arguments so
that arbitrary recurrence kinds may be used down the line.
2025-03-19 22:56:57 +08:00
Luke Lau
67f1c033b8
[VPlan] Remove createReduction. NFCI (#131336)
This is split off from #131300.

A VPReductionRecipe will never have a AnyOf or FindLastIV recurrence, so
when it calls createReduction it always calls createSimpleReduction.

If we replace the call then it leaves createReduction with one user in
VPInstruction::ComputeReductionResult, which we can inline and then
remove.
2025-03-18 00:18:15 +08:00
Ramkumar Ramachandra
c9e250af8e
[LoopUtils] Rename a var in addDiffRuntimeChecks (NFC) (#130128) 2025-03-06 19:31:18 +00:00
Ramkumar Ramachandra
03da079968
[LoopUtils] Saturate at INT_MAX when estimating TC (#129683)
getLoopEstimatedTripCount returns std::nullopt when the trip count would
overflow the return type, but since it is an estimate anyway, we might
as well saturate at UINT_MAX, improving results.
2025-03-05 18:19:39 +00:00
Ramkumar Ramachandra
80bdfcd411
[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080)
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in an std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
2025-03-04 08:43:08 +00:00
Mikhail Gudim
f5d153ef26
[VectorCombine] Fold binary op of reductions. (#121567)
Replace binary of of two reductions with one reduction of the binary op
applied to vectors. For example:

```
%v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0)
%v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1)
%res = add i32 %v0_red, %v1_red
```
gets transformed to:

```
%1 = add <16 x i32> %v0, %v1
%res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1)
```
2025-02-22 06:11:33 -05:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
Joshua Cao
0bc98349c8
[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc (#107190)
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
2024-09-29 21:28:45 -07:00
Philip Reames
3d9abfc9f8 Consolidate all IR logic for getting the identity value of a reduction [nfc]
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy.  This
depends on several prior commits which fix ommissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.

As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor.  (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)

As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum which is why we still need manual logic (but at
least only one copy of manual logic) for those cases.
2024-09-04 08:23:21 -07:00
Philip Reames
3e8840ba71 Remove "Target" from createXReduction naming [nfc]
Despite the stale comments, none of these actually use TTI, and they're
solely generating standard LLVM IR.
2024-09-03 17:03:55 -07:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
Philip Reames
897b00f3c5 Reuse getBinOpIdentity in createAnyOfTargetReduction [nfc]
Consolidating code so that we have one copy instead of multiple reasoning
about identity element.  Note that we're (deliberately) not passing
the FMF flags to common utility to preserve behavior in this change.
2024-08-30 11:57:24 -07:00
Philip Reames
5b3ba438df Restructure createSimpleTargetReduction to match VP path [NFC]
Reduces code significantly, but more importantly makes it obvious that
this variant matches the VP variant just below.
2024-08-30 09:46:59 -07:00
Sergei Barannikov
4e93b16f3f
[llvm] Make InstSimplifyFolder constructor explicit (NFC) (#101654) 2024-08-02 16:45:50 +03:00
Mel Chen
6d12b3f67d
[VP] Refactor VectorBuilder to avoid layering violation. NFC (#99276)
This patch refactors the handling of reduction to eliminate layering
violations.

* Introduced `getReductionIntrinsicID` in LoopUtils.h for mapping
recurrence kinds to llvm.vector.reduce.* intrinsic IDs.
* Updated `VectorBuilder::createSimpleTargetReduction` to accept
llvm.vector.reduce.* intrinsic directly.
* New function `VPIntrinsic::getForIntrinsic` for mapping intrinsic ID
to the same functional VP intrinsic ID.
2024-07-25 15:14:39 +08:00
Sam Parker
d28ed29d6b
[TTI][WebAssembly] Pairwise reduction expansion (#93948)
WebAssembly doesn't support horizontal operations nor does it have a way
of expressing fast-math or reassoc flags, so runtimes are currently
unable to use pairwise operations when generating code from the existing
shuffle patterns.

This patch allows the backend to select which, arbitary, shuffle pattern
to be used per reduction intrinsic. The default behaviour is the same as
the existing, which is by splitting the vector into a top and bottom
half. The other pattern introduced is for a pairwise shuffle.

WebAssembly enables pairwise reductions for int/fp add/sub.
2024-07-17 09:21:52 +01:00
Mel Chen
4eb30cfb34
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.
2024-07-16 16:15:24 +08:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Florian Hahn
e775efcec4
[LV] Apply loop guards when checking recur during hoisting RT checks.
Apply loop guards when checking if the recurrence is non-negative in
cases where runtime checks are hoisted out of an inner loop.
2024-06-04 20:37:46 +01:00
Florian Hahn
fc5254c8ac
[LoopUtils] Simplify code for runtime check generation a bit (NFCI).
Store getSE result in variable to re-use and use structured bindings
when looping over bounds.
2024-06-04 12:12:29 +01:00
Carlos Alberto Enciso
3c5738f3ec
Revert "[indvars] Missing variables at Og (#88270)" (#93016)
This reverts commit 89e1f7784be40bea96d5e65919ce8d34151c1d69.

https://github.com/llvm/llvm-project/pull/88270#discussion_r1609559724
https://github.com/llvm/llvm-project/pull/88270#discussion_r1609552972

Main concerns from @nikic are the interaction between the
'IndVars' and 'LoopDeletion' passes, increasing build times
and adding extra complexity.
2024-05-22 11:36:10 +01:00
Carlos Alberto Enciso
89e1f7784b
[indvars] Missing variables at Og (#88270)
https://bugs.llvm.org/show_bug.cgi?id=51735
https://github.com/llvm/llvm-project/issues/51077

In the given test case:
 ```
4 ...
 5 void bar() {
 6   int End = 777;
 7   int Index = 27;
 8   char Var = 1;
 9   for (; Index < End; ++Index)
10     ;
11   nop(Index);
12 }
13 ...
```
Missing local variable `Index` after loop `Induction Variable Elimination`.
When adding a breakpoint at line `11`, LLDB does not have information
on the variable. But it has info on `Var` and `End`.
2024-05-22 07:09:50 +01:00
Kazu Hirata
677dddebae
[Transforms] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  31 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-04 12:33:12 -07:00
Florian Hahn
bccb7ed8ac
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit c6e01627acf859.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-05-03 14:40:49 +01:00
Matthew Devereau
6561fa3d02
[LoopUnswitch] Allow i1 truncs in loop unswitch (#89738)
With the addition of #84628, truncs to i1 are being emitted as
conditions to branch instructions. This caused significant regressions
in cases which were previously improved by loop unswitch. Adding truncs
to i1 restore the previous performance seen.
2024-04-29 15:17:48 +01:00
Arthur Eubanks
c6e01627ac Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304)""
This reverts commit c6e38b928c56f562aea68a8e90f02dbdf0eada85.

Causes miscompiles, see comments on #78304.
2024-04-16 20:40:21 +00:00
Nikita Popov
91189afef5 Revert "[indvars] Missing variables at Og: (#69920)"
This reverts commit 739fa1c84b92b8af7dceedf2e5ad808a64e85a57.

This introduces a layering violation by using IR in Support headers.
2024-04-08 14:31:52 +09:00
Carlos Alberto Enciso
739fa1c84b
[indvars] Missing variables at Og: (#69920)
https://bugs.llvm.org/show_bug.cgi?id=51735
https://github.com/llvm/llvm-project/issues/51077

In the given test case:
 ```
4 ...
 5 void bar() {
 6   int End = 777;
 7   int Index = 27;
 8   char Var = 1;
 9   for (; Index < End; ++Index)
10     ;
11   nop(Index);
12 }
13 ...
```
Missing local variable `Index` after loop `Induction Variable Elimination`. When adding a breakpoint at line `11`, LLDB does not have information on the variable. But it has info on `Var` and `End`.
2024-04-08 05:31:56 +01:00
Florian Hahn
c6e38b928c
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit 589c7abb03448.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29d.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-04-05 13:45:13 +01:00
Stephen Tozer
ffd08c7759
[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216)
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:

- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.

Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:

```
  DPValue -> DbgVariableRecord
  DPVal -> DbgVarRec
  DPV -> DVR
```

Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
2024-03-19 20:07:07 +00:00
Kirill Stoimenov
589c7abb03 Revert "[LV] Improve AnyOf reduction codegen. (#78304)"
Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697

This reverts commit 95fef1dfefd5467206e74c089d29806fcd82889b.
2024-03-14 14:57:01 +00:00
Stephen Tozer
2e865353ed
[RemoveDIs][NFC] Move DPValue::filter -> filterDbgVars (#85208)
This patch changes DPValue::filter to be a non-member method
filterDbgVars. There are two reasons for this: firstly, the name of
DPValue is about to change to DbgVariableRecord, which will result in
every `for` loop that uses DPValue::filter to require a line break. This
is a small thing, but it makes the rename patch more difficult to
review, and is just generally more awkward for what is a fairly common
loop. Secondly, the intent is to later break up the DPValue class into
subclasses, at which point it would be better to have a non-member
function that allows template arguments for the cases we want to filter
with greater specificity.
2024-03-14 12:19:15 +00:00
Florian Hahn
95fef1dfef
[LV] Improve AnyOf reduction codegen. (#78304)
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-03-14 11:22:06 +00:00
Stephen Tozer
15f3f446c5
[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793)
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.

This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
2024-03-12 14:53:13 +00:00
Patrick O'Neill
8d6e867eb2
[LSR][term-fold] Ensure the simple recurrence is from the current loop (#83085)
If the phi node found by matchSimpleRecurrence is not from the current
loop, then isAlmostDeadIV panics. With this patch we bail out early.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

---------

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-03-04 16:40:40 -08:00
Orlando Cazalet-Hyams
ababa96475
[RemoveDIs][NFC] Introduce DbgRecord base class [1/3] (#78252)
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class

    -> 1. Add DbgRecord base class for DPValue and the not-yet-added
          DPLabel class.
       2. Add the DPLabel class.
       3. Enable dbg.label conversion and add support to passes.

Patches 1 and 2 are NFC.

In the near future we also will rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
2024-02-20 16:00:55 +00:00
Simon Pilgrim
2ed2a3ad90 [Transforms][Utils] Add helpers to map between Reduction IntrinsicID and Arithmetic Instruction Opcode and MinMax IntrinsicID / RecurKind
Noticed on #81852
2024-02-16 11:20:34 +00:00
Jeremy Morse
ce1b24330e
[DebugInfo][RemoveDIs] Instrument loop-deletion for DPValues (#73042)
Loop deletion identifies dbg.value intrinsics in the loop, sets them to
undef/poison, and sinks them to the exit of the loop, to ensure that any
variable assignments that happen in a deleted loop are "optimised out".
This needs to be replicated for DPValues, the non-instruction
replacement for dbg.value intrinsics.

The movement API for DPValues is (deliberately) more limited than
dbg.values, which is tricky because inserting the collection of
dbg.values at an arbitrary iterator can insert a dbg.value in the middle
of a sequence of dbg.values. A big no-no for DPValues. This patch
replicates the order by inserting DPValues in reverse at the
head-iterator of the block, to ensure the same output as dbg.value mode.
Technically the order isn't important, but we're trying to ensure
identical outputs from optimisation passes right now.

Add more CHECK lines for dbg.values in diundef.ll to ensure that we
don't create any spurious dbg.values, and to ensure that sequences of
dbg.values come out of the optimisation in the correct order.
2023-11-23 12:24:15 +00:00
Florian Hahn
19e6d54188
[LV] Re-use existing compare if possible for diff checks.
SCEV simplifying the subtraction may result in redundant compares that
are all OR'd together. Keep track of the generated operands in
SeenCompares, with the key being the pair of operands for the compare.

If we alrady generated the same compare previously, skip it.
2023-11-23 11:35:21 +00:00
Florian Hahn
32d1197a8f
[LV] Use SCEV for subtraction of src/sink for diff runtime checks.
Instead of expanding the src/sink SCEV expressions and emitting an IR
sub to compute the difference, the subtraction can be directly be
performed by ScalarEvolution. This allows the subtraction to be
simplified by SCEV, which in turn can reduced the number of redundant
runtime check instructions generated.

It also allows to generate checks that are invariant w.r.t. an outer
loop, if he inner loop AddRecs have the same outer loop AddRec as start.
2023-11-22 12:48:04 +00:00
Florian Hahn
ead35564c0
[LoopUtils] Freeze compare results for diff checks instead of pointers.
THe freezes are introduced to avoid branch on undef/poison, if any of
the pointers may be poison. The same can be achieved by just freezing
the compare, which reduces the number of freezes needed. See
https://alive2.llvm.org/ce/z/NHa_ud

Note that the individual compares need to be frozen and it is not
sufficient to only freeze the resulting OR:

Result OR frozen only (UNSOUND): https://alive2.llvm.org/ce/z/YzFHQY
Individual conds frozen (SOUND): https://alive2.llvm.org/ce/z/5L6Z3f
2023-11-21 10:54:36 +00:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
David Sherwood
07f0e75b53
[LoopVectorize] Fix bug with code to hoist runtime checks (#70937)
There was a silly mistake in the expandBounds function that was using
the wrong type when calling expandCodeFor and always assuming the stride
is 64 bits. I've added the following test to defend this fix:

Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll
2023-11-02 10:02:50 +00:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
David Sherwood
c02184f286 [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop
Suppose we have a nested loop like this:

  void foo(int32_t *dst, int32_t *src, int m, int n) {
    for (int i = 0; i < m; i++) {
      for (int j = 0; j < n; j++) {
        dst[(i * n) + j] += src[(i * n) + j];
      }
    }
  }

We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then the cost of these checks becomes quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:

1. &dst[(i * n)] -> &dst[(i * n) + n]
2. &src[(i * n)] -> &src[(i * n) + n]

we can instead look at the expanded ranges:

1. &dst[0] -> &dst[((m - 1) * n) + n]
2. &src[0] -> &src[((m - 1) * n) + n]

which are outer-loop-invariant. As with many optimisations there
is a trade-off here, because there is a danger that using the
expanded ranges we may never enter the vectorised inner loop,
whereas with the smaller ranges we might enter at least once.

I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.

When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:

SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
...
GEOMEAN: +0.5%

I didn't investigate all the differences to see if they are
genuine or noise, but I know the x264 improvement is real because
it has some hot nested loops with low trip counts where I can
see this hoisting is beneficial.

Tests have been added here:

  Transforms/LoopVectorize/runtime-checks-hoist.ll

Differential Revision: https://reviews.llvm.org/D152366
2023-08-24 12:14:02 +00:00