347 Commits

Author SHA1 Message Date
Alexis Engelke
0d05c882ce
[Support] Use block numbers for LoopInfo BBMap (#103400)
Replace the DenseMap from blocks to their innermost loop with a vector
indexed by block numbers, when possible. Supporting number updates is
not trivial as we don't store a list of basic blocks, so this is not
implemented.

NB: I'm generally not happy with the way loops are stored. As I think
that there's room for improvement, I don't want to touch the
representation at this point.

Pull Request: https://github.com/llvm/llvm-project/pull/103400
2026-03-19 11:18:06 +01:00
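The idea in the commit above, a dense table indexed by block number instead of a hash map keyed by block pointer, can be sketched in plain C++. This is only an illustration: `Loop`, `Block`, and `BlockLoopMap` are hypothetical stand-ins, not LLVM's classes, and the sketch assumes block numbers are small, stable integers.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration; not LLVM's actual classes.
struct Loop { int Id; };
struct Block { unsigned Number; };

// Maps a block number to the innermost loop containing that block.
class BlockLoopMap {
  // Index = block number; nullptr means "not in any loop".
  std::vector<const Loop *> Map;

public:
  void set(const Block &B, const Loop *L) {
    // Grow on demand so that sparse numbering still works.
    if (B.Number >= Map.size())
      Map.resize(B.Number + 1, nullptr);
    Map[B.Number] = L;
  }

  const Loop *lookup(const Block &B) const {
    // Out-of-range numbers behave like blocks outside every loop.
    return B.Number < Map.size() ? Map[B.Number] : nullptr;
  }
};
```

A lookup is a bounds check plus an array load, with no hashing. The cost is that the table goes stale if blocks are renumbered, which is the update problem the commit message says is not handled.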
Alexis Engelke
5b4015e559
[Transforms][NFC] Drop uses of BranchInst in headers (#186580)
Replace BranchInst with CondBrInst/UncondBrInst/Instruction in headers
and handle the related fall out.

The removed code in simplifyUncondBranch was made dead in
0895b836d74ed333468ddece2102140494eb33b6, where FoldBranchToCommonDest
was changed to only handle conditional branches.
2026-03-14 11:03:33 +00:00
Florian Hahn
2dcf858ba0
[LAA] Use SCEVPtrToAddr in tryToCreateDiffChecks. (#178861)
The checks created by LAA only compute a pointer difference and do not
need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt
for computations.

To avoid regressions while parts of SCEV are migrated to use PtrToAddr,
this adds logic to rewrite all PtrToInt to PtrToAddr if possible in the
created expressions.

Similarly, if in the original IR we have a PtrToInt, SCEVExpander tries
to re-use it if possible when expanding PtrToAddr.

Depends on https://github.com/llvm/llvm-project/pull/178727.

Fixes https://github.com/llvm/llvm-project/issues/156978.

PR: https://github.com/llvm/llvm-project/pull/178861
2026-02-11 11:51:51 +00:00
Walter Lee
07ee61d59e
[LoopUnroll] Fix unused variable warning (#178490)
Fixes 362c39d36dd87c5659b0caa3115dfa67f592cdf6.
2026-01-28 19:38:46 +00:00
Marek Sedláček
362c39d36d
[LoopUnroll] Use branch probability in multi-exit loop unrolling (#164799)
This patch improves multi-exit loop unrolling by taking branch
probability into account, rather than only checking whether the other
exit is a deoptimizing one.

This implementation uses branch metadata directly because the state of
BPI is unstable in this part of the code (runtime unrolling invalidates
the state of the map, and using BPI in my tests has caused errors).
If branch probability metadata is not present, the current deopt
heuristic is still used.

---------

Co-authored-by: Marek Sedlacek <msedlacek@azul.com>
2026-01-28 11:12:34 -05:00
serge-sans-paille
84cccfc828
[perf] Replace copy-assign by move-assign in llvm/lib/Transforms/* (#178178)
2026-01-27 16:29:35 +00:00
Graham Hunter
2abd6d6d7a
[LV] Vectorize conditional scalar assignments (#158088)
Based on Michael Maitland's previous work:
https://github.com/llvm/llvm-project/pull/121222

This PR uses the existing recurrences code instead of introducing a
new pass just for CSA autovec. I've also made recipes that are more
generic.
2026-01-14 14:59:18 +00:00
Mircea Trofin
f9c561b561
[profcheck] Fix encoding of 0 loopEstimatedTrip count (#174896)
We currently encode an estimated trip count of 0 as the latch having branch probabilities 0-0. That's an invalid pair of weights. The probability of a branch is computed as the ratio of its weight to the sum of the weights. In fact, `BranchProbabilityInfo::calcMetadataWeights` will convert this to a 1-1, meaning 50% - 50%, which isn't quite what we want. To indicate the loop is never taken, we just need to initialize the exit probability to non-zero (hence, 1).

Related: https://reviews.llvm.org/D67905

Issue #147390
2026-01-12 17:12:35 -08:00
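The arithmetic behind the commit above can be written as a small sketch. The helper `branchProbability` is hypothetical and is not LLVM's `BranchProbability` class. It only models "weight divided by the sum of the weights" and makes the invalid 0-0 pair explicit.

```cpp
#include <cstdint>
#include <optional>

// Probability of taking the successor with weight W, given the weight of the
// other successor. Hypothetical helper for illustration only.
// Returns std::nullopt for the 0-0 pair, where the denominator would be zero.
std::optional<double> branchProbability(uint32_t W, uint32_t Other) {
  uint64_t Sum = uint64_t(W) + Other; // widen so the sum cannot overflow
  if (Sum == 0)
    return std::nullopt;
  return double(W) / double(Sum);
}
```

With weights (backedge = 0, exit = 1), the backedge probability is exactly 0, which expresses "the loop is never taken" without the undefined 0/0 case.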
Florian Hahn
188507e542
[VPlan] Inline createFindLastIVReduction into its only caller. (NFC)
createFindLastIVReduction is only used for generating code for
ComputeFindIVResult. Inline the code there, in preparation for
https://github.com/llvm/llvm-project/pull/172569.
2026-01-04 13:31:47 +00:00
Victor Chernyakin
c438773432
[LLVM][ADT] Migrate users of make_scope_exit to CTAD (#174030)
This is a followup to #173131, which introduced the CTAD functionality.
2026-01-02 20:42:56 -08:00
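For readers unfamiliar with the pattern in the commit above, class template argument deduction (CTAD) lets a scope guard be constructed without a `make_...` helper. The sketch below uses a hypothetical `ScopeExit` class written from scratch in standard C++17. It is not the LLVM ADT implementation, and the names are assumptions.

```cpp
#include <utility>

// Hypothetical scope guard for illustration; runs Fn when it goes out of scope.
template <typename Callable> class ScopeExit {
  Callable Fn;

public:
  explicit ScopeExit(Callable F) : Fn(std::move(F)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ~ScopeExit() { Fn(); }
};

// Example use: the template argument (the lambda's type) is deduced from the
// constructor argument, so no factory function is needed.
int runWithGuard() {
  int Counter = 0;
  {
    ScopeExit Guard([&] { ++Counter; });
    // ... work that may return or throw early ...
  } // Guard's destructor runs the lambda here.
  return Counter;
}
```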
Craig Topper
f98cc40b52
[LoopDeletion] Check for uses in unreachable basic blocks even when there is no exit block. (#173428)
Fixes #173357
2025-12-30 09:00:09 -08:00
Joel E. Denny
b8ef25aa64
[PGO] Fix zeroed estimated trip count (#167792)
Before PR #152775, `llvm::getLoopEstimatedTripCount` never returned 0.
If `llvm::setLoopEstimatedTripCount` were called with 0, it would zero
branch weights, causing `llvm::getLoopEstimatedTripCount` to return
`std::nullopt`.

PR #152775 changed that behavior: if `llvm::setLoopEstimatedTripCount`
is called with 0, it sets `llvm.loop.estimated_trip_count` to 0, causing
`llvm::getLoopEstimatedTripCount` to return 0. However, it kept
documentation saying `llvm::getLoopEstimatedTripCount` returns a
positive count.

Some passes continue to assume `llvm::getLoopEstimatedTripCount` never
returns 0 and crash if it does, as reported in issue #164254. To restore
the behavior they expect, this patch changes
`llvm::getLoopEstimatedTripCount` to return `std::nullopt` when
`llvm.loop.estimated_trip_count` is 0.
2025-11-25 11:05:13 -05:00
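The contract restored by the commit above can be modeled as follows. This is a simplified sketch: `estimatedTripCount` is a hypothetical function that takes the metadata value directly, and it leaves out any fallback to branch weights that the real function may perform.

```cpp
#include <optional>

// Hypothetical model: Metadata holds the value of
// llvm.loop.estimated_trip_count if the loop has one.
std::optional<unsigned> estimatedTripCount(std::optional<unsigned> Metadata) {
  // A stored 0 is reported as "no estimate", so callers never observe 0.
  if (!Metadata || *Metadata == 0)
    return std::nullopt;
  return Metadata;
}
```

Callers that divide by the result, or otherwise assume it is positive, then only need to check whether a value is present.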
Joel E. Denny
bb9bd5f263
[LoopUnroll] Fix assert fail on zeroed branch weights (#165938)
BranchProbability fails an assert when its denominator is zero.

Reported at
<https://github.com/llvm/llvm-project/pull/159163#pullrequestreview-3406318423>.
2025-11-03 10:19:12 -05:00
Joel E. Denny
cc8ff73fba
[LoopUnroll] Fix block frequencies for epilogue (#159163)
As another step in issue #135812, this patch fixes block frequencies for
partial loop unrolling with an epilogue remainder loop. It does not
fully handle the case when the epilogue loop itself is unrolled. That
will be handled in the next patch.

For the guard and latch of each of the unrolled loop and epilogue loop,
this patch sets branch weights derived directly from the original loop
latch branch weights. The total frequency of the original loop body,
summed across all its occurrences in the unrolled loop and epilogue
loop, is the same as in the original loop. This patch also sets
`llvm.loop.estimated_trip_count` for the epilogue loop instead of
relying on the epilogue's latch branch weights to imply it.

This patch fixes branch weights in tests that PR #157754 adversely
affected.
2025-10-31 11:01:42 -04:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Joel E. Denny
0e3c5566c0
[PGO] Add llvm.loop.estimated_trip_count metadata (#152775)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
2025-09-11 15:55:18 -04:00
Ramkumar Ramachandra
5544afd253
[LoopUtils] Simplify expanded RT-checks (#157518)
Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions
after expansion.) to hoist the SCEVExpander::eraseDeadInstructions call
from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so
that other callers (LoopDistribute and LoopVersioning) can benefit from
the patch.
2025-09-09 11:38:54 +00:00
Rajveer Singh Bharadwaj
93c96849c8
[VectorCombine] New folding pattern for extract/binop/shuffle chains (#145232)
Resolves #144654
Part of #143088

This adds a new `foldShuffleChainsToReduce` for horizontal reduction of
patterns like:

```llvm
define i16 @test_reduce_v8i16(<8 x i16> %a0) local_unnamed_addr #0 {
  %1 = shufflevector <8 x i16> %a0, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison>
  %2 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %a0, <8 x i16> %1)
  %3 = shufflevector <8 x i16> %2, <8 x i16> poison, <8 x i32> <i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %4 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %2, <8 x i16> %3)
  %5 = shufflevector <8 x i16> %4, <8 x i16> poison, <8 x i32> <i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %6 = tail call <8 x i16> @llvm.umin.v8i16(<8 x i16> %4, <8 x i16> %5)
  %7 = extractelement <8 x i16> %6, i64 0
  ret i16 %7
}
```
...which can be reduced to a llvm.vector.reduce.umin.v8i16(%a0)
intrinsic call.

Similar transformations apply to other ops when costs permit.
2025-08-24 14:21:48 +05:30
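To see why the shuffle chain in the commit above equals a single `umin` reduction, the IR can be emulated lane by lane in C++. This is a sketch for intuition only: the function names are hypothetical, and lanes that are `poison` in the IR are left unchanged here because they never reach lane 0.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using V8 = std::array<uint16_t, 8>;

// One shuffle + umin step: lane I becomes min(V[I], V[I + Shift]) for the
// low `Shift` lanes. The remaining lanes are don't-care values.
V8 minWithUpperPart(const V8 &V, unsigned Shift) {
  V8 R = V;
  for (unsigned I = 0; I < Shift; ++I)
    R[I] = std::min(V[I], V[I + Shift]);
  return R;
}

// Mirrors the three shuffle/umin pairs and the final extractelement.
uint16_t shuffleChainUMin(const V8 &A) {
  V8 T = minWithUpperPart(A, 4); // %1, %2
  T = minWithUpperPart(T, 2);    // %3, %4
  T = minWithUpperPart(T, 1);    // %5, %6
  return T[0];                   // %7
}

// What llvm.vector.reduce.umin computes: the minimum over all lanes.
uint16_t reduceUMin(const V8 &A) {
  return *std::min_element(A.begin(), A.end());
}
```

Each step halves the number of live lanes, so after three steps lane 0 holds the minimum of all eight inputs.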
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
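The rewrite described in the commit above relies on the identity `acc - x == acc + (-x)`. A minimal scalar sketch follows, using unsigned 32-bit arithmetic so that wrap-around is well defined in C++ (an assumption made to mirror two's-complement integer behavior). The function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// The original form: a reduction that subtracts each input.
uint32_t subReduction(uint32_t Start, const std::vector<uint32_t> &Xs) {
  uint32_t Acc = Start;
  for (uint32_t X : Xs)
    Acc -= X;
  return Acc;
}

// The rewritten form: an ordinary add reduction over negated inputs.
uint32_t addReductionOfNegated(uint32_t Start, const std::vector<uint32_t> &Xs) {
  uint32_t Acc = Start;
  for (uint32_t X : Xs)
    Acc += 0u - X; // two's-complement negation
  return Acc;
}
```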
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
a85c725952
Revert "[Utils] Fix a warning"
This reverts commit 3a18fe33f0763cd9276c99c276448412100f6270.

So that we can revert PR #148758.
2025-07-31 15:54:01 -04:00
Kazu Hirata
3a18fe33f0
[Utils] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Utils/LoopUtils.cpp:818:28: error: unused
  function 'operator<<' [-Werror,-Wunused-function]
2025-07-31 11:24:33 -07:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
2025-07-31 12:28:25 -04:00
Florian Hahn
004c67ea25
[LV] Vectorize maxnum/minnum w/o fast-math flags. (#148239)
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t. signaling NaNs. Signed zeros
are already handled consistently by maxnum/minnum.

If any input is NaN,
 * exit the vector loop,
 * compute the reduction result up to the vector iteration that contained
   NaN inputs, and
 * resume in the scalar loop.


New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.

PR: https://github.com/llvm/llvm-project/pull/148239
2025-07-18 21:58:19 +01:00
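The control flow listed in the commit above can be emulated with a chunked scalar loop. This is a sketch under stated assumptions, not the vectorizer's generated code: `chunkedMax` is a hypothetical name, the start value is assumed not to be NaN, and the partial result is taken to cover only the chunks before the one containing a NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference scalar loop. std::fmax returns the non-NaN operand when exactly
// one operand is NaN.
float scalarMax(float Start, const std::vector<float> &A, size_t Begin = 0) {
  float R = Start;
  for (size_t I = Begin; I < A.size(); ++I)
    R = std::fmax(R, A[I]);
  return R;
}

// Emulated "vector" loop that processes VF elements per iteration.
float chunkedMax(float Start, const std::vector<float> &A, size_t VF = 4) {
  float R = Start; // assumed not NaN
  size_t I = 0;
  for (; I + VF <= A.size(); I += VF) {
    bool AnyNaN = false;
    for (size_t J = 0; J < VF; ++J)
      AnyNaN |= std::isnan(A[I + J]);
    if (AnyNaN)
      break; // exit the vector loop; R covers the chunks before this one
    for (size_t J = 0; J < VF; ++J)
      R = A[I + J] > R ? A[I + J] : R; // plain compare: no NaNs in this chunk
  }
  return scalarMax(R, A, I); // resume in the scalar loop at the NaN chunk
}
```

The fast path can use a plain comparison because NaN-free chunks need no special handling. Only the scalar tail pays for `fmax` semantics.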
Austin
a550fef906
[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)
Use llvm::fill instead of std::fill
2025-07-04 14:10:28 +08:00
Florian Hahn
20fbbd7675
[LV] Add support for cmp reductions with decreasing IVs. (#140451)
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing an SMin
reduction. It uses signed max as the sentinel value.

PR: https://github.com/llvm/llvm-project/pull/140451
2025-06-29 11:17:03 +01:00
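A scalar sketch of the pattern from the commit above: with a decreasing induction variable, the last value assigned is the smallest index that satisfies the condition, so the select can be computed as a signed-min reduction with `INT32_MAX` as the sentinel. The function names and the `a[i] > b[i]` condition are illustrative assumptions, and the sketch assumes the induction variable never equals the sentinel.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// The original loop shape: select(icmp, i, rdx) with i decreasing.
int32_t selectLoop(const std::vector<int> &A, const std::vector<int> &B,
                   int32_t Init) {
  int32_t Rdx = Init;
  for (int32_t I = int32_t(A.size()) - 1; I >= 0; --I)
    Rdx = (A[I] > B[I]) ? I : Rdx;
  return Rdx;
}

// The same result expressed as an SMin reduction with a sentinel.
int32_t viaSMin(const std::vector<int> &A, const std::vector<int> &B,
                int32_t Init) {
  const int32_t Sentinel = std::numeric_limits<int32_t>::max();
  int32_t Min = Sentinel;
  for (int32_t I = int32_t(A.size()) - 1; I >= 0; --I)
    Min = std::min(Min, (A[I] > B[I]) ? I : Sentinel);
  // If the condition never held, fall back to the start value.
  return Min == Sentinel ? Init : Min;
}
```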
Ramkumar Ramachandra
bb8c42e859
[LV] Extend FindLastIV to unsigned case (#141752)
Split the FindLastIV RecurKind into SMax and UMax variants, depending on
the reduction op produced.
2025-06-23 15:27:49 +01:00
Philip Reames
8ee9646b06
[LV] Simplify creation of vp.load/vp.store/vp.reduce intrinsics (#143804)
The use of VectorBuilder here was simply obscuring what was actually
going on. For vp.load and vp.store, the resulting code is significantly
more idiomatic. For the vp.reduce cases, we remove several layers of
indirection, including passing parameters via implicit state on the
builder. In both cases, the code is significantly easier to follow.
2025-06-12 13:46:06 -07:00
Jeremy Morse
459475020a
Reapply 76197ea6f91f after removing an assertion
Specifically this is the assertion in BasicBlock.cpp. Now that we're not
examining or setting that flag consistently (because it'll be deleted in
about an hour) there's no need to keep this assertion.

Original commit title:

[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
2025-06-11 17:35:29 +01:00
Jeremy Morse
76197ea6f9
Revert "[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)"
This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61.

/me squints -- this is hitting an assertion I thought had been deleted,
will revert and investigate for a bit.
2025-06-11 14:52:17 +01:00
Jeremy Morse
c71a2e6888
[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
These are opportunistic deletions as more places that make use of the
IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code
now that `IsNewDbgInfoFormat` should be true everywhere.

- FastISel: we don't need to do debug-aware instruction counting any more,
because there are no debug instructions.
- Autoupgrade: you can no longer avoid autoupgrading of intrinsics to
records.
- DIBuilder: Delete the code for creating debug intrinsics (!).
- LoopUtils: No need to handle debug instructions, they don't exist.
2025-06-11 14:43:15 +01:00
Andrew Rogers
b2584e0b17
[llvm] annotate interfaces in llvm/Transforms for DLL export (#143413)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Transforms`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS on
Linux:
- Removed a redundant `operator<<` from Attributor.h. IDS only
auto-annotates the 1st declaration, and the 2nd declaration being
un-annotated resulted in an "inconsistent linkage" error on Windows when
building LLVM as a DLL.
- `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and
remove the local declaration of the `vfs::FileSystem` class. This is
required because exporting the `PGOInstrumentationUse` constructor
requires the class be fully defined because it is used by an argument.
- Add `#include "llvm/Support/Compiler.h"` to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-10 08:10:17 -07:00
Florian Hahn
249301c779
[LoopUtils] Pass sentinel value directly to createFindLastIVRed (NFC).
Now that there is only a single FindLastIV recurrence kind, simply pass
the sentinel value instead of the full recurrence descriptor to tighten
the interface.
2025-05-28 22:00:11 +01:00
Florian Hahn
0d7b34bfc1
[LoopUtils] Pass start value directly to createAnyOfReduction (NFC).
Now that there is only a single AnyOf recurrence kind, simply pass the
start value instead of the full recurrence descriptor, to tighten the
interface.
2025-05-28 21:28:02 +01:00
Florian Hahn
ec1016f7ef
[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335)
Add a new reduction recurrence kind for reductions with
minimumnum/maximumnum. Such reductions can be vectorized without
nsz/nnans, same as reductions with maximum/minimum intrinsics.

Note that a new reduction kind is needed to make sure partial reductions
are also combined with minimumnum/maximumnum.

Note that the final reduction to a scalar value is performed with
vector.reduce.fmin/fmax. This should be fine, as the results of the
partial reductions with maximumnum/minimumnum silence any sNaNs.

In-loop reductions and reductions in SLP are not supported yet, as
there's no reduction version of maximumnum/minimumnum yet and fmax may
be incorrect.

PR: https://github.com/llvm/llvm-project/pull/137335
2025-04-28 11:16:36 +01:00
Florian Hahn
8ddbc01295
[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690)
Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on https://github.com/llvm/llvm-project/pull/132689

PR: https://github.com/llvm/llvm-project/pull/132690
2025-03-27 18:34:13 +00:00
Kazu Hirata
41b76119ec
[llvm] Use range constructors for *Set (NFC) (#132636)
2025-03-23 15:50:34 -07:00
Kazu Hirata
fae34938f6
[llvm] Use *Set::insert_range (NFC) (#132591)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range with
iterator ranges.  For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
2025-03-22 22:14:45 -07:00
Luke Lau
01f04252b6
[LV] Get FMFs from VectorBuilder in createSimpleReduction. NFC (#132017)
The other createSimpleReduction takes the FMFs from the IRBuilder, so
this aligns the VectorBuilder variant to do the same and reduce the
possibility of there being a mismatch in flags.
2025-03-20 16:38:56 +08:00
Luke Lau
f536f71580
[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. NFC (#132014)
Split off from #131300, this splits up RecurrenceDescriptor arguments so
that arbitrary recurrence kinds may be used down the line.
2025-03-19 22:56:57 +08:00
Luke Lau
67f1c033b8
[VPlan] Remove createReduction. NFCI (#131336)
This is split off from #131300.

A VPReductionRecipe will never have an AnyOf or FindLastIV recurrence, so
when it calls createReduction it always calls createSimpleReduction.

If we replace the call then it leaves createReduction with one user in
VPInstruction::ComputeReductionResult, which we can inline and then
remove.
2025-03-18 00:18:15 +08:00
Ramkumar Ramachandra
c9e250af8e
[LoopUtils] Rename a var in addDiffRuntimeChecks (NFC) (#130128)
2025-03-06 19:31:18 +00:00
Ramkumar Ramachandra
03da079968
[LoopUtils] Saturate at INT_MAX when estimating TC (#129683)
getLoopEstimatedTripCount returns std::nullopt when the trip count would
overflow the return type, but since it is an estimate anyway, we might
as well saturate at UINT_MAX, improving results.
2025-03-05 18:19:39 +00:00
Ramkumar Ramachandra
80bdfcd411
[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080)
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in an std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
2025-03-04 08:43:08 +00:00
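The two trip-count commits above (#129080, then #129683) differ only in what happens when `ExitCount + 1` would wrap. A minimal sketch of both behaviors follows, with hypothetical function names and the unsigned `ExitCount` passed in directly. How `ExitCount` is derived from profile data is outside this sketch.

```cpp
#include <limits>
#include <optional>

// Behavior described in #129080: report "no estimate" instead of a wrapped 0.
std::optional<unsigned> tripCountOrNone(unsigned ExitCount) {
  if (ExitCount == std::numeric_limits<unsigned>::max())
    return std::nullopt; // ExitCount + 1 would wrap to 0
  return ExitCount + 1;
}

// Behavior described in #129683: saturate at the maximum unsigned value.
unsigned tripCountSaturating(unsigned ExitCount) {
  constexpr unsigned Max = std::numeric_limits<unsigned>::max();
  return ExitCount == Max ? Max : ExitCount + 1;
}
```

Since the value is only an estimate, saturating keeps a usable (very large) trip count where the earlier version discarded the information.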
Mikhail Gudim
f5d153ef26
[VectorCombine] Fold binary op of reductions. (#121567)
Replace binary op of two reductions with one reduction of the binary op
applied to vectors. For example:

```
%v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0)
%v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1)
%res = add i32 %v0_red, %v1_red
```
gets transformed to:

```
%1 = add <16 x i32> %v0, %v1
%res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1)
```
2025-02-22 06:11:33 -05:00
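The fold in the commit above is valid for wrapping integer addition because addition is associative and commutative. A small C++ emulation of the two forms follows. The names are hypothetical, and `uint32_t` stands in for `i32` so that overflow wraps in a well-defined way.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

using V16 = std::array<uint32_t, 16>;

// Models llvm.vector.reduce.add.v16i32.
uint32_t reduceAdd(const V16 &V) {
  return std::accumulate(V.begin(), V.end(), uint32_t{0});
}

// Before the fold: two reductions, then one scalar add.
uint32_t beforeFold(const V16 &V0, const V16 &V1) {
  return reduceAdd(V0) + reduceAdd(V1);
}

// After the fold: one vector add, then one reduction.
uint32_t afterFold(const V16 &V0, const V16 &V1) {
  V16 Sum{};
  for (size_t I = 0; I < Sum.size(); ++I)
    Sum[I] = V0[I] + V1[I];
  return reduceAdd(Sum);
}
```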
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum `i` for which the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
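One way to picture the vectorized form of the loop in the commit above is to keep a running maximum per lane and combine the lanes at the end. The sketch below is an assumption-laden illustration: `findLastIV`, the lane count of 4, and the use of `INT32_MIN` as a "condition never held" sentinel are choices made here, not details taken from the commit.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

int32_t findLastIV(const std::vector<int> &A, const std::vector<int> &B,
                   int32_t Init) {
  constexpr size_t VF = 4; // hypothetical vector width
  const int32_t Sentinel = std::numeric_limits<int32_t>::min();

  // Each lane tracks the largest matching index it has seen.
  std::array<int32_t, VF> Lanes;
  Lanes.fill(Sentinel);
  for (size_t I = 0; I < A.size(); ++I) {
    int32_t &Lane = Lanes[I % VF];
    Lane = std::max(Lane, (A[I] > B[I]) ? int32_t(I) : Sentinel);
  }

  // Final horizontal max across lanes, then map the sentinel back to Init.
  int32_t Max = *std::max_element(Lanes.begin(), Lanes.end());
  return Max == Sentinel ? Init : Max;
}
```

Because `i` only increases, the last index that satisfies the condition is also the largest one, which is what makes a max reduction equivalent to the original select.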
Joshua Cao
0bc98349c8
[LICM] Use DomTreeUpdater version of SplitBlockPredecessors, nfc (#107190)
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
2024-09-29 21:28:45 -07:00
Philip Reames
3d9abfc9f8
Consolidate all IR logic for getting the identity value of a reduction [nfc]
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy.  This
depends on several prior commits which fix omissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.

As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor.  (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)

As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum which is why we still need manual logic (but at
least only one copy of manual logic) for those cases.
2024-09-04 08:23:21 -07:00
Philip Reames
3e8840ba71
Remove "Target" from createXReduction naming [nfc]
Despite the stale comments, none of these actually use TTI, and they're
solely generating standard LLVM IR.
2024-09-03 17:03:55 -07:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reassoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
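The reason the choice between 0.0 and -0.0 in the commit above depends on `nsz` can be checked with ordinary IEEE-754 arithmetic. In the sketch below, `faddReduce` is a hypothetical ordered (left-to-right) reduction. It assumes the default round-to-nearest mode and a compiler that is not run with fast-math options.

```cpp
#include <cmath>
#include <vector>

// Ordered fadd reduction: ((Start + X0) + X1) + ...
double faddReduce(double Start, const std::vector<double> &Xs) {
  double Acc = Start;
  for (double X : Xs)
    Acc += X;
  return Acc;
}

// -0.0 is the exact identity: (-0.0) + (-0.0) stays -0.0.
bool negZeroStartKeepsSign() {
  return std::signbit(faddReduce(-0.0, {-0.0}));
}

// 0.0 is not: 0.0 + (-0.0) yields +0.0, so the sign of a zero result changes.
bool posZeroStartKeepsSign() {
  return std::signbit(faddReduce(0.0, {-0.0}));
}
```

The two start values can only be told apart through the sign of a zero result. With `nsz` that sign is allowed to differ, so 0.0 is an acceptable neutral value.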