This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros
are already handled consistently by maxnum/minnum.
If any input is NaN,
*exit the vector loop,
*compute the reduction result up to the vector iteration that contained
NaN inputs and
* resume in the scalar loop
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
PR: https://github.com/llvm/llvm-project/pull/148239
Similar to FindLastIV, add FindFirstIVSMin to support select (icmp(), x, y)
reductions where one of x or y is a decreasing induction, producing a SMin
reduction. It uses signed max as sentinel value.
PR: https://github.com/llvm/llvm-project/pull/140451
The use of VectorBuilder here was simply obscuring what was actually
going on. For vp.load and vp.store, the resulting code is significantly
more idiomatic. For the vp.reduce cases, we remove several layers of
indirection, including passing parameters via implicit state on the
builder. In both cases, the code is significantly easier to follow.
Specifically this is the assertion in BasicBlock.cpp. Now that we're not
examining or setting that flag consistently (because it'll be deleted in
about an hour) there's no need to keep this assertion.
Original commit title:
[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61.
/me squints -- this is hitting an assertion I thought had been deleted,
will revert and investigate for a bit.
These are opportunistic deletions as more places that make use of the
IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code
now that `IsNewDbgInfoFormat` should be true everywhere.
FastISel: we don't need to do debug-aware instruction counting any more,
because there are no debug instructions,
Autoupgrade: you can no-longer avoid autoupgrading of intrinsics to
records
DIBuilder: Delete the code for creating debug intrinsics (!)
LoopUtils: No need to handle debug instructions, they don't exist
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Transforms`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Removed a redundant `operator<<` from Attributor.h. IDS only
auto-annotates the 1st declaration, and the 2nd declaration being
un-annotated resulted in an "inconsistent linkage" error on Windows when
building LLVM as a DLL.
- `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and
remove the local declaration of the `vfs::FileSystem` class. This is
required because exporting the `PGOInstrumentationUse` constructor
requires the class be fully defined because it is used by an argument.
- Add #include "llvm/Support/Compiler.h" to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Now that there is only a single FindLastIV recurrence kind, simply pass
the sentinel value instead of the full recurrence descriptor to tighten
the interface.
Now that there is only a single AnyOf recurrence kind, simply pass the
start value instead of the full recurrence descriptor, to tighten the
interface.
Add a new reduction recurrence kind for reductions with
minimumnum/maximumnum. Such reductions can be vectorized without
nsz/nnans, same as reductions with maximum/minimum intrinsics.
Note that a new reduction kind is needed to make sure partial reductions
are also combined with minimumnum/maximumnum.
Note that the final reduction to a scalar value is performed with
vector.reduce.fmin/fmax. This should be fine, as the results of the
partial reductions with maximumnum/minimumnum silences any sNaNs.
In-loop and reductions in SLP are not supported yet, as there's no
reduction version of maximumnum/minimumnum yet and fmax may be
incorrect.
PR: https://github.com/llvm/llvm-project/pull/137335
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch uses insert_range with
iterator ranges. For each case, I've verified that foos is defined as
make_range(foo_begin(), foo_end()) or in a similar manner.
The other createSimpleReduction takes the FMFs from the IRBuilder, so
this aligns the VectorBuilder variant to do the same and reduce the
possibility of there being a mismatch in flags.
This is split off from #131300.
A VPReductionRecipe will never have a AnyOf or FindLastIV recurrence, so
when it calls createReduction it always calls createSimpleReduction.
If we replace the call then it leaves createReduction with one user in
VPInstruction::ComputeReductionResult, which we can inline and then
remove.
getLoopEstimatedTripCount returns std::nullopt when the trip count would
overflow the return type, but since it is an estimate anyway, we might
as well saturate at UINT_MAX, improving results.
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in an std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
Replace binary of of two reductions with one reduction of the binary op
applied to vectors. For example:
```
%v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0)
%v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1)
%res = add i32 %v0_red, %v1_red
```
gets transformed to:
```
%1 = add <16 x i32> %v0, %v1
%res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1)
```
Consider the following loop:
```
int rdx = init;
for (int i = 0; i < n; ++i)
rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.
This patch added new RecurKind enums - IFindLastIV and FFindLastIV.
---------
Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
The DominatorTree version is marked for deprecation, so we use the
DomTreeUpdater version. We also update sinkRegion() to iterate over
basic blocks instead of DomTreeNodes. The loop body calls
SplitBlockPredecessors. The DTU version calls
DomTreeUpdater::apply_updates(), which may call DominatorTree::reset().
This invalidates the worklist of DomTreeNodes to iterate over.
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy. This
depends on several prior commits which fix ommissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.
As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor. (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)
As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum which is why we still need manual logic (but at
least only one copy of manual logic) for those cases.
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.
This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
Consolidating code so that we have one copy instead of multiple reasoning
about identity element. Note that we're (deliberately) not passing
the FMF flags to common utility to preserve behavior in this change.
This patch refactors the handling of reduction to eliminate layering
violations.
* Introduced `getReductionIntrinsicID` in LoopUtils.h for mapping
recurrence kinds to llvm.vector.reduce.* intrinsic IDs.
* Updated `VectorBuilder::createSimpleTargetReduction` to accept
llvm.vector.reduce.* intrinsic directly.
* New function `VPIntrinsic::getForIntrinsic` for mapping intrinsic ID
to the same functional VP intrinsic ID.
WebAssembly doesn't support horizontal operations nor does it have a way
of expressing fast-math or reassoc flags, so runtimes are currently
unable to use pairwise operations when generating code from the existing
shuffle patterns.
This patch allows the backend to select which, arbitary, shuffle pattern
to be used per reduction intrinsic. The default behaviour is the same as
the existing, which is by splitting the vector into a top and bottom
half. The other pattern introduced is for a pairwise shuffle.
WebAssembly enables pairwise reductions for int/fp add/sub.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
https://bugs.llvm.org/show_bug.cgi?id=51735https://github.com/llvm/llvm-project/issues/51077
In the given test case:
```
4 ...
5 void bar() {
6 int End = 777;
7 int Index = 27;
8 char Var = 1;
9 for (; Index < End; ++Index)
10 ;
11 nop(Index);
12 }
13 ...
```
Missing local variable `Index` after loop `Induction Variable Elimination`.
When adding a breakpoint at line `11`, LLDB does not have information
on the variable. But it has info on `Var` and `End`.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
31 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This reverts the revert commit c6e01627acf859.
This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes the #62565, as now there aren't multiple uses of the
start/new values.
Fixes https://github.com/llvm/llvm-project/issues/62565
PR: https://github.com/llvm/llvm-project/pull/78304
With the addition of #84628, truncs to i1 are being emitted as
conditions to branch instructions. This caused significant regressions
in cases which were previously improved by loop unswitch. Adding truncs
to i1 restore the previous performance seen.
https://bugs.llvm.org/show_bug.cgi?id=51735https://github.com/llvm/llvm-project/issues/51077
In the given test case:
```
4 ...
5 void bar() {
6 int End = 777;
7 int Index = 27;
8 char Var = 1;
9 for (; Index < End; ++Index)
10 ;
11 nop(Index);
12 }
13 ...
```
Missing local variable `Index` after loop `Induction Variable Elimination`. When adding a breakpoint at line `11`, LLDB does not have information on the variable. But it has info on `Var` and `End`.
This reverts the revert commit 589c7abb03448.
This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29d.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes the #62565, as now there aren't multiple uses of the
start/new values.
Fixes https://github.com/llvm/llvm-project/issues/62565
PR: https://github.com/llvm/llvm-project/pull/78304
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.