The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.
PR: https://github.com/llvm/llvm-project/pull/150847
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates symbols that were recently
added to LLVM and fixes incorrectly annotated symbols.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS:
- Add `LLVM_EXPORT_TEMPLATE` and `LLVM_TEMPLATE_ABI` annotations to
explicitly instantiated instances of `llvm::object::SFrameParser`.
## Validation
On Windows 11:
```
cmake -B build -S llvm -G Ninja -DLLVM_ENABLE_PROJECTS="llvm;clang;clang-tools-extra;lldb;lld" -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_BUILD_LLVM_DYLIB_VIS=ON -DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_BUILD_TESTS=ON -DCLANG_LINK_CLANG_DYLIB=OFF -DCMAKE_BUILD_TYPE=Release
ninja -C build
```
For `select`, we don't have the equivalent of the branch probability analysis to offer defaults, so we make up our own and allow their overriding with flags.
Issue #147390
Update handling of canonical IV resume phi for the epilogue loop to make
sure the resume phi for the canonical IV is always the first phi in the
scalar preheader.
This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop.
For now, we keep an assert to make sure we use the same resume phi as
before. This will be removed in the future.
Split out from #150248:
Since #150944 the size passed to lifetime.start/end is considered
meaningless. The lifetime always applies to the whole alloca.
Accordingly, remove checks of the lifetime size from MemCpyOpt.
This patch adds Module splitting by categories. The splitting algorithm
is the necessary step in the SYCL compilation pipeline. Also it could be
reused for other heterogenous targets.
The previous attempt was at #119713. In this patch there is no
dependency in `TransformUtils` on "IPO" and on "Printing Passes". In
this patch a module splitting is self-contained and it doesn't introduce
linking issues.
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.
There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.
Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some
information directly from VPlan. Register pressure was already computed
for a VPlan, and with this patch we now also check for reductions
directly on VPlan, as well as checking how many load and store
operations remain in the loop.
This should be mostly NFC, but we may compute slightly different
interleave counts, except for some edge cases, e.g. where dead loads
have been removed. This shouldn't happen in practice, and the patch
doesn't cause changes across a large test corpus on AArch64.
Computing the interleave count based on VPlan allows for making better
decisions in presence of VPlan optimizations, for example when
operations on interleave groups are narrowed.
Note that there are a few test changes for tests that were still
checking the legacy cost-model output when it was computed in
selectInterleaveCount.
PR: https://github.com/llvm/llvm-project/pull/149702
Resolves#151574.
> SROA pass does not perform aggregate load/store rewriting on a pointer
whose source is a `launder.invariant.group`.
>
> This causes failed assertion in `AllocaSlices`.
>
> ```
> void (anonymous
namespace)::AllocaSlices::SliceBuilder::visitStoreInst(StoreInst &):
> Assertion `(!SI.isSimple() || ValOp->getType()->isSingleValueType())
&&
> "All simple FCA stores should have been pre-split"' failed.
> ```
Disables support for `{launder,strip}.invariant.group` intrinsics in
SROA.
Updates SROA test for `invariant.group` support.
If we have a known `p == p2` equality, we cannot replace `p2` with `p`
unless they are known to have the same provenance. GVN handles this when
propagating equalities from conditions, but not for assumes, as these go
through a different code path for uses in the same block.
Call canReplacePointersInUseIfEqual() before performing the replacement.
This is subject to the usual approximations (e.g. that we always allow
replacement with a dereferenceable constant and null).
This restriction does not appear to have any impact in practice.
Look through `not` when propagating equalities in GVN. Usually these
will be canonicalized away, but they may be retained due to multi-use or
involvement in logical expressions.
Fixes https://github.com/llvm/llvm-project/issues/143529.
This patch uses VPTypeAnalysis to determine its type since the induction
step is not always a live-in value in the VPlan and may be defined by a
recipe.
We iterate over the scalar costs of instruction when printing costs, and
currently the iteration order is not deterministic. Currently no tests
check the output with multiple instructions in the map, but those will
come soon.
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.
Fixes#151699
Fixed check for non-supported binary operations.
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.
Fixes#151699
Currently hasEarlyExit returns true, if there are multiple exit blocks.
ExitBlocks contains the wrapped original IR exit blocks. Without
checking the predecessors we incorrectly return true for loops with
multiple countable exits, that have been vectorized by requiring a
scalar epilogue. In that case, the exit blocks will get disconnected.
Fix this by filtering out disconnected exit blocks.
Currently this should only impact the 'early-exit vectorized' statistic.
PR: https://github.com/llvm/llvm-project/pull/151718
There is a larger problem here in that we should not be performing
arbitrary pointer replacements for assumes. This is handled for
branches, but assume goes through a different code path.
Fixes https://github.com/llvm/llvm-project/issues/151785.
Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
There is a case when branch profile metadata is OK to miss, namely, cold functions. The goal of the RFC (see the referenced issue) is to avoid accidental omission (and, at a later date, corruption) of profile metadata. However, asking cold functions to have all their conditional branches marked with "0" probabilities would be overdoing it. We can just ask cold functions to have an explicit 0 entry count.
This patch:
- injects an entry count for functions, unless they have one (synthetic or not)
- if the entry count is 0, doesn't inject, nor does it verify the rest of the metadata
- at verification, if the entry count is missing, it reports an error
Issue #147390
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.
NFC modulo some instruction re-ordering.
My understanding is that gep [n x i8] and gep i8 can be treated
equivalently - the array type conveys no extra information and could be
removed. This goes through foldCmpLoadFromIndexedGlobal and tries to
make it work for non-array gep types, so long as the index type still
matches the array being loaded.
The operands of the replicate recipe may have been narrowed, resulting
in a narrower result type. Update the type of the cloned instruction to
the correct type.
Fixes https://github.com/llvm/llvm-project/issues/151392.
Update isConditionTrueViaVFAndUF to use the vector trip count if
computable. This is the case when it has been materialized to a
constant. Otherwise fall back to the trip count.
PR: https://github.com/llvm/llvm-project/pull/151034
Make sure to check that the vector trip count is containedin the list of
incoming values to serve as tie-breaker with phis with all-zero incoming
values.
Fixes https://github.com/llvm/llvm-project/issues/151686.
We weren't performing node merging on newly created nodes in some cases.
Use a simple iteration over the node and its callers until no more
opportunities are found. I confirmed that for several large codes the
max iterations is 3 (meaning we only needed to do any work on the first
2, as expected). This can potentially be made more elegant in the
future, but it is a simple and effective solution.
Also fix a bug exposed by the test case, getting the function for a call
instruction in the FullLTO handling, using an existing method to look
through aliases if needed.
We iterate over InstsToScalarize when printing costs, and currently the
iteration order is not deterministic. Currently no tests check the
output with multiple instructions in InstsToScalarize, but those will
come soon.
Extracting any element from a subvector starting at index 0 is
equivalent to extracting from the original vector, i.e.
extract_elt(vector_extract(x, 0), y) -> extract_elt(x, y)