The initial VPlan closely reflects the original scalar loop, so unsing
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.
PR: https://github.com/llvm/llvm-project/pull/150847
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates symbols that were recently
added to LLVM and fixes incorrectly annotated symbols.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS:
- Add `LLVM_EXPORT_TEMPLATE` and `LLVM_TEMPLATE_ABI` annotations to
explicitly instantiated instances of `llvm::object::SFrameParser`.
## Validation
On Windows 11:
```
cmake -B build -S llvm -G Ninja -DLLVM_ENABLE_PROJECTS="llvm;clang;clang-tools-extra;lldb;lld" -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_BUILD_LLVM_DYLIB_VIS=ON -DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_BUILD_TESTS=ON -DCLANG_LINK_CLANG_DYLIB=OFF -DCMAKE_BUILD_TYPE=Release
ninja -C build
```
Update handling of canonical IV resume phi for the epilogue loop to make
sure the resume phi for the canonical IV is always the first phi in the
scalar preheader.
This makes it easier to retrieve it in preparePlanForEpilogueVectorLoop.
For now, we keep an assert to make sure we use the same resume phi as
before. This will be removed in the future.
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.
There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.
Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
Move selectInterleaveCount to LoopVectorizationPlanner and retrieve some
information directly from VPlan. Register pressure was already computed
for a VPlan, and with this patch we now also check for reductions
directly on VPlan, as well as checking how many load and store
operations remain in the loop.
This should be mostly NFC, but we may compute slightly different
interleave counts, except for some edge cases, e.g. where dead loads
have been removed. This shouldn't happen in practice, and the patch
doesn't cause changes across a large test corpus on AArch64.
Computing the interleave count based on VPlan allows for making better
decisions in presence of VPlan optimizations, for example when
operations on interleave groups are narrowed.
Note that there are a few test changes for tests that were still
checking the legacy cost-model output when it was computed in
selectInterleaveCount.
PR: https://github.com/llvm/llvm-project/pull/149702
This patch uses VPTypeAnalysis to determine its type since the induction
step is not always a live-in value in the VPlan and may be defined by a
recipe.
We iterate over the scalar costs of instruction when printing costs, and
currently the iteration order is not deterministic. Currently no tests
check the output with multiple instructions in the map, but those will
come soon.
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.
Fixes#151699
Fixed check for non-supported binary operations.
If the instruction is checked for matching the main instruction, need to
check if the opcode of the main instruction is compatible with the
operands of the instruction. If they are not, need to check the
alternate instruction and its operands for compatibility and return
alternate instruction as a match.
Fixes#151699
Currently hasEarlyExit returns true, if there are multiple exit blocks.
ExitBlocks contains the wrapped original IR exit blocks. Without
checking the predecessors we incorrectly return true for loops with
multiple countable exits, that have been vectorized by requiring a
scalar epilogue. In that case, the exit blocks will get disconnected.
Fix this by filtering out disconnected exit blocks.
Currently this should only impact the 'early-exit vectorized' statistic.
PR: https://github.com/llvm/llvm-project/pull/151718
Attempt to shrink the size of vector loads where only some of the incoming lanes are used for rebroadcasts in shufflevector instructions.
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Explicitly compute the backedge-taken count using VPInstruction. This is
needed to model the full skeleton in VPlan.
NFC modulo some instruction re-ordering.
The operands of the replicate recipe may have been narrowed, resulting
in a narrower result type. Update the type of the cloned instruction to
the correct type.
Fixes https://github.com/llvm/llvm-project/issues/151392.
Update isConditionTrueViaVFAndUF to use the vector trip count if
computable. This is the case when it has been materialized to a
constant. Otherwise fall back to the trip count.
PR: https://github.com/llvm/llvm-project/pull/151034
Make sure to check that the vector trip count is containedin the list of
incoming values to serve as tie-breaker with phis with all-zero incoming
values.
Fixes https://github.com/llvm/llvm-project/issues/151686.
We iterate over InstsToScalarize when printing costs, and currently the
iteration order is not deterministic. Currently no tests check the
output with multiple instructions in InstsToScalarize, but those will
come soon.
When interleaved stores contain gaps, a mask is required to skip the
gaps, regardless of whether scalar epilogues are allowed.
This patch corrects the condition under which a gap mask is needed,
ensuring consistency between the legacy and VPlan-based cost models and
avoiding assertion failures.
Related #149981
This implements the first half of #151459, by changing the AVL so it's
no longer computed as `trip-count - EVL-based IV`, but instead a
separate scalar phi that is decremented by EVL each iteration.
This shortens the dependency chain for computing the AVL and should
eventually allow us to convert the branch condition to `branch-count
avl-next, 0`.
`simplifyBranchConditionForVFAndUF` had to be updated to prevent a
regression because this introduces a VPPhi in the header block.
Noticed this when checking the invariant that all phis in the header
block must be header phis. I think there's a missing set of parentheses
here, since otherwise it only cast<VPInstruction> when RecipeI isn't a
VPInstruction.
https://github.com/llvm/llvm-project/pull/147026 will enable sub
reductions, which require that the phi value is the first operand since
they aren't commutative. This re-orders the operands when executing
reductions, which actually matches other existing code in
VPReductionRecipe::execute.
Using GEP to index into a vector is not disallowed, but not recommended.
The SPIR-V backend needs to generate structured access into types, which
is impossible with an untyped GEP instruction unless we add more info to
the IR. Finding a solution is a work-in-progress, but in the meantime,
we'd like to reduce the amount of failures.
Preventing this optimizations from rewritting extract/insert
instructions into a GEP helps us lower more code to SPIR-V. This change
should be OK as it's only active when targeting SPIR-V and disabling a
non-recommended transformation.
Related to #145002
Loop regions require fixed-length steps and rounded-up trip counts, but
after dissolution creates explicit control flow, EVL loops can leverage
variable-length stepping with original trip counts.
This patch adds a post-dissolution transform pass to convert EVL loops
from fixed-length to variable-length stepping .
With EVL tail folding, the EVL may not always be VF on the
second-to-last iteration.
Recipes that have been converted to VP intrinsics via optimizeMaskToEVL
account for this, but recipes that are left behind will still use the
old header mask which may end up having a different vector length.
This is effectively the same as #95368, and fixes this by converting
header masks from icmp ule wide-canonical-iv, backedge-trip-count ->
icmp ult step-vector, evl. Without it, recipes that fall through
optimizeMaskToEVL may use the wrong vector length, e.g. in #150074 and
#149981.
We really need to split off optimizeMaskToEVL into
VPlanTransforms::optimize and move transformRecipestoEVLRecipes into
tryToBuildVPlanWithVPRecipes, so we don't mix up what is needed for
correctness and what is needed to optimize away the mask computations.
We should be able to still generate a correct albeit suboptimal VPlan
without running optimizeMaskToEVL. I've added a TODO for this, which I
think we can do after #148274Fixes#150197
This partially reverts https://github.com/llvm/llvm-project/pull/140744,
restoring the original TheLoop->isLoopInvariant check instead the more
powerful Legal->isInvariant, which uses SCEV.
This causes a mis-compile, because SCEV can prove that the stored value
is loop-invariant, which in turn converts the store to a uniform store.
But in VPlan, we aren't yet able to determine that the stored value is
loop-invariant, so we extract the last lane, which is incorrect, because
it does not account for the mask of the store.
Restoring the original code is a safe fix and avoids this subtle
divergence.
Fixes https://github.com/llvm/llvm-project/issues/149347.
PR: https://github.com/llvm/llvm-project/pull/150828
Update getSmallConstantTripCount() to return scalable ElementCount
values that is used to acurrately determine the maximum value for UF,
namely:
TripCount / VF ==> X * VScale / Y * VScale ==> X / Y
This improves the chances of being able to remove the scalar loop and
also fixes an issue where a UF=2 is choosen for a scalar loop with
exactly VF(= X * VScale) iterations.
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.
For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where
it can be used by IRBuilder.
VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.
PR: https://github.com/llvm/llvm-project/pull/149735