After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.
Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.
In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.
Depends on https://github.com/llvm/llvm-project/pull/153643.
PR: https://github.com/llvm/llvm-project/pull/154510
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.
While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.
Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
Update calculateRegisterUsageForPlan to track live-ness of VPValues
instead of recipes. This gives slightly more accurate results for
recipes that define multiple values (i.e. VPInterleaveRecipe).
When tracking the live-ness of recipes, all VPValues defined by an
VPInterleaveRecipe are considered alive until the last use of any of
them. When tracking the live-ness of individual VPValues, we can
accurately track the individual values until their last use.
Note the changes in large-loop-rdx.ll and pr47437.ll. This patch
restores the original behavior before introducing VPlan-based liveness
tracking.
PR: https://github.com/llvm/llvm-project/pull/155301
Split GEPs that have more than one non-zero offset into two GEPs. This
is in preparation for the ptradd migration, which can only represent
such GEPs.
This also enables CSE and LICM of the common base.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.
Also modernizes some check lines.
Look through extractvalue to simplify umul_with_overflow where one of
the operands is 1.
This removes some redundant instructions when expanding SCEVs, which in
turn makes the runtime check cost estimate more accurate, reducing the
minimum iterations for which vectorization is profitable.
PR: https://github.com/llvm/llvm-project/pull/157307
The SCEV predicate in the existing tests for optimizing for size is
known to be false. Add additional test with a predicate that cannot be
proven true/false.
Also generate checks with latest version of script.
Remove instcombine, simplifycfg and dce from some tests, as they make it
a bit more difficult to see the codegen coming out of LV and most
simplifications are already done on the VPlan-level.
Also modernizes some check lines.
The second operand when using a safe divisor will always be a select in
the loop, so won't be invariant; don't treat it as such.
This fixes a divergence with legacy and VPlan based cost model.
Fixes https://github.com/llvm/llvm-project/issues/156066.
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
We have warned these options in LLVM21 through #132542. This patch
removes underlying implementations in LLVM22.
This PR reassociates logical ands in order to enable more
simplifications.
The driving motivation for this is that with tail folding all blocks
inside the loop body will end up using the header mask. However this can
end up nestled deep within a chain of logical ands from other edges.
Typically the header mask will be a leaf nested in the LHS, e.g.
(headermask & y) & z. So pulling it out allows it to be simplified
further, e.g. allows it to be optimised away to VP intrinsics with EVL
tail folding.
Introduce a simple common-subexpression-elimination pass at the
VPlan-level, running late during the execution of the VPlan. The
long-term vision is to get rid of the legacy non-VPlan-based cse routine
in LV, but this patch doesn't yet fully subsume it.
GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`.
Canonicalize these to `gep %T, ptr %p, i64 %idx`.
This enables transforms that only support one GEP index to work and
improves CSE.
Various transforms were recently hardened to make sure they still work
without the leading index.
If a phi is widened with tail folding, all of its predecessors will have
a mask of the form
%x = logical-and %active-lane-mask, %foo
%y = logical-and %active-lane-mask, %bar
%z = logical-and %active-lane-mask, %baz
...
We can remove the common %active-lane-mask from all of these edge masks,
which in turn allows us to simplify a lot of VPBlendRecipes.
In particular, it allows the header mask to be removed in selects with
EVL tail folding, improving RISC-V codegen on SPEC CPU 2017 for
525.x264_r, and supersedes #147243.
This also allows us to remove VPBlendRecipe and directly emit
VPInstruction::Select in another patch.
There were 5 X86 loop vectoriser tests that were piping the output from
opt into llc. I think in the directory test/Transforms/LoopVectorize we
should only be testing the output from the loop vectoriser pass. Any
codegen tests should live in test/CodeGen/X86 instead.
avx512.ll: it looks like we were really just testing that we generate
the right vector length.
fp32_to_uint32-cost-model.ll/fp64_to_uint32-cost-model.ll: the tests
only seem to care that we're not scalarising the fptoui, so I've
modified the test to check for vector ops. I've assumed there are
already codegen tests for fptoui vector operations.
vectorization-remarks-loopid-dbg.ll: i've copied this test to
CodeGen/X86/vectorization-remarks-loopid-dbg.ll for the llc RUN line
variant
vectorization-remarks.ll: seems to test the same thing as
vectorization-remarks-loopid-dbg.ll
Currently we only allow folding not (cmp eq) -> icmp ne if the not is
the only user of the compare.
However a common scenario is that some select might also use the
compare. We can still fold the not if we also swizzle the arms of the
selects.
This helps avoid regressions in #150368
If we have entries in Def2LaneDefs, we always have to use it. Move the
check before.
Otherwise we may not pick the correct operand, e.g. if Op was a
replicate recipe that got single-scalar after replicating it.
Fixes https://github.com/llvm/llvm-project/issues/154330.
SimplifyBranchConditionForVFAndUF only recognized canonical IVs and a
few PHI
recipes in the loop header. With more IV-step optimizations,
the canonical widen-canonical-iv can be replaced by a canonical
VPWidenIntOrFpInduction,
which the pass did not handle, causing regressions (missed
simplifications).
This patch replaces canonical VPWidenIntOrFpInduction with a StepVector
in the vector preheader
since the vector loop region only executes once.
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.
Do a final run of simplifyRecipes before executing the VPlan.
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.
Recommit with a fix for the verifier error caused for EVL recipes.
Extra test coverage added in 6f939da60e.
Epilogue vectorization currently relies on the resume phi for the
canonical induction being always available, which is why VPPhi are
considered to have side-effects, to prevent their removal.
This patch adds a new ResumeForEpilogue opcode to mark the resume phi as
used for epilogue vectorization. This allows treating VPPhis in general
as not having side-effects, enabling removal of unused VPPhis.
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.
PR: https://github.com/llvm/llvm-project/pull/151925
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
This is the VPWidenPointerInductionRecipe equivalent of #118638, with
the motivation of allowing us to use the EVL as the induction step.
There is a new VPInstruction added, WidePtrAdd to allow adding the step
vector to the induction phi, since VPInstruction::PtrAdd only handles
scalars or multiple scalar lanes.
Originally this transformation was copied from the original recipe's
execute code, but it's since been simplifed by teaching
`unrollWidenInductionByUF` to unroll the recipe, which brings it inline
with VPWidenIntOrFpInductionRecipe.
Make sure to check that the vector trip count is containedin the list of
incoming values to serve as tie-breaker with phis with all-zero incoming
values.
Fixes https://github.com/llvm/llvm-project/issues/151686.