Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.
https://github.com/llvm/llvm-project/pull/72164
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
Instead of expanding the src/sink SCEV expressions and emitting an IR
sub to compute the difference, the subtraction can be directly be
performed by ScalarEvolution. This allows the subtraction to be
simplified by SCEV, which in turn can reduced the number of redundant
runtime check instructions generated.
It also allows to generate checks that are invariant w.r.t. an outer
loop, if he inner loop AddRecs have the same outer loop AddRec as start.
Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.
Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.
This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:
%vscale = call i32 llvm.vscale.i32()
%vf.part1 = mul i32 %vscale, 4
%gep = getelementptr ..., i32 %vf.part1
Which InstCombine then changes into:
%vscale = call i32 llvm.vscale.i32()
%vf.part1 = mul i32 %vscale, 4
%vf.part1.zext = sext i32 %vf.part1 to i64
%gep = getelementptr ..., i32 %vf.part1.zext
D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.
It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.
I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D143267
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.
NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.
Differential Revision: https://reviews.llvm.org/D140983
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.
Differential Revision: https://reviews.llvm.org/D139342
This stops Negator from transforming:
`C1 - shl X, C2 --> mul X, (1<<C2) + C1`
...in the general case. There does not seem to be any analysis
benefit to using mul in IR, and there's definitely downside in
codegen (particularly when the multiply has to be expanded).
If `C1` is 0, then there's a stronger argument that the single
mul is a better canonicalization than negate-of-shl, but we may
want to remove that too.
This was noted as a potential conflict for D133667.
Differential Revision: https://reviews.llvm.org/D134310
Whilst writing a patch to add extra tail-folding RUN lines to
existing tests I noticed a few areas where they can be
cleaned up a little:
1. scalable-reductions.ll: fmin_fast does not mark fcmp as fast.
2. sve-inductions-unusual-types.ll: remove direct references to
SSA variable names.
3. sve-strict-fadd-cost.ll: don't force vector width so we see
costs for different VFs in one go. This will be important for
the follow-on patch.
4. sve-vector-reverse.ll,vector-reverse-mask4.ll: add noalias
keyword to simplify IR.
4. sve-widen-gep.ll,sve-widen-phi.ll: regenerate using script.
These changes will make the subsequent patch adding RUN lines much
easier to review!
Differential Revision: https://reviews.llvm.org/D132219
This patch is in preparation for enabling vectorisation with tail-folding
by default for SVE targets. Once we do that many existing tests will
break that depend upon having normal unpredicated vector loops. For
all such tests I have added the flag:
-prefer-predicate-over-epilogue=scalar-epilogue
Differential Revision: https://reviews.llvm.org/D129137
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
3. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078
The availability of SVE should be sufficient to enable scalable
auto-vectorization.
This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.
Differential Revision: https://reviews.llvm.org/D115651
Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.
I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.
https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)
Differential Revision: https://reviews.llvm.org/D106450
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.
I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.
https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)
Differential Revision: https://reviews.llvm.org/D106450
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.
Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.
The possible options are:
- Disabled: Disables vectorization with scalable vectors.
- Enabled: Vectorize loops using scalable vectors or fixed-width
vectors, but favors fixed-width vectors when the cost
is a tie.
- Preferred: Like 'Enabled', but favoring scalable vectors when the
cost-model is inconclusive.
Reviewed By: paulwalker-arm, vkmr
Differential Revision: https://reviews.llvm.org/D101945
After D98856 these tests will by default break (fatal_error) if any of
the wrong interfaces are used, so there's no longer a need to have a
RUN line that checks for a warning message emitted by the compiler.
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
for (int i = n-1; i >= 0; --i)
a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.
Differential Revision: https://reviews.llvm.org/D95363