When vectorizing code with function calls in it, if we encounter
a function which only has vectorized variants requiring a mask
we can synthesize an all-true mask to enable us to proceed.
Since we want the mask to be represented in vplan, the pointer
to the chosen Function is now stored as part of the
VPWidenCallRecipe, and mask arguments are added at the
appropriate index to the recipe operands.
Reviewed By: david-arm, fhahn, reames
Differential Revision: https://reviews.llvm.org/D132458
adjustFixedOrderRecurrences may insert instructions after immediately
after the PHI nodes in the block. This invalidates the phis() iterator.
To avoid crashing/accessing invalid recipes, first collect all
first-order recurrence phi recipes.
This should fix a crash reported by @dmgreen after D142589 landed.
Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.
Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.
This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.
At the moment, there's a single test failure remaining.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142589
Fix an issue about module linking with LTO.
When compiling with PIE, the small data limitation needs to be consistent with that in PIC, otherwise there will be linking errors due to conflicting values.
bar.c
```
int bar() { return 1; }
```
foo.c
```
int foo() { return 1; }
```
```
clang --target=riscv64-unknown-linux-gnu -flto -c foo.c -o foo.o -fPIE
clang --target=riscv64-unknown-linux-gnu -flto -c bar.c -o bar.o -fPIC
clang --target=riscv64-unknown-linux-gnu -flto foo.o bar.o -flto -nostdlib -v -fuse-ld=lld
```
```
ld.lld: error: linking module flags 'SmallDataLimit': IDs have conflicting values in 'bar.o' and 'ld-temp.o'
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```
Use Min instead of Error for conflicting SmallDataLimit.
Authored by: @joshua-arch1
Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D131230
The test contained a unused load that appears unrelated to the test
(store of vector of i1). Remove it to avoid test changes in follow-up
change which will lead to dead loads being removed.
This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:
%vscale = call i32 llvm.vscale.i32()
%vf.part1 = mul i32 %vscale, 4
%gep = getelementptr ..., i32 %vf.part1
Which InstCombine then changes into:
%vscale = call i32 llvm.vscale.i32()
%vf.part1 = mul i32 %vscale, 4
%vf.part1.zext = sext i32 %vf.part1 to i64
%gep = getelementptr ..., i32 %vf.part1.zext
D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.
It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.
I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D143267
Update a few tests to add users to loads to avoid them being optimized
out by future changes. In cases the unused loads didn't matter for the
test, remove them.
This artifact can appear from the vectorizer. (add X, -1) is the
backedge taken count. It gets zero extended and then 1 is added to
it to get the trip count.
There is usually a dominating branch that rules out X being zero.
Alive: https://alive2.llvm.org/ce/z/NsRDwX
This patch vectorizes Phi node loop reductions for select's whos condition
comes from a floating-point comparison, with its operands being integers
for Add, Sub, and Mul reductions.
Example:
int foo(float *x, int n) {
int sum = 0;
for (int i=0; i<n; ++i) {
float elem = x[i];
if (elem > 0) {
sum += 2;
}
}
return sum;
}
This would previously fail to vectorize due to the integer reduction.
This patch vectorizes Phi node loop reductions for select's whos condition
comes from a floating-point comparison, with its operands being integers
for Add, Sub, and Mul reductions.
Example:
int foo(float *x, int n) {
int sum = 0;
for (int i=0; i<n; ++i) {
float elem = x[i];
if (elem > 0) {
sum += 2;
}
}
return sum;
}
Differential Revision: https://reviews.llvm.org/D141842
Update sinkScalarOperands to consider all operands of recipes in
replicate blocks as sink candidates This enables additional sinking
opportunities and is another step towards retiring LLVM IR-based
sinkScalarOperands.
This enables iterative sinking of operands for successive calls of
sinkScalarOperands.
Depends on D139788.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D139790
It enables trigonometry functions vectorization via SLEEF: http://sleef.org/.
- A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
- A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
- A comprehensive test case is included in this changeset.
- A new vectorization library argument is added to -fveclib: -fveclib=SLEEF.
Trigonometry functions that are vectorized by sleef:
acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma
Co-authored-by: Stefan Teleman
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D134719
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.
NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.
Differential Revision: https://reviews.llvm.org/D140983
LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV.
Fixes#59319
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D139601
This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.
This uses vscale_range for min/max vector width unless the command
line overrides are used.
As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139873
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.
Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.
The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961
Compile-time improvements:
NewPM-O3: -1.04%
NewPM-ReleaseThinLTO: -0.59%
NewPM-ReleaseLTO-g: -0.97%
https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:uFixes#40306.
Reviewed By: lebedev.ri, nikic
Differential Revision: https://reviews.llvm.org/D115261
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.
Differential Revision: https://reviews.llvm.org/D141003
This reverts commit aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d.
Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
Even if the the sink candidate is already in the target block, its
operands can be candidates for sinking. Queue them up as well. Also
moves the queuing logic to a helper.
Merging regions can enable new sinking opportunities (e.g. if users of a
scalar value are moved from different VPBBs into the same VPBB). Sinking
in turn can also enable new merging opportunities (e.g. if a recipe
between to merge-able regions is moved.
To enable more sinking opportunities, repeat sinking & merging if
regions could be merged.
Also fix mergeReplicateRegions to return the correct Changed status.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D139788
Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This remove redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D139927
Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.
The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once in more cases.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133017
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.
The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.
Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D135017
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.
Differential Revision: https://reviews.llvm.org/D139342
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.
Differential Revision: https://reviews.llvm.org/D139340
Code generation now uses the start VPValue of induction recipes.
This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.
Fixes#59459.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92132
During scalar promotion, if there are additional potentially-aliasing
loads outside the promoted set, we can still perform a load-only
promotion. As the stores are retained, any potentially-aliasing
loads will still read the correct value.
This increases the number of load promotions in llvm-test-suite by
a factor of two:
| Old | New
licm.NumPromotionCandidates | 4448 | 6038
licm.NumLoadPromoted | 479 | 1069
licm.NumLoadStorePromoted | 1459 | 1459
Unfortunately, this does have some impact on compile-time:
http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions
In part this is because we now have less early bailouts from
promotion, but also due to second order effects (e.g. for one case
I looked at we spend more time in SLP now).
Differential Revision: https://reviews.llvm.org/D133192
In scalar plans, replicate recipes will only generate a single value per
UF, independent of whether they are uniform or not. So don't consider
uniformity for plans with scalar VFs only.
This allows us to handle a few additional cases in VPlan sinking instead
of non-VPlan sinkScalarOperands.
Depends on D133762.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D134218
The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus under estimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice.
Note that the code quality of the original scalable vectorization could (and probably should) be improved other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with.
This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length.
Differential Revision: https://reviews.llvm.org/D137285