We have a guarantee that the induction variable will not overflow in the
main loop after the loop has been constrained. Therefore we can add no-wrap
flags on its base so that we do not lose the information that the loop is
countable.
Add only the NSW flag for now, since adding the NUW flag requires a somewhat
more complicated analysis.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D154954
This patch is separated from D154953, as requested in a review comment, to
see which tests are affected by this change alone.
Depends on the related LangRef update in D155193.
Reviewed By: paulwalker-arm, nikic, david-arm
Differential Revision: https://reviews.llvm.org/D155350
This reverts commit 36a6eb7d12a9f827bf3d5d4e5fdc68b8a62807b2.
[MemCpyOpt] check that load/store and dest/src alloca are all in the same bb
Differential Revision: https://reviews.llvm.org/D153453
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
According to the discussion on D154953, we need to make the LangRef change
before any optimization relies on the new behaviour:
vscale_range implies that vscale is a power-of-two value, and parsing of the
attribute rejects values that are not a power of two.
Thanks to nikic for the wonderful summary of the discussion on D154953:
To provide a bit more context here. We would like to have power of two vscale exposed in a target-independent way, so we can make use of this in places like ValueTracking, just like we currently do the vscale range. Some options that have been discussed are:
- Remove support for non-power-of-two vscales entirely. (This is my personal preference, but this is hard to undo if it turns out someone does need them.)
- Add an extra attribute vscale_pow2, or a data layout property.
- Make vscale_range imply power-of-two vscale, as a compromise solution (what this patch does). This would be relatively easy to turn into one of the two above at a later point.
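The power-of-two restriction can be illustrated with a minimal sketch. The helper names `isPowerOfTwo` and `isValidVScaleRange` are hypothetical and are not the actual parser or verifier code; treating a maximum of 0 as "no upper bound" is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical helper: true if v is a nonzero power of two.
bool isPowerOfTwo(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Hypothetical check in the spirit of the rule described above: both bounds
// of vscale_range(min, max) must be powers of two.
// Assumption: max == 0 means "no upper bound".
bool isValidVScaleRange(uint64_t min, uint64_t max) {
  if (!isPowerOfTwo(min))
    return false;
  if (max == 0)
    return true;
  return isPowerOfTwo(max) && min <= max;
}
```

Under these assumptions, a range such as (1, 16) is accepted, while (3, 8) and (2, 6) are rejected.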
Reviewed By: paulwalker-arm, nikic, efriedma
Differential Revision: https://reviews.llvm.org/D155193
This showed up when we started to deduce readnone for the argument of
__kmpc_global_thread_num. The known attributes for "getters" did not
allow reading arguments, but getters sometimes do read them.
If the function is not IPO amendable, we skip most attributes/AAs.
However, if an AA has an isImpliedByIR that can deduce the attribute from
other attributes, we can still run it. For now, we enable these manually;
if more appear later, we can use some automation or a flag.
This is very likely the cause of a stage 2 failure in
Transforms/LoopVectorize/check-prof-info.ll. Revert until I can
investigate this.
This reverts commit 3d199d086e076f0b9b90d4c59f2226a417a639b5.
Support replacement of operands not only in the immediate
instruction, but also instructions it uses.
For the most part, this extension is straightforward, but there are
two bits worth highlighting:
First, we can now no longer assume that if the Op is a vector, the
instruction also returns a vector. If Op is a vector and the
instruction returns a scalar, we should consider it as a cross-lane
operation.
Second, for the x ^ x special case, we can no longer assume that
the operand is RepOp, as we might have a replacement higher up the
instruction chain.
There is one optimization regression, but it is in a fuzzer-generated
test case.
Fixes https://github.com/llvm/llvm-project/issues/63104.
match the size of base node (PR63668).
Need to adjust the assertion check to take into account the case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
After the semantics change from https://reviews.llvm.org/D154051,
gep inbounds x, 0 can no longer produce poison. As such, we can
also perform this fold during non-refining operand replacement
and avoid unnecessary drops of the inbounds flag.
The online alive2 version has not been updated to the new
semantics yet, but we can use the following proof locally:
define ptr @src(ptr %base, i64 %offset) {
  %cmp = icmp eq i64 %offset, 0
  %gep = getelementptr inbounds i8, ptr %base, i64 %offset
  %sel = select i1 %cmp, ptr %base, ptr %gep
  ret ptr %sel
}
define ptr @tgt(ptr %base, i64 %offset) {
  %gep = getelementptr inbounds i8, ptr %base, i64 %offset
  ret ptr %gep
}
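The value-level part of this equivalence can also be checked with a small C++ model. This is only a sketch: pointers are modelled as plain integers, the function names mirror the IR above, and poison semantics (the actual subject of the alive2 proof) are not modelled at all.

```cpp
#include <cstdint>

// Models @src: select(offset == 0, base, base + offset).
uintptr_t src(uintptr_t base, int64_t offset) {
  return offset == 0 ? base : base + static_cast<uintptr_t>(offset);
}

// Models @tgt: base + offset, with no select.
uintptr_t tgt(uintptr_t base, int64_t offset) {
  return base + static_cast<uintptr_t>(offset);
}
```

For any offset, including 0, both functions compute the same address, which is why the select is redundant once `gep inbounds x, 0` can no longer be poison.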
cttz(-a & a) is the same as cttz(a). -a & a is an idiom to extract
the lowest set bit, which naturally does not affect the number of
trailing zeroes.
Proof: https://alive2.llvm.org/ce/z/Yp26x7
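The identity can also be spot-checked with a small C++ sketch. The helpers `cttz32` and `lowestSetBit` are written for this example (they are not LLVM APIs), and `cttz32` is assumed to return the bit width, 32, for a zero input.

```cpp
#include <cstdint>

// Portable count-trailing-zeros; returns 32 for a zero input (assumption
// made for this sketch).
unsigned cttz32(uint32_t a) {
  if (a == 0)
    return 32;
  unsigned n = 0;
  while ((a & 1u) == 0) {
    a >>= 1;
    ++n;
  }
  return n;
}

// The "-a & a" idiom: isolates the lowest set bit of a.
uint32_t lowestSetBit(uint32_t a) { return (0u - a) & a; }
```

Since `lowestSetBit(a)` keeps exactly the lowest set bit, `cttz32(lowestSetBit(a))` and `cttz32(a)` agree for every input.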
D149348 did this for readnone calls, which are handled by SimpleValue.
This patch does the same for all other CSEable calls, which are handled
by CallValue.
Differential Revision: https://reviews.llvm.org/D153151
The select arm that takes the ctlz result can also instead be a
constant with the bit width (as this is what the ctlz evaluates to
for a==0).
This avoids a regression when strengthening the
simplifyWithOpReplaced() fold.
Proof: https://alive2.llvm.org/ce/z/DMRL5A
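The underlying observation, that ctlz evaluates to the bit width for a zero input, can be sketched in C++ as follows. `ctlz32` and `selectForm` are hypothetical helpers for this example; they illustrate the identity only and do not reproduce the exact pattern matched by the fold.

```cpp
#include <cstdint>

// Portable count-leading-zeros that is defined for zero: returns 32 when
// a == 0.
unsigned ctlz32(uint32_t a) {
  unsigned n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (a & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Select whose zero arm is the constant bit width instead of the ctlz result.
unsigned selectForm(uint32_t a) { return a == 0 ? 32u : ctlz32(a); }
```

Because `ctlz32(0)` is already 32, `selectForm(a)` and `ctlz32(a)` agree for every input, so either arm form can be folded the same way.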
In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users
outside the loop. E.g. if we have an addrec based on %base, and
%base is also used outside the loop, then we have to keep it in a
register anyway, which may make it more profitable to use
%base + %idx style addressing.
This reasoning doesn't hold up when the base is a constant, because
the constant can be rematerialized. The lsr-memcpy.ll test regressed
when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr)
now also has a use outside the loop (previously it didn't due to a
pointer type difference), and that extra "use" results in worse use
of addressing modes in the loop. However, the use outside the loop
actually gets rematerialized, so the alleged register saving does
not occur.
The same reasoning also applies to other types of constants, such
as global variable references.
Differential Revision: https://reviews.llvm.org/D155073
This comes up when adding two `bool` values in C/C++:
```
bool foo(bool a, bool b) {
    return a + b;
}
...
->
define i1 @foo(i1 %a, i1 %b) {
  %conv = zext i1 %a to i32
  %conv3.neg = sext i1 %b to i32
  %tobool4 = icmp ne i32 %conv, %conv3.neg
  ret i1 %tobool4
}
```
Proof: https://alive2.llvm.org/ce/z/HffWAN
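A small C++ model of the IR above shows why the comparison is cheap to simplify. The function names are hypothetical, and the claim that the comparison behaves like a logical OR is an observation from enumerating the four inputs, not a statement of the exact form the patch emits.

```cpp
#include <cstdint>

// Mirrors the IR: icmp ne (zext i1 %a to i32), (sext i1 %b to i32).
bool viaCompare(bool a, bool b) {
  int32_t conv = a ? 1 : 0;     // zext: false -> 0, true -> 1
  int32_t convNeg = b ? -1 : 0; // sext: false -> 0, true -> -1
  return conv != convNeg;
}

// Candidate simplified form: the two sides are equal only when both are 0.
bool viaOr(bool a, bool b) { return a || b; }
```

The zero-extended value is 0 or 1 and the sign-extended value is 0 or -1, so the two can only be equal when both inputs are false.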
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154574
This patch adds initial support for the `AAAddressSpace` abstract
attribute interface to deduce and query address space information for a
pointer. We simply query the underlying objects that a pointer can point
to and find their common address space, if one exists. This is the minimal
support for the interface; we currently manifest changes only on loads and
stores. Additionally, we should use the target transform information to
deduce whether an address space transformation is a no-op for the target
machine when calculating compatibility.
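The "common address space" step can be sketched as follows. This is a simplified model, not the Attributor implementation: underlying objects are represented only by their address space numbers, and `commonAddressSpace` is a hypothetical name.

```cpp
#include <optional>
#include <vector>

// Given the address spaces of all underlying objects of a pointer, return
// the single address space they share, or nullopt if they disagree or if
// no underlying object is known.
std::optional<unsigned>
commonAddressSpace(const std::vector<unsigned> &objectAddrSpaces) {
  if (objectAddrSpaces.empty())
    return std::nullopt;
  unsigned candidate = objectAddrSpaces.front();
  for (unsigned as : objectAddrSpaces)
    if (as != candidate)
      return std::nullopt;
  return candidate;
}
```

In this model, a pointer whose underlying objects all live in address space 3 yields 3, while a mix of address spaces 1 and 3 yields no result.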
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120586
replaceCongruentIVs analysis is based on ScalarEvolution; this makes
comparing different PHIs and performing the replacement straightforward.
However, it can have some side-effects: it isn't aware whether an
induction variable is in canonical form, so it can perform replacements
which obscure the meaning of the IR.
In test22 in widen-loop-comp.ll, the resulting loop can't be analyzed by
ScalarEvolution at all.
My attempted solution is to restrict the transform: don't try to replace
induction variables using PHI nodes that don't represent simple
induction variables.
I'm not sure if this is the best solution; suggestions welcome.
Differential Revision: https://reviews.llvm.org/D121950
This patch adds support for vectorized reduction of maximum/minimum
intrinsics which are under the appropriate reduction kind.
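As a rough scalar model of what such a reduction computes, the following C++ sketch folds a list with a maximum that propagates NaN and orders -0.0 below +0.0, which is how the llvm.maximum intrinsic is documented to behave. The names `maximumLike` and `reduceMaximum` are hypothetical, and the sketch says nothing about how the vectorizer emits the reduction.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Scalar maximum that returns NaN if either input is NaN and treats
// -0.0 as smaller than +0.0.
float maximumLike(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == b)
    return std::signbit(a) ? b : a; // picks +0.0 over -0.0
  return a > b ? a : b;
}

// Left-to-right reduction; assumes v is non-empty.
float reduceMaximum(const std::vector<float> &v) {
  float acc = v[0];
  for (std::size_t i = 1; i < v.size(); ++i)
    acc = maximumLike(acc, v[i]);
  return acc;
}
```

A vectorized reduction has to preserve the same result as this scalar chain, including the NaN and signed-zero cases.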
Differential Revision: https://reviews.llvm.org/D154463
minimum/maximum tests from D154463. This contains tests where we vectorize
minimum/maximum as well as the tests where we currently do not identify
reduction patterns.
Differential Revision: https://reviews.llvm.org/D155096
Improve computeKnownFPClass select handling to cover the case where
the condition performs a class test. This allows us to recognize
no-nans in cases like:
%not.nan = fcmp ord float %x, 0.0
%select = select i1 %not.nan, float %x, float 0.0
Math library code has similar edge case filtering on the inputs and
final results.
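The same pattern written in C++ makes the no-NaN property easy to see. `selectOrdered` is a hypothetical name for this sketch, and the code assumes it is compiled without fast-math style flags that would let the compiler drop the NaN test.

```cpp
#include <cmath>

// Mirrors the IR above: "fcmp ord float %x, 0.0" is true exactly when x is
// not NaN, because the constant 0.0 is never NaN.
float selectOrdered(float x) {
  bool notNan = !std::isnan(x);
  return notNan ? x : 0.0f;
}
```

Whichever arm is chosen, the result cannot be NaN: the true arm is taken only when `x` is not NaN, and the false arm is the constant 0.0.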
https://reviews.llvm.org/D153089
Prepare to handle class clamping patterns. This works around what appears
to be a select special-casing bug in the attributor, where
computeKnownFPClass is never called on select.
Arm Performance Libraries contain a math library that provides
vectorized versions of common math functions.
This patch allows using it with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so that loops containing such calls can be vectorized.
The executable needs to be linked with the amath library.
Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries
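A loop of the following shape is the kind of candidate this option targets. This is only an illustrative sketch: `applySin` is a hypothetical function, and whether the call is actually replaced by a vector routine depends on the target, the optimization level, and building with -fveclib=ArmPL and linking the amath library as described above.

```cpp
#include <cmath>
#include <cstddef>

// Element-wise sine over an array. Each iteration is independent, so the
// loop vectorizer may replace the scalar call with a vector math routine
// when a vector library is made available.
void applySin(const float *in, float *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::sin(in[i]);
}
```

Without the vector library, the same source still compiles and runs using the scalar math function.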
Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
This reverts commit 593797ab9bedca6e9b0b7a9ed0589cf76023ab00.
I didn't realize that there was already a fix for the broken tests in fd2254b7358d0f78a79784688bd8012c1a52b9cf.