This patch replaces the tight hard cut-off for the number of runtime
checks with a more accurate cost-driven approach.
The new approach allows vectorization with a larger number of runtime
checks in general, but only executes the vector loop (and runtime checks) if
considered profitable at runtime. Profitable here means that the cost-model
indicates that the runtime check cost + vector loop cost < scalar loop cost.
To do that, LV computes the minimum trip count for which runtime check cost
+ vector-loop-cost < scalar loop cost.
Note that there is still a hard cut-off to avoid excessive compile-time/code-size
increases, but it is much larger than the original limit.
The performance impact on standard test-suites like SPEC2006/SPEC2006/MultiSource
is mostly neutral, but the new approach can give substantial gains in cases where
we failed to vectorize before due to the over-aggressive cut-offs.
On AArch64 with -O3, I didn't observe any regressions outside the noise level (<0.4%)
and there are the following execution time improvements. Both `IRSmk` and `srad` are relatively short running, but the changes are far above the noise level for them on my benchmark system.
```
CFP2006/447.dealII/447.dealII -1.9%
CINT2017rate/525.x264_r/525.x264_r -2.2%
ASC_Sequoia/IRSmk/IRSmk -9.2%
Rodinia/srad/srad -36.1%
```
`size` regressions on AArch64 with -O3 are
```
MultiSource/Applications/hbd/hbd 90256.00 106768.00 18.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000 240676.00 257268.00 6.9%
MultiSourc...enchmarks/mafft/pairlocalalign 472603.00 489131.00 3.5%
External/S...2017rate/525.x264_r/525.x264_r 613831.00 630343.00 2.7%
External/S...NT2006/464.h264ref/464.h264ref 818920.00 835448.00 2.0%
External/S...te/538.imagick_r/538.imagick_r 1994730.00 2027754.00 1.7%
MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 1236471.00 1253015.00 1.3%
MultiSource/Applications/oggenc/oggenc 2108147.00 2124675.00 0.8%
External/S.../CFP2006/447.dealII/447.dealII 4742999.00 4759559.00 0.3%
External/S...rate/510.parest_r/510.parest_r 14206377.00 14239433.00 0.2%
```
Reviewed By: lebedev.ri, ebrevnov, dmgreen
Differential Revision: https://reviews.llvm.org/D109368
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.
Note that this could probably be further improved by using information
from the original IV.
Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.
This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.
By using DataLayout to get the size of the element we can also correctly
handle pointer element types.
Note the changes to the existing test, which seems to also use the wrong
offset for the end.
Fixes PR47751.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D88953
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.
Comes with a clang patch (D46881).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46882
llvm-svn: 332705
Summary:
This fixes a build break with r331269.
test/Transforms/LoopVectorize/pr23997.ll
should be in:
test/Transforms/LoopVectorize/X86/pr23997.ll
llvm-svn: 331281