This patch prepares the ground for #76060.
* Unifies ArmPL and SLEEF tests for better coverage
* Replaces deprecated float* and double* types with ptr
* Adds noalias attribute to pointer arguments
* Adds some cmd-line options to the RUN lines to simplify output
* Removes datalayout since target triple is provided
* Removes checks for return statements
* Refactors the regex filter for autogenerated checks
* Removes redundant test file suffix (already under the AArch64 dir)
Added more pre-commit tests for evaluating changes to loop interleaving count computation in (https://github.com/llvm/llvm-project/pull/73766). The new set of tests address the change in IC computation to minimize the remainder TC of the vectorized loop while maximizing the IC when the
remainder TC is the same.
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)
This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.
The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.
Moved from Phabricator (https://reviews.llvm.org/D157322)
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
A new disjoint flag was added for OR instructions in #72583.
Update VPRecipeWithIRFlags to also support the new flag. This
allows printing and preserving the disjoint flag in vectorized code.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
MinBWs contains entries that specify the minimum required bitwidth. In
some cases, the old and new bitwidths can be equal (see test case) and
in those cases no truncations are needed, so skip those cases.
Fixes#74307.
Clean up sve-tail-folding-option.ll by removing the unused
CHECK-TF-NEOVERSE-V1 prefix (note the use of non-opaque pointers) and
remove IR value references.
In some cases MinBWs may contain entries for live-ins that are not used
by VPWidenRecipe or VPWidenSelectRecipes. In those cases, the live-ins
won't get processed, so make sure we include them in the count when used
as operands in VPWidenCast and VPWidenSelectRecipe.
Fixes https://github.com/llvm/llvm-project/issues/74231
The disjoint flag was recently added to IR in #72583
We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
special codegen for each supported instruction type like the IR based
one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
execution (See removed FIXME).
Depends on D149081.
Depends on D149079.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149903
Auto-generate test `armpl-intrinsics.ll` and simplify tests:
- Eliminate scalar tail with no tail-folding flag.
- Use active lane mask for shorter check lines (no long `shufflevectors`).
- Eliminate scalar loops by providing `noalias` to relevant arguments and
run `simplifycfg` to drop them.
- Update script now use `@llvm.compiler.used` instead of a longer regex.
Update regex to _explicitly_ show which exp versions are added.
The previous regex used `exp[^e]` to avoid matching calls like:
`@llvm.experimental.stepvector`.
Note: ArmPL Mappings for scalable types are not yet utilized
(eg, `llvm.exp10.nxv2f64`, `llvm.exp10.nxv4f32`), as `replace-with-veclib`
pass needs improvements.
We can determine the VF from a combination of the mangled name (which
indicates the arguments that take vectors) and the element sizes of
the arguments for the scalar function the mapping has been established
for.
The assert when demangling fails has been removed in favour of just
not adding the mapping, which prevents the crash seen in
https://github.com/llvm/llvm-project/issues/71892
This patch also stops using _LLVM_ as an ISA for scalable vector tests,
since there aren't defined rules for the way vector arguments should be
handled (e.g. packed vs. unpacked representation).
Instead of expanding the src/sink SCEV expressions and emitting an IR
sub to compute the difference, the subtraction can be directly be
performed by ScalarEvolution. This allows the subtraction to be
simplified by SCEV, which in turn can reduced the number of redundant
runtime check instructions generated.
It also allows to generate checks that are invariant w.r.t. an outer
loop, if he inner loop AddRecs have the same outer loop AddRec as start.
Reverse mask early on when populating BlockInMask. This will enable
separating mask management and address computation from the memory
recipes in the future and is also needed to enable explicit unrolling in
VPlan.
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.
With this patch, we can make more informed decisions for loops with low
trip counts, especially when leveraging Profile-Guided Optimization
(PGO) data.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
This patch enables scalable vectors in the VPlan-native path.
If a vectorization factor is specified via loop vectorization hints,
that factor is used. If no vectorization factor is specified, but the
target preferes scalable vectorization, a scalable vectorization factor
is selected.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D157484
The tests introduced by https://reviews.llvm.org/D134719 and later
modified in https://reviews.llvm.org/D146839 are not testing LV in
isolation. This patch:
1. Assures that all tests test LV in isolation.
2. Adds LV tests using llvm intrinsics that have libm mappings.
llrint, llround and lrint are not included as currently IR verifier pass
does not allow to use vector types with them.
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.
We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
Currently the mappings from TLI are used to generate the list of
available "scalar to vector" mappings attached to scalar calls as
"vector-function-abi-variant" LLVM IR attribute. Function names from TLI
are wrapped in mangled name following the pattern:
_ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]
The problem is the mangled name uses _LLVM_ as the ISA name which
prevents the compiler to compute vectorization factor for scalable
vectors as it cannot make any decision based on the _LLVM_ ISA. If we
use "s" as the ISA name, the compiler can make decisions based on VFABI
specification where SVE spacific rules are described.
This patch is only a refactoring stage where there is no change to the
compiler's behaviour.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.
This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.
This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158779
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158332
When SVE2 is enabled, we can combine an add of 1, add & shift right by 1
to a single s/urhadd instruction. If the operands to the adds are extended,
these extends will fold into the s/urhadd and their costs should be 0.
Reviewed By: dtemirbulatov
Differential Revision: https://reviews.llvm.org/D157628
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.
D157144 will build on this and also model FMFs for VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157194
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.
Also removes some stray spaces in print output.
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.
Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.