As the name of the function suggests, convertPointerToIntegerType should
return an IntegerType instead of a Type, and should only ever be called
with integer or pointer types. Fix the callers getWiderType and
addInductionPhi to narrow the type of WidestIndTy to IntegerType,
stripping unneeded casts. While at it, rename convertPointerToIntegerType
and getWiderType for clarity.
This patch adds initial support for vectorizing literal struct return
values. Currently, this is limited to the case where the struct is
homogeneous (all elements have the same type) and not packed. The users
of the call also must all be `extractvalue` instructions.
The intended use case for this is vectorizing intrinsics such as:
```
declare { float, float } @llvm.sincos.f32(float %x)
```
Mapping them to structure-returning library calls such as:
```
declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>)
```
Or their widened form (such as `@llvm.sincos.v4f32` in this case).
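For illustration, with VF = 4 the scalar call and its extracts could be
widened roughly as follows (a sketch; value names are illustrative):
```
; Scalar loop body:
%res = call { float, float } @llvm.sincos.f32(float %x)
%sin = extractvalue { float, float } %res, 0
%cos = extractvalue { float, float } %res, 1

; Widened form (VF = 4):
%wide.res = call { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %wide.x)
%wide.sin = extractvalue { <4 x float>, <4 x float> } %wide.res, 0
%wide.cos = extractvalue { <4 x float>, <4 x float> } %wide.res, 1
```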
Implementing this required two main changes:
1. Supporting widening `extractvalue`
2. Adding support for vectorized struct types in LV
   * This is mostly limited to parts of the cost model and scalarization
Since the supported use case is narrow, the required changes are
relatively small.
I've removed the HasUncountableEarlyExit variable, since we can
already determine whether or not a loop has an early exit by seeing
if we found an uncountable exit.
I have also deleted the old UncountableExitingBlocks and
UncountableExitBlocks lists and replaced them with a single
uncountable edge. This means we don't need to worry about keeping the
list entries in sync and makes it clear which exiting block
corresponds to which exit block.
Currently we emit early-exit related debug messages/remarks even when
there is a single exit. Update the code to only call
isVectorizableEarlyExitLoop when the loop does not have a single exit
block.
PR: https://github.com/llvm/llvm-project/pull/121994
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.
This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.
```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```
Note: The tests require the VFABI changes from #119000 to pass.
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specification of target-specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` API
- update the TTI API to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the API and provide the TTI parameter
Resolves #117030
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.
The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with automatic
early-exit vectorization to be enabled later by plugging in @david-arm's
changes from https://github.com/llvm/llvm-project/pull/88385.
PR: https://github.com/llvm/llvm-project/pull/112138
* Rename Speculative -> Uncountable and update tests.
* Add comments explaining why it's safe to ignore the predicates when
building up a list of exiting blocks.
* Reshuffle some code to do (hopefully) cheaper checks first.
Currently, if a loop contains loads that we can prove at compile time
are dereferenceable when certain conditions are satisfied, the function
isDereferenceableAndAlignedInLoop will still return false, because
getSmallConstantMaxTripCount will return 0 when SCEV predicates are
required. This patch changes getSmallConstantMaxTripCount to take an
optional Predicates pointer argument so that we can permit functions
such as isDereferenceableAndAlignedInLoop to consider more cases.
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.
  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;
or
  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;
In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:
1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.
Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll
The code later assumes that the operations and their inputs in the
processed loops are scalar, so we should make sure this is the case in
the legalizer.
The successful-vectorization message is emitted even when "Result" is
false. "Result" being false indicates that one of the legality checks
failed, so the success message should not be printed.
Update createEdgeMask to create masks where the terminator in Src is a
switch. We need to handle 2 separate cases (see the sketch below):
1. Dst is not the default destination. Dst is reached if any of the
cases with destination == Dst are taken. Join the conditions for each
case where destination == Dst using a logical OR.
2. Dst is the default destination. Dst is reached if none of the cases
with destination != Dst are taken. Join the conditions for each case
where destination != Dst using a logical OR and negate the result.
Edge masks are created for every destination of the cases and/or the
default when requesting a mask where the source is a switch.
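A minimal sketch of the two cases (illustrative values and block names):
```
switch i32 %x, label %default [
  i32 0, label %dst
  i32 5, label %dst
]
; Mask for the edge to %dst:     (%x == 0) | (%x == 5)
; Mask for the edge to %default: !((%x == 0) | (%x == 5))
```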
Fixes https://github.com/llvm/llvm-project/issues/48188.
PR: https://github.com/llvm/llvm-project/pull/99808
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
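As a rough sketch (illustrative names), the scalar gather-modify-scatter
sequence corresponds to an indexed bucket increment, in C terms
`buckets[indices[i]]++`:
```
%idx = load i32, ptr %gep.indices
%idx.ext = zext i32 %idx to i64
%gep.bucket = getelementptr inbounds i32, ptr %buckets, i64 %idx.ext
%old = load i32, ptr %gep.bucket
%inc = add i32 %old, 1
store i32 %inc, ptr %gep.bucket
```
If intermediate values such as %old or %inc had additional users, the
loop would be left unvectorized.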
Introduce a new canFoldTail helper which only checks if tail-folding is
possible, but without modifying MaskedOps.
Just because tail-folding is possible doesn't mean the tail will be
folded; that's up to the cost-model to decide. Separating the check if
tail-folding is possible and preparing for tail-folding makes sure that
MaskedOps is only populated when tail-folding is actually selected.
PR: https://github.com/llvm/llvm-project/pull/77612
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount, which returns
the minimum of the exit counts of the countable exits.
When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.
Note that this shifts the responsibility of checking whether all exit
counts are computable or handling early exits to the users of LAA.
Depends on https://github.com/llvm/llvm-project/pull/93498
PR: https://github.com/llvm/llvm-project/pull/93499
Code checking stores to invariant addresses and reductions made an
incorrect assumption that the case of both a load & store to the same
invariant address does not need to be handled.
In some cases when vectorizing with runtime checks, there may be
dependences with a load and store to the same address, storing a
reduction value.
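For illustration, such a loop might look like this (a minimal sketch
with illustrative names), where %sum.addr is the invariant address
holding the reduction value:
```
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %sum = load float, ptr %sum.addr
  %gep = getelementptr inbounds float, ptr %a, i64 %iv
  %val = load float, ptr %gep
  %sum.next = fadd float %sum, %val
  store float %sum.next, ptr %sum.addr
  %iv.next = add nuw nsw i64 %iv, 1
  %ec = icmp eq i64 %iv.next, %n
  br i1 %ec, label %exit, label %loop
```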
Update LAA to separately track whether there was a store-store and a
load-store dependence with an invariant address.
Bail out early if there was a load-store dependence with an invariant
address. If there was a store-store one, still apply the logic checking
if they all store a reduction.
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
Replace ConditionalAssume set by treating conditional assumes like other
predicated instructions (i.e. create a VPReplicateRecipe with a mask)
and later remove any assume recipes with masks during VPlan cleanup.
This reduces coupling of VPlan construction and Legal by removing a
shared set between the two and results in a cleaner code structure
overall.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157034
After 572cfa3fde5433, isUniform now checks VF-based uniformity instead of
just invariance as before.
As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.
This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccessAnalysis doesn't use the more
general isUniform.
This patch uses SCEV to check if a value is uniform across a given VF.
The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of vector lane 0
(offset 0) and compare it to the expressions for lanes 1 to the last
vector lane (VF - 1). If they are equal, consider the expression
uniform. While re-writing expressions, we also need to catch
expressions for which we cannot determine uniformity (e.g. SCEVUnknown).
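A small worked example of the lane comparison (a sketch assuming VF = 4
and a canonical induction variable starting at 0 with step 1):
```
%d = udiv i64 %iv, 4
; AddRec for %iv adjusted for the vector loop (step multiplied by VF): {0,+,4}
; Lane 0:            {0,+,4} /u 4
; Lane k (k = 1..3): ({0,+,4} + k) /u 4
; These evaluate to the same value within each vector iteration, so %d
; is considered uniform across VF = 4.
```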
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D148841
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert. Specifically, I'd failed to strip the integer cast off the SCEV before saving it in the map. The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown. The only case I can think of where that might change behavior would be a sext(i1 load). I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.
Original commit message follows..
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
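For example (a sketch with illustrative names), the loop-invariant
%stride below is the kind of value that is profitable to speculate equal
to one, guarded by a runtime check:
```
%mul = mul i64 %iv, %stride
%gep = getelementptr inbounds float, ptr %p, i64 %mul
%v = load float, ptr %gep
; Speculating %stride == 1 turns this into a contiguous, unit-stride access.
```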
Differential Revision: https://reviews.llvm.org/D147750
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a. Running this through broader testing after rebasing is revealing a crash. Reverting while I investigate.
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.
The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.
Original commit message:
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).
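For illustration (a sketch with illustrative names), %use below reads
the previous iteration's value through the recurrence phi %for and must
be sunk past the definition of %next when vectorizing:
```
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %for = phi float [ 0.0, %entry ], [ %next, %loop ]
  %use = fadd float %for, 1.0
  %gep.a = getelementptr inbounds float, ptr %a, i64 %iv
  %next = load float, ptr %gep.a
  %gep.b = getelementptr inbounds float, ptr %b, i64 %iv
  store float %use, ptr %gep.b
  %iv.next = add nuw nsw i64 %iv, 1
  %ec = icmp eq i64 %iv.next, %n
  br i1 %ec, label %exit, label %loop
```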
Depends on D142885.
Depends on D142589.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142886
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).
Depends on D142885.
Depends on D142589.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142886
(JFYI - This has been heavily reframed since original attempt at landing.)
This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bailout on such descriptors by default. This preserves the default vectorizer behavior.
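For reference, a minimal sketch of such a pointer IV (illustrative
names), where %stride is loop-invariant but not a compile-time constant:
```
%p = phi ptr [ %start, %entry ], [ %p.next, %loop ]
%v = load float, ptr %p
%p.next = getelementptr inbounds i8, ptr %p, i64 %stride
```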
In review, it was pointed out that there's multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).
This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.
Differential Revision: https://reviews.llvm.org/D147336
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization. Now we want to join the two parts together
and use a masked variant when a mask is required.
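As a sketch (the variant name and signature here are hypothetical), a
conditional call would then be widened to a call of the masked vector
variant, passing the block mask, while an unconditional call would pass
an all-true mask:
```
%wide.r = call <4 x float> @foo_masked_v4(<4 x float> %wide.x, <4 x i1> %mask)
```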
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D136251
This patch adds support for scalarizing calls to a function when
there is a vector variant that cannot be used, either because there
isn't a masked variant or because the cost model indicated a VF
without a masked variant was better.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D134422
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.
Depends on D134608.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134609
Fixes #57572
Generally the LICM pass is responsible for moving code that calculates
an invariant address out of the loop, as it only needs to be calculated
once. But in the rare case where that does not happen, we will not be
able to vectorize the loop.
Differential Revision: https://reviews.llvm.org/D133687
This is a purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.