255 Commits

Author SHA1 Message Date
Florian Hahn
c7a44ec031
[VPlan] Check successors in VPlan to check if scalar epi required (NFC)
Now that the branches to the scalar epilogue are modeled in VPlan
directly, check the VPlan to see if a scalar epilogue is required.

Preparation for https://github.com/llvm/llvm-project/pull/100658.
2024-08-12 15:33:52 +01:00
Craig Topper
59728193a6
[RISCV] Disable fixed length vectors with Zve32* without Zvl64b. (#102405)
Fixed length vectors use scalable vector containers. With Zve32* and not
Zvl64b, vscale is a 0.5 due RVVBitsPerBlock being 64.

To support this correctly we need to lower RVVBitsPerBlock to 32 and
change our type mapping. But we need to RVVBitsPerBlock to alway be
>= ELEN.  This means we need two different mapping depending on ELEN.

That is a non-trivial amount of work so disable fixed lenght vectors
without Zvl64b for now.

We had almost no tests for Zve32x without Zvl64b which is probably why
we never realized that it was broken.

Fixes #102352.
2024-08-08 09:17:43 -07:00
Philip Reames
d3fd28a134
[RISCV][TTI] Properly model odd vector sized LD/ST operations (#100436)
The motivation for this change is the costing of a LD or ST with nearly
power of 2 vectors (e.g. <3 x i32> or <7 x i32>) on V. There's an
experimental option in SLP to allow emitting these if the cost model
says they're profitable. This really helps with e.g. RGB vectors.

Our actual lowering for these depends on whether a wider container type
is known available. If so, we use a vle or vse on the wider type with a
restricted VL. If not, we split until a legal type is found, and then
apply the vle/vse on the sub-pieces.

This change is intentionally restricted to only the case where promotion
(widening w/VL predication) is involved. We appear to have at least one
bug in our splitting lowering (see discussion on review), and to avoid
exposing this more widely, I chose to not adjust costs for the splitting
case. The current splitting costing assumes scalarization (which is not
true of the actual lowering), but that has the effect of biasing
vectorization away from such cases strongly.

For the widening case, the true cost scales with the next largest legal
type. The default implementation assumes that such a type is scalarized.
Changing that brings our cost in line with our actual lowering decision.
Note that since scalarization is not possible for scalable types, the
prior costing falsely returned Invalid for that case.
2024-07-26 12:52:20 -07:00
Florian Hahn
67a55e01e3
[VPlan] Replace getBestPlan by getBestVF use also for epilogue vec. (#98821)
Replace getBestPlan by getBestVF which simply finds the best
VF out of the VFs for the available VPlans.

Then use getBestPlan to retrieve the corresponding VPlan.

This allows using getBestVF & getBestPlan for epilogue vectorization
as well. As the same plan may be used to vectorize both the main
and epilogue loop, restricting the VF of the best plan would cause
issues.

PR: https://github.com/llvm/llvm-project/pull/98821
2024-07-26 14:06:46 +01:00
Alexey Bataev
7432ad6af5
[LV][VP][NFC]Add tests for safe store/load forwarding/dependence distance.
Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/100635
2024-07-25 19:55:37 -04:00
Philip Reames
ea202f9f2e [LV,RISCV] Regenerate a test to reduce spurious deltas in upcoming change 2024-07-25 12:22:59 -07:00
Florian Hahn
b72689a5cb
[LV] Ignore live-out users in cost model if scalar epilogue is required.
Follow-up to ba8126b6fef79.

If a scalar epilogue is required, users outside the loop won't use
live-outs from the vector loop but from the scalar epilogue. Ignore them if
that is the case.

This fixes another case where the VPlan-based cost-model more accurately
computes cost.

Fixes https://github.com/llvm/llvm-project/issues/100464.
2024-07-25 11:16:18 +01:00
Florian Hahn
ba8126b6fe
[LV] Mark dead instructions in loop as free.
Update collectValuesToIgnore to also ignore dead instructions in the
loop. Such instructions will be removed by VPlan-based DCE and won't be
considered by the VPlan-based cost model.

This closes a gap between the legacy and VPlan-based cost model. In
practice with the default pipelines, there shouldn't be any dead
instructions in loops reaching LoopVectorize, but it is easy to generate
such cases by hand or automatically via fuzzers.

Fixes https://github.com/llvm/llvm-project/issues/99701.
2024-07-24 09:31:32 +01:00
Luke Lau
58854facb3
[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594)
I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and
rva22u64_v, and noticed that in a few cases that rva22u64_v was
considerably slower.

One of them was 519.lbm_r, which has a large loop that was being
unprofitably vectorized. It has an if/else in the loop which requires
large amounts of predication when vectorized, but despite the loop
vectorizer taking this into account the vector cost came out as cheaper
than the scalar.

It looks like the reason for this is because we cost scalar floating
point ops as 2, but their vector equivalents as 1 (for LMUL 1). This
comes from how we use BasicTTIImpl for scalars which treats floats as
twice as expensive as integers.

This patch doubles the cost of vector floating point arithmetic ops so
that they're at least as expensive as their scalar counterparts, which
gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60.

Fixes #62576 (the last point there about scalar fsub/fmul)
2024-07-22 13:56:10 +08:00
Mel Chen
4eb30cfb34
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.
2024-07-16 16:15:24 +08:00
Mel Chen
a00754bb2a
[LV] Fix the cost of min/max reductions. (#98453)
This patch updates the function `getReductionPatternCost` to handle the
cost of min/max reductions by `TTI.getMinMaxReductionCost`.
2024-07-12 13:47:33 +08:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Florian Hahn
67f4968a57
[LV] Skip cost for ZExt/SExts that will be removed by truncating ops.
If an extend is truncated, it will be removed if the result type is <=
the source type, as there is nothing to extend. Return a cost of 0.

This was caught by the first step to perform cost-modeling based on
VPlan (b841e2e), as the legacy cost model would query the cost of an
invalid extend, while the extend has been folded away by VPlan
transforms.

Fixes https://github.com/llvm/llvm-project/issues/98413.
2024-07-11 11:40:14 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Florian Hahn
0577cdaa32
[LV] Split checking if tail-folding is possible, collecting masked ops. (#77612)
Introduce new canFoldTail helper which only checks if tail-folding is
possible, but without modifying MaskedOps.

Just because tail-folding is possible doesn't mean the tail will be
folded; that's up to the cost-model to decide. Separating the check if
tail-folding is possible and preparing for tail-folding makes sure that
MaskedOps is only populated when tail-folding is actually selected.

PR: https://github.com/llvm/llvm-project/pull/77612
2024-07-08 16:34:42 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Florian Hahn
2b3b405b09
[LV] Don't vectorize first-order recurrence with VF <vscale x 1 x ..>
The assertion added as part of https://github.com/llvm/llvm-project/pull/93395
surfaced cases where first-order recurrences are vectorized with
<vscale x 1 x ..>. If vscale is 1, then we are unable to extract the
penultimate value (second to last lane). Previously this case got
mis-compiled, trying to extract from an invalid lane (-1)
https://llvm.godbolt.org/z/3adzYYcf9.

Fixes https://github.com/llvm/llvm-project/issues/97452.
2024-07-04 11:44:51 +01:00
Kolya Panchenko
49e5cd2acc
[LV][NFC] Marked functions as const. Added LLVM_DEBUG. (#96681) 2024-06-26 17:38:18 -04:00
Florian Hahn
8681bb8bed
[LV] Add additional test coverage for cost modeling.
Add missing tests uncovered by
https://github.com/llvm/llvm-project/pull/92555.

Includes test for https://github.com/llvm/llvm-project/issues/96294 and
https://github.com/llvm/llvm-project/issues/96328
2024-06-26 10:18:01 +01:00
Florian Hahn
abf5969f76
[VPlan] Don't compute costs if there are no vector VPlans.
In some cases, no vector VPlans can be constructed due to failing VPlan
legality checks (e.g. unable to perform sinking for first order
recurrences or plans being incompatible with EVL).

There's no need to compute costs in those cases, so check directly if
there are no vector plans.
2024-06-24 08:38:31 +01:00
Florian Hahn
f0c674f680
[LV] Add test showing cost is computed when there are no vector plans.
Add test showing unnecessary cost computations, as no vector VPlans are
generated.
2024-06-24 08:08:56 +01:00
Florian Hahn
f1f3c34b47
Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and
eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot
failure and there have been a number of crashes reported at
https://github.com/llvm/llvm-project/pull/92555
2024-06-21 19:54:21 +01:00
Florian Hahn
242cc200cc
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

Extra tests for crashes discovered when building Chromium have been
added in fb86cb7ec157689e, 3be7312f81ad2.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-20 17:32:52 +01:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00
Arthur Eubanks
6f538f6a2d Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264.
This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44.

Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
2024-06-14 17:47:08 +00:00
Florian Hahn
90fd99c079
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c.

Extra tests have been added in 52d29eb287.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-14 12:33:48 +01:00
Jay Foad
d4a0154902
[llvm-project] Fix typo "seperate" (#95373) 2024-06-13 20:20:27 +01:00
Arthur Eubanks
46080abe9b Revert "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 00798354c553d48d27006a2b06a904bd6013e31b.

Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
2024-06-13 16:37:21 +00:00
Florian Hahn
00798354c5
[VPlan] First step towards VPlan cost modeling. (#92555)
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to 
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and 
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.


PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-13 14:26:18 +01:00
Florian Hahn
c46a6e6c92
[LV] Remove unnecessary getRuntimeVF call when computing vector TC.
As Step is VF * UF, there is no need to compute it again, which may
require multiple instructions for scalable VFs.
2024-06-12 14:35:37 +01:00
Florian Hahn
f38d84ce32
[VPlan] Use ir-bb prefix for VPIRBasicBlock.
Follow-up to adjust the names and tests after
https://github.com/llvm/llvm-project/pull/93398.
2024-05-30 17:43:40 -07:00
Ramkumar Ramachandra
43100766f2
LV: generalize profitability criterion over TC (#93300)
Generalize LoopVectorizationPlanner::isMoreProfitable smoothly across
the fixed-vector and scalable-vector cases, taking the trip-count into
account, and fixing logical pitfalls that arise from a lack of
generality.
2024-05-30 10:54:32 +01:00
Shih-Po Hung
0338c55ea5
[LV, VPlan] Check if plan is compatible to EVL transform (#92092)
The transform updates all users of inductions to work based on EVL,
instead
of the VF directly. At the moment, widened inductions cannot be updated,
so
bail out if the plan contains any.
This patch introduces a check before applying EVL transform. If any
recipes in loop rely on RuntimeVF, the plan is discarded.
2024-05-25 08:22:49 +08:00
Ramkumar Ramachandra
bb0d29a72d
[LV] fix logical error in trunc cost (#91136)
In LoopVectorizationCostModel::getInstructionCost(), when the condition
canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source
type is computed as the smallest type of the source vector and the
destination vector, and the destination type is computed as the largest
type of the instruction and destination type. This is clearly a logical
error, as the original source vector type could be smaller than the
original destination vector type, and the trunc semantics are broken
because we're attempting to widen.

Fixes #47665.
2024-05-24 18:01:58 +01:00
Shih-Po Hung
b008a2d12a
[LV][NFC] precommit test for EVL transform (#92203)
A precommit test case to show vector loops generated from EVL transform
- This is a precommit test for
https://github.com/llvm/llvm-project/pull/92092
2024-05-24 23:21:59 +08:00
Ramkumar Ramachandra
dc148c9fb8
[LV] add test for #47665, #88802 (#91135) 2024-05-24 10:50:43 +01:00
Florian Hahn
b050048d35
[VPlan] Simplify (X && Y) || (X && !Y) -> X. (#89386)
Simplify a common pattern generated for masks when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/89386
2024-05-19 15:45:23 +00:00
Craig Topper
487b43cdc9
[RISCV] Pass subvector type to isLegalInterleavedAccessType in getInterleavedMemoryOpCost. (#91825)
isLegalInterleavedAccessType expects the subvector type, but
getInterleavedMemoryOpCost is called with the full vector type. So we
need to divide by Factor.
2024-05-15 21:47:29 -07:00
Florian Hahn
d187005cad
[VPlan] Update VPBlendRecipe codegen for for first-lane only.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.
2024-05-15 11:00:15 +01:00
Mel Chen
3f1fef3699
[RISCV] Support interleaved accesses for scalable vector. (#90583)
The support for interleaved accesses for scalable vector with a factor
of 2 is enabled in vectorizer. Therefore, the patch removed the
restriction for scalable vector with a factor of 2.
2024-05-03 21:56:31 +08:00
Florian Hahn
bccb7ed8ac
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit c6e01627acf859.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-05-03 14:40:49 +01:00
Alexey Bataev
1d43cdc9f5
[LV][EVL]Support reversed loads/stores.
Support for predicated vector reverse intrinsic was added some time ago.
Adds support for predicated reversed loads/stores in the loop
vectorizer.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/88025
2024-05-03 07:28:56 -04:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

from the experimental namespace.

All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.
2024-04-29 10:16:45 +01:00
Florian Hahn
e2a72fa583
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://github.com/llvm/llvm-project/pull/76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then checking all users (recursively) of those header
masks.

Depends on https://github.com/llvm/llvm-project/pull/87411.

PR: https://github.com/llvm/llvm-project/pull/87816
2024-04-19 09:44:23 +01:00
Arthur Eubanks
c6e01627ac Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304)""
This reverts commit c6e38b928c56f562aea68a8e90f02dbdf0eada85.

Causes miscompiles, see comments on #78304.
2024-04-16 20:40:21 +00:00
Shih-Po Hung
f3a8112d98
[RISCV][TTI] Scale the cost of ICmp with LMUL (#88235)
Use the Val type to estimate the instruction cost for ICmp.
2024-04-16 09:37:32 +08:00
Florian Hahn
c836983671
[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770)
VPBlendRecipe does not use the first mask operand. Removing it allows
VPlan-based DCE to remove unused mask computations.

This also fixes #87410, where unused Not VPInstructions are considered
having only their first lane demanded, but some of their operands
providing a vector value due to other users.

Fixes https://github.com/llvm/llvm-project/issues/87410

PR: https://github.com/llvm/llvm-project/pull/87770
2024-04-09 11:14:05 +01:00
Florian Hahn
c6e38b928c
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit 589c7abb03448.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29d.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-04-05 13:45:13 +01:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. By having a way for the
Loop Vectorizer to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions. These architectures can make better use of their
predication capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for loads/stores instructions. The
patch adds a new VPlanTransforms to replace the wide header predicate
compare with EVL and updates codegen for load/stores to use VP
store/load with EVL.

Other important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.

Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
Florian Hahn
89271b4676
[LV] Add test depending on target to RISCV subdirectory. 2024-04-02 22:02:25 +01:00