2062 Commits

Author SHA1 Message Date
Florian Hahn
632317e9ab
[VPlan] Add non-poison propagating LogicalAnd VPInstruction opcode. (#91897)
Add a new opcode to model non-poison-propagating logical AND operations
used when generating edge masks. This follows the earlier decision to
model Not as a dedicated opcode as well, to improve clarity.

This also helps to simplify the matchers for
https://github.com/llvm/llvm-project/pull/89386.


PR: https://github.com/llvm/llvm-project/pull/91897
2024-05-14 09:42:49 +01:00
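For the LogicalAnd opcode added in the commit above, the key point is the poison behavior: a plain bitwise AND propagates poison from either operand, while a logical AND (a select with a false arm) shields a poison right-hand side when the left-hand side is false. A minimal, self-contained sketch of that semantic difference (not LLVM code; the optional-based poison model is purely illustrative):

```cpp
#include <cassert>
#include <optional>

// Model an i1 that may be poison as an optional<bool> (nullopt == poison).
using MaybePoison = std::optional<bool>;

// Bitwise AND: poison in either operand poisons the result.
MaybePoison bitwiseAnd(MaybePoison A, MaybePoison B) {
  if (!A || !B)
    return std::nullopt;
  return *A && *B;
}

// Logical AND, i.e. select(A, B, false): when A is false, B is never
// observed, so poison in B does not propagate. This is the behavior a
// dedicated non-poison-propagating opcode captures for edge masks.
MaybePoison logicalAnd(MaybePoison A, MaybePoison B) {
  if (!A)
    return std::nullopt; // a poison condition still poisons the result
  return *A ? B : MaybePoison(false);
}

int main() {
  MaybePoison Poison = std::nullopt;
  // With a false LHS, the logical form yields a well-defined false ...
  assert(logicalAnd(false, Poison) == MaybePoison(false));
  // ... while a plain bitwise AND would be poison.
  assert(!bitwiseAnd(false, Poison).has_value());
  return 0;
}
```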
Florian Hahn
e122380445
[LV] Use VPBuilder to create Select (NFCI). 2024-05-13 20:44:39 +01:00
Florian Hahn
082c81ae4a
[LV] Properly extend versioned constant strides.
We only version unknown strides to 1. If the original type is i1, then
the sign of the extension matters. Properly extend the stride value
before replacing it.

Fixes https://github.com/llvm/llvm-project/issues/91369.
2024-05-07 21:31:42 +01:00
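To see why the extension kind matters for an i1 stride as in the commit above: the 1-bit value 1 sign-extends to -1 but zero-extends to 1. A small illustrative sketch (the helper names are hypothetical, not vectorizer code):

```cpp
#include <cstdint>
#include <iostream>

// Sign-extend the low NBits of Value to 64 bits.
int64_t signExtend(uint64_t Value, unsigned NBits) {
  uint64_t SignBit = uint64_t(1) << (NBits - 1);
  uint64_t Low = Value & ((SignBit << 1) - 1);
  return int64_t((Low ^ SignBit) - SignBit);
}

// Zero-extend the low NBits of Value to 64 bits.
uint64_t zeroExtend(uint64_t Value, unsigned NBits) {
  return Value & ((uint64_t(1) << NBits) - 1);
}

int main() {
  // A stride of 1 stored in an i1: sign-extension yields -1, zero-extension
  // yields 1, so choosing the wrong extension changes the stride value that
  // gets substituted.
  std::cout << "sext(i1 1) = " << signExtend(1, 1) << "\n"; // prints -1
  std::cout << "zext(i1 1) = " << zeroExtend(1, 1) << "\n"; // prints 1
  return 0;
}
```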
Alexey Bataev
6517c5b068 [LV][NFC]Address last comments from https://github.com/llvm/llvm-project/pull/88025. 2024-05-03 06:51:01 -07:00
Florian Hahn
bccb7ed8ac
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit c6e01627acf859.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as there are no longer multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-05-03 14:40:49 +01:00
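A minimal scalar model of the AnyOf codegen scheme described in the commit above: the loop only tracks whether any element matched, and a single select between the start and new values happens after the loop (the "middle block"). The data, matched constant, and function name are illustrative:

```cpp
#include <array>
#include <cassert>

// Scalar model of the AnyOf reduction: inside the loop only a boolean
// "any lane matched" flag is tracked; the select between the start value
// and the new value happens once, after the loop.
int anyOfReduction(const std::array<int, 8> &Data, int StartValue,
                   int NewValue) {
  bool AnyMatched = false;    // vectorized as a boolean vector
  for (int Elt : Data)
    AnyMatched |= (Elt == 3); // per-lane compare, OR-reduced
  // Single select after the loop instead of selecting start/new values
  // on every iteration.
  return AnyMatched ? NewValue : StartValue;
}

int main() {
  assert(anyOfReduction({1, 2, 3, 4, 5, 6, 7, 8}, /*StartValue=*/-1,
                        /*NewValue=*/42) == 42);
  assert(anyOfReduction({0, 0, 0, 0, 0, 0, 0, 0}, -1, 42) == -1);
  return 0;
}
```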
Alexey Bataev
1d43cdc9f5
[LV][EVL]Support reversed loads/stores.
Support for the predicated vector reverse intrinsic was added some time ago.
This patch adds support for predicated reversed loads/stores in the loop
vectorizer.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/88025
2024-05-03 07:28:56 -04:00
Florian Hahn
c7209cbb8b
[LV] Assert that there's a resume phi for epilogue loops (NFC).
This patch adds an assert to createAndCollectMergePhiForReduction to
make sure there is a resume phi when vectorizing the epilogue loop. This
is needed to set the resume value from the main vector loop.

This assertion guards against the issue that caused the revert of
https://github.com/llvm/llvm-project/pull/78304.
2024-05-02 19:20:28 +01:00
Florian Hahn
e846778e52
[VPlan] Make CallInst optional for VPWidenCallRecipe (NFCI).
Instead of relying on the underlying CallInst to look up the called
function and its types, add the called function as an operand,
in line with how called functions are handled in CallInst.

Operand bundles, metadata and fast-math flags are optionally used if
there's an underlying CallInst.

This enables creating VPWidenCallRecipes without requiring an underlying
IR instruction.
2024-05-01 20:48:22 +01:00
Florian Hahn
9c3f5fe88f
[LV] Don't consider the latch block as ScalarPredicatedBB.
The conditional branch from the loop latch will be replaced by a
single branch controlling the loop, so there is no extra overhead from
scalarization. This improves the cost estimates in some cases.
2024-04-29 19:15:46 +01:00
Maciej Gabka
bfc0317153
Move several vector intrinsics out of experimental namespace (#88748)
This patch moves the following intrinsics out of the experimental namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice

All of these intrinsics have existed in LLVM for more than a year and are
widely used, so they should no longer be considered experimental.
2024-04-29 10:16:45 +01:00
Florian Hahn
b6a8f5486b
[LV] Consider all exit branch conditions uniform.
If we vectorize a loop with multiple exits, all exiting branches should
be considered uniform, as the resulting loop will be controlled by the
canonical IV only. Previously we were overestimating the cost of values
contributing to the other exits.
2024-04-28 13:15:55 +01:00
Florian Hahn
9ee8e38cdc
[VPlan] Also propagate versioned strides to users via sext/zext.
The versioned value may not be used in the loop directly but through a
sext/zext. Add new live-ins in those cases.
2024-04-26 21:29:43 +01:00
David Green
a8105026ff [LV] Fix warning about Mask being set twice. NFC 2024-04-20 16:40:08 +01:00
Florian Hahn
e2a72fa583
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://github.com/llvm/llvm-project/pull/76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then check all users (recursively) of those header
masks.

Depends on https://github.com/llvm/llvm-project/pull/87411.

PR: https://github.com/llvm/llvm-project/pull/87816
2024-04-19 09:44:23 +01:00
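For context on the header mask the commit above refers to: in a tail-folded loop, lane L of the iteration whose scalar induction value is IV is active iff IV + L is below the trip count. A small scalar model (the VF of 4 and trip count of 10 are arbitrary illustrative values):

```cpp
#include <array>
#include <cstdint>
#include <iostream>

constexpr unsigned VF = 4;

// Conceptual header mask for a tail-folded loop: lane Lane of the iteration
// whose scalar induction value is IV is active iff IV + Lane < TripCount.
std::array<bool, VF> headerMask(uint64_t IV, uint64_t TripCount) {
  std::array<bool, VF> Mask{};
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = IV + Lane < TripCount;
  return Mask;
}

int main() {
  // Trip count 10 with VF 4: the final iteration (IV = 8) has two active lanes.
  for (uint64_t IV = 0; IV < 10; IV += VF) {
    std::cout << "IV=" << IV << ":";
    for (bool Active : headerMask(IV, 10))
      std::cout << ' ' << Active;
    std::cout << '\n';
  }
  return 0;
}
```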
Ramkumar Ramachandra
73e7f2ff70
LoopVectorize: guard marking iv as scalar; fix bug (#88730)
When collecting loop scalars, LoopVectorize over-eagerly marks the
induction variable and its update as scalars after vectorization, even
if the induction variable update is a first-order recurrence. Guard the
process with this check, fixing a crash.

Fixes #72969.
2024-04-18 14:41:07 +01:00
Ramkumar Ramachandra
63d8058ef5
LoopVectorize: guard appending InstsToScalarize; fix bug (#88720)
In the process of collecting instructions to scalarize, LoopVectorize
uses faulty reasoning whereby it also adds instructions that will be
scalar after vectorization. If an instruction satisfies
isScalarAfterVectorization() for the given VF, it should not be appended
to InstsToScalarize. Add this extra guard, fixing a crash.

Fixes #55096.
2024-04-18 10:03:07 +01:00
Florian Hahn
a9bafe91dd
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411)
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code
generation for widened loads and stores and enable adding further more
specialized memory recipes.

PR: https://github.com/llvm/llvm-project/pull/87411
2024-04-17 11:00:58 +01:00
Mel Chen
cbe148b730
[LV][NFC] Remove the declaration of function fixReduction. (#88491) 2024-04-17 17:59:52 +08:00
Arthur Eubanks
c6e01627ac Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304)""
This reverts commit c6e38b928c56f562aea68a8e90f02dbdf0eada85.

Causes miscompiles, see comments on #78304.
2024-04-16 20:40:21 +00:00
Alexey Bataev
e84b2fb48d
[LV][NFCI]Use integer for cost/trip count calculations instead of double, fix possible UB.
Using a floating-point type in the compiler is not the best idea; here it
is used in a comparison for equality with 0 and may cause undefined
behavior in some cases.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/87241
2024-04-16 09:48:13 -04:00
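One common way to keep such cost-per-trip comparisons in integers is to cross-multiply rather than divide. A hedged sketch of that idea (the actual cost-model change may compute something different; names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Compare CostA/TripA against CostB/TripB by cross-multiplying instead of
// dividing, keeping everything in integers and avoiding floating-point
// equality comparisons. Assumes the products fit in 64 bits; a wider type
// would be needed otherwise.
bool cheaperPerTrip(uint64_t CostA, uint64_t TripA, uint64_t CostB,
                    uint64_t TripB) {
  assert(TripA != 0 && TripB != 0 && "trip counts must be non-zero");
  return CostA * TripB < CostB * TripA; // CostA/TripA < CostB/TripB
}

int main() {
  // 10 units over 4 trips (2.5/trip) is cheaper than 9 units over 3 (3/trip).
  assert(cheaperPerTrip(10, 4, 9, 3));
  assert(!cheaperPerTrip(9, 3, 10, 4));
  return 0;
}
```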
Florian Hahn
c836983671
[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770)
VPBlendRecipe does not use the first mask operand. Removing it allows
VPlan-based DCE to remove unused mask computations.

This also fixes #87410, where unused Not VPInstructions are considered
to have only their first lane demanded, while some of their operands
provide a vector value due to other users.

Fixes https://github.com/llvm/llvm-project/issues/87410

PR: https://github.com/llvm/llvm-project/pull/87770
2024-04-09 11:14:05 +01:00
Florian Hahn
9430a4b9d2
[VPlan] Use getEdgeMask when constructing VPBlendRecipe (NFCI).
After 2d0d65b3babe, block-in and edge masks are created up front. Only
retrieve the cached edge mask here.
2024-04-09 09:32:40 +01:00
Florian Hahn
15d11a4de9
[VPlan] Track IsOrdered in VPReductionRecipe, remove use of ILV (NFCI).
Instead of using ILV.useOrderedReductions during ::execute, store the
information at recipe construction.

Another step towards making the recipes' ::execute independent of the legacy ILV.
2024-04-07 20:33:22 +01:00
Florian Hahn
c6e38b928c
Reapply "[LV] Improve AnyOf reduction codegen. (#78304)"
This reverts the revert commit 589c7abb03448.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29d.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as there are no longer multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-04-05 13:45:13 +01:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities, and
architectures with general masked predication support can only take
advantage of predication on memory operations. Vector Predication
intrinsics (will) provide a target-independent way to model predicated
vector instructions; by giving the Loop Vectorizer a way to generate
them, these architectures can make better use of their predication
capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (it just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlan transform to replace the wide header predicate
compare with EVL and updates codegen for loads/stores to use VP
load/store with EVL.

Another important part of this approach is how the Explicit Vector Length
is computed (VP intrinsics define this vector length parameter as the
Explicit Vector Length, EVL). We use an experimental intrinsic
`get_vector_length`, which can be lowered to architecture-specific
instruction(s) to compute EVL.

A new recipe is also added to emit the instructions for computing EVL.
Using VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
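A scalar model of the EVL-based tail folding described above: each iteration requests an explicit vector length and processes exactly that many elements, so no scalar epilogue or per-lane tail mask is needed. Modelling the vector length as min(VF, remaining) is a simplification; get_vector_length allows the target to choose the value:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Scalar model of EVL-based tail folding over a simple element-wise add.
void addArrays(const std::vector<int> &A, const std::vector<int> &B,
               std::vector<int> &C) {
  constexpr std::size_t VF = 8;
  const std::size_t N = C.size();
  for (std::size_t I = 0; I < N;) {
    // Stand-in for get_vector_length: a target may return any value in
    // (0, VF]; min(VF, remaining) is the simplest legal choice.
    const std::size_t EVL = std::min(VF, N - I);
    // With VP intrinsics, this inner loop becomes a vp.load / add / vp.store
    // sequence operating on EVL lanes.
    for (std::size_t Lane = 0; Lane < EVL; ++Lane)
      C[I + Lane] = A[I + Lane] + B[I + Lane];
    I += EVL;
  }
}

int main() {
  std::vector<int> A(13, 1), B(13, 2), C(13, 0);
  addArrays(A, B, C); // 13 is not a multiple of VF; the tail is handled by EVL
  return C[12] == 3 ? 0 : 1;
}
```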
Florian Hahn
e701c1a653
[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI)
Now that VPRecipeBase manages debug locations for recipes, use it in
VPWidenMemoryInstructionRecipe.
2024-04-01 12:07:30 +01:00
Florian Hahn
8a614c1d31
[VPlan] Rename getVPValueOrAddLiveIn -> getOrAddLiveIn (NFCI).
The helper now only deals with live-ins, clarify the name.
2024-03-28 21:02:15 +00:00
Florian Hahn
06bb8c9f20
[VPlan] Explicitly handle scalar pointer inductions. (#83068)
Add a new PtrAdd opcode to VPInstruction that corresponds to
IRBuilder::CreatePtrAdd, which creates a GEP with source element type
i8.

This is then used to model scalarizing VPWidenPointerInductionRecipe by
introducing scalar-steps to model the index increment followed by a
PtrAdd.

Note that PtrAdd needs to be able to generate code for only the first
lane or for all lanes. This may warrant introducing a separate recipe
for scalarizing that can be created without relying on the underlying
IR.

Depends on https://github.com/llvm/llvm-project/pull/80271

PR: https://github.com/llvm/llvm-project/pull/83068
2024-03-26 16:01:57 +01:00
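Conceptually, the PtrAdd opcode computes a base pointer plus a byte offset, which is what a GEP with source element type i8 expresses. A small illustrative sketch in plain C++ (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Conceptual model of PtrAdd: a base pointer plus a byte offset, i.e. what a
// getelementptr with source element type i8 computes.
void *ptrAdd(void *Base, std::ptrdiff_t OffsetInBytes) {
  return static_cast<char *>(Base) + OffsetInBytes;
}

int main() {
  int Data[4] = {10, 20, 30, 40};
  // A scalar pointer-induction step: scale the index step to bytes, then
  // perform a PtrAdd from the base pointer.
  auto *Elt2 = static_cast<int *>(
      ptrAdd(Data, 2 * static_cast<std::ptrdiff_t>(sizeof(int))));
  assert(*Elt2 == 30);
  return 0;
}
```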
Florian Hahn
39c8e87717
[VPlan] Move recording of Inst->VPValue to VPRecipeBuilder (NFCI). (#84464)
Instead of keeping the mapping of Inst->VPValues (of their corresponding
recipes) in VPlan's Value2VPValue map, keep it in VPRecipeBuilder. After
recently replacing the last user of this mapping after initial
construction, the mapping is only needed for recipe construction (to map
IR operands to VPValue operands).

By moving the mapping, VPlan's VPValue tracking can be simplified and
limited only to live-ins. It also allows removing disableValue2VPValue
and associated machinery & asserts.

PR: https://github.com/llvm/llvm-project/pull/84464
2024-03-23 18:43:14 +01:00
Florian Hahn
8578b6e912
[VPlan] Store VPlan directly in VPRecipeBuilder (NFCI).
Instead of passing VPlan in a number of places, just store it directly
in VPRecipeBuilder. A single instance is only used for a single VPlan.

This simplifies the code and was suggested by @nikolaypanchenko in
https://github.com/llvm/llvm-project/pull/84464.
2024-03-18 19:23:37 +00:00
Paschalis Mpeis
f795d1a8b1
[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem (#82488)
getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer
to compute the cost of frem, which becomes a call cost on AArch64 when
TLI has a vector library function.

Add tests that do SLP vectorization for code that contains 2x double and
4x float frem instructions.
2024-03-14 17:20:29 +00:00
Kirill Stoimenov
589c7abb03 Revert "[LV] Improve AnyOf reduction codegen. (#78304)"
Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697

This reverts commit 95fef1dfefd5467206e74c089d29806fcd82889b.
2024-03-14 14:57:01 +00:00
Florian Hahn
95fef1dfef
[LV] Improve AnyOf reduction codegen. (#78304)
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as there are no longer multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-03-14 11:22:06 +00:00
Jeremy Morse
2fe81edef6 [NFC][RemoveDIs] Insert instruction using iterators in Transforms/
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.

There are two general flavours of update:
 * Almost all call-sites just call getIterator on an instruction
 * Several make use of an existing iterator (scenarios where the code is
   actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block needs to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.

Noteworthy changes:
 * FindInsertedValue now takes an optional iterator rather than an
   instruction pointer, as we need to always insert with iterators,
 * I've added a few iterator-taking versions of some value-tracking and
   DomTree methods -- they just unwrap the iterator. These are purely
   convenience methods to avoid extra syntax in some passes.
 * A few calls to getNextNode become std::next instead (to keep in the
   theme of using iterators for positions),
 * SeparateConstOffsetFromGEP has its insertion-position field changed.
   Noteworthy because it's not a purely localised spelling change.

All this should be NFC.
2024-03-05 15:12:22 +00:00
Florian Hahn
4d525f2b9a
[VPlan] Remove unneeded InsertPointGuard (NFCI).
getBlockInMask now simply returns an already computed mask, hence
there's no need to adjust the builder insert point.
2024-02-29 12:37:13 +00:00
Nilanjana Basu
1c211bc76e
[LV] Remove unused configuration option (#82955)
A recent set of changes (PR #67725) to the loop interleaving algorithm removed the loop trip count threshold for allowing interleaving. Therefore the configuration option interleave-small-loop-scalar-reduction is no longer needed.
2024-02-28 10:17:25 -08:00
Florian Hahn
3fac0562f8
[VPlan] Reset trip count when replacing ExpandSCEV recipe.
Otherwise accessing the trip count may access freed memory.

Fixes https://lab.llvm.org/buildbot/#/builders/74/builds/26239 and
others.
2024-02-28 16:31:49 +00:00
Alexey Bataev
80cff27390 [LV][NFC]Fix a misprint, NFC. 2024-02-28 07:56:31 -08:00
Alexey Bataev
fdd60c7b96
[LV][NFC]Preselect folding style while choosing the max VF, NFC.
Select the tail-folding style while choosing the max vector
factor and store it in a data member rather than recalculating it each
time getTailFoldingStyle is called.

Part of https://github.com/llvm/llvm-project/pull/76172

Reviewers: ayalz, fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/81885
2024-02-28 10:15:52 -05:00
Florian Hahn
15d9d0fa8f
[VPlan] Also print final VPlan directly before codegen/execute. (#82269)
Some optimizations are applied after UF and VF have been chosen. This
patch adds an extra print of the final VPlan just before
codegen/execution.

In the future, there will be additional transforms that are applied
later (interleaving for example).

PR: https://github.com/llvm/llvm-project/pull/82269
2024-02-28 13:19:43 +00:00
Florian Hahn
911055e34f
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformState's 'vector' storage for this value. Those
values are effectively uniform-per-VF (or in some cases
uniform-across-VF-and-UF). Using the vector/per-part storage doesn't
interact well with other recipes that more accurately use (Part,
Lane) to look up scalar values, and it prevents VPInstructions creating
scalars from interacting with other recipes working with scalars.

This PR tries to unify handling of scalars by using (Part, 0) for scalar
values where only the first lane is demanded. This allows using
VPInstructions with other recipes like VPScalarCastRecipe and is also
needed when using VPInstructions in more cases outside the vector loop
region to generate scalars.

Depends on https://github.com/llvm/llvm-project/pull/80269
2024-02-26 19:06:43 +00:00
Florian Hahn
9923d29cfa
[VPlan] Merge main VPlan verifer with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for
various properties on blocks to the verifier used for VPlans generated
by the inner loop vectorizer. It also adds def-use checks for the
verifier used in the VPlan native path.

This drops the separate flag to enable HCFG verification. Instead, all
VPlans are verified once they have been created, if assertions are
enabled.

This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to
model any phi node in the native path.
2024-02-20 16:43:57 +00:00
Florian Hahn
35ee6de966
[VPlan] Simplify addCanonicalIVRecipes by using VPBuilder (NFC).
Use VPBuilder to construct VPInstructions, which means there's no need
to manually insert recipes.
2024-02-17 18:30:01 +00:00
David Sherwood
1c10821022
[LoopVectorize] Fix divide-by-zero bug (#80836) (#81721)
When attempting to use the estimated trip count to refine the costs of
the runtime memory checks we should also check for sane trip counts to
prevent divide-by-zero faults on some platforms.

Fixes #80836
2024-02-14 16:07:51 +00:00
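A sketch of the kind of guard involved: only divide by the estimated trip count when it is known and non-zero. The names and the exact amortization formula here are illustrative, not the actual LoopVectorize code:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Only use the estimated trip count to amortize the runtime-check cost when
// it is known and non-zero; otherwise keep the unscaled cost instead of
// dividing by zero.
uint64_t amortizedCheckCost(uint64_t CheckCost,
                            std::optional<uint64_t> EstimatedTripCount) {
  if (EstimatedTripCount && *EstimatedTripCount != 0)
    return CheckCost / *EstimatedTripCount;
  return CheckCost; // no sane trip count available: do not divide
}

int main() {
  assert(amortizedCheckCost(100, 10) == 10);
  assert(amortizedCheckCost(100, 0) == 100);            // zero trip count guarded
  assert(amortizedCheckCost(100, std::nullopt) == 100); // unknown trip count
  return 0;
}
```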
Florian Hahn
debca7ee43
[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC)
Move collectPoisonGeneratingFlags from InnerLoopVectorizer to
VPlanTransforms and also update its name. collectPoisonGeneratingFlags
already drops poison-generating flags directly rather than just collecting
them, so it is more appropriate to integrate it directly into the
VPlan transform pipeline.

The current implementation still calls back to legal to check if a block
needs predication, which should be improved in the future.
2024-02-14 12:28:58 +00:00
Nilanjana Basu
c1c5b854ad
[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725)
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. The loop interleaving count computation was updated accordingly in prior patches; this patch removes the loop trip count threshold for interleaving.
2024-02-05 17:23:58 -08:00
Florian Hahn
2906f3626b
[VPlan] Update ::onlyScalarsGenerated to take IsScalable bool (NFCI).
Instead of passing in a full VF, just pass IsScalable as bool.
2024-02-03 14:51:14 +00:00
Nilanjana Basu
c492eb6b28
[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651)
Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.
2024-01-29 13:41:15 -08:00
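A sketch of the adjustment described above, using a hypothetical helper: when the scalar epilogue must run at least once, only TripCount - 1 iterations are available to the vector/interleaved loop, so the interleave count is derived from that reduced count:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: trip count available to the vector/interleaved loop
// when the scalar epilogue must execute at least one iteration.
uint64_t availableTripCount(uint64_t TripCount, bool EpilogueNeedsIteration) {
  if (EpilogueNeedsIteration && TripCount > 0)
    return TripCount - 1;
  return TripCount;
}

int main() {
  assert(availableTripCount(16, /*EpilogueNeedsIteration=*/true) == 15);
  assert(availableTripCount(16, /*EpilogueNeedsIteration=*/false) == 16);
  return 0;
}
```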
Florian Hahn
743946e8ef
[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090)
Move simplification of VPBlendRecipes from early VPlan construction to
VPlan-to-VPlan based recipe simplification. This simplifies initial
construction.

Note that some in-loop reduction tests are failing at the moment, due to
the reduction predicate being created after the reduction recipe. I will
provide a patch for that soon.

PR: https://github.com/llvm/llvm-project/pull/76090
2024-01-29 09:52:05 +00:00
Florian Hahn
2d0d65b3ba
[VPlan] Create edge masks up front in all cases where needed. (NFC)
Similarly to how block masks are created up front and later only
retrieved, also make sure edge masks are created up front in the cases
where they are needed, i.e. for blend recipes.

Creating block-in masks for all blocks in the loop also ensures edge
masks for all relevant edges have been created. Later, the new
getEdgeMask can be used to look up cached edge masks.

This makes sure edge masks are available in all cases for
https://github.com/llvm/llvm-project/pull/76090.
2024-01-28 21:20:18 +00:00