2030 Commits

Author SHA1 Message Date
Florian Hahn
95fef1dfef
[LV] Improve AnyOf reduction codegen. (#78304)
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes https://github.com/llvm/llvm-project/issues/62565

PR: https://github.com/llvm/llvm-project/pull/78304
2024-03-14 11:22:06 +00:00
Jeremy Morse
2fe81edef6 [NFC][RemoveDIs] Insert instruction using iterators in Transforms/
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.

There are two general flavours of update:
 * Almost all call-sites just call getIterator on an instruction
 * Several make use of an existing iterator (scenarios where the code is
   actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.

Noteworthy changes:
 * FindInsertedValue now takes an optional iterator rather than an
   instruction pointer, as we need to always insert with iterators,
 * I've added a few iterator-taking versions of some value-tracking and
   DomTree methods -- they just unwrap the iterator. These are purely
   convenience methods to avoid extra syntax in some passes.
 * A few calls to getNextNode become std::next instead (to keep in the
   theme of using iterators for positions),
 * SeparateConstOffsetFromGEP has it's insertion-position field changed.
   Noteworthy because it's not a purely localised spelling change.

All this should be NFC.
2024-03-05 15:12:22 +00:00
Florian Hahn
4d525f2b9a
[VPlan] Remove unneeded InsertPointGuard (NFCI).
getBlockInMask now simply returns an already computed mask, hence
there's no need to adjust the builder insert point.
2024-02-29 12:37:13 +00:00
Nilanjana Basu
1c211bc76e
[LV] Remove unused configuration option (#82955)
Recent set of changes (PR #67725) in loop interleaving algorithm caused removal of the loop trip count threshold for allowing interleaving. Therefore configuration option interleave-small-loop-scalar-reduction is no longer needed.
2024-02-28 10:17:25 -08:00
Florian Hahn
3fac0562f8
[VPlan] Reset trip count when replacing ExpandSCEV recipe.
Otherwise accessing the trip count may accesses freed memory.

Fixes https://lab.llvm.org/buildbot/#/builders/74/builds/26239 and
others.
2024-02-28 16:31:49 +00:00
Alexey Bataev
80cff27390 [LV][NFC]Fix a misprint, NFC. 2024-02-28 07:56:31 -08:00
Alexey Bataev
fdd60c7b96
[LV][NFC]Preselect folding style while chosing the max VF, NFC.
Selects the tail-folding style while choosing the max vector
factor and storing it in the data member rather than calculating it each
time upon getTailFoldingStyle call.

Part of https://github.com/llvm/llvm-project/pull/76172

Reviewers: ayalz, fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/81885
2024-02-28 10:15:52 -05:00
Florian Hahn
15d9d0fa8f
[VPlan] Also print final VPlan directly before codegen/execute. (#82269)
Some optimizations are apply after UF and VF have been chosen. This
patch adds an extra print of the final VPlan just before
codegen/execution.

In the future, there will be additional transforms that are applied
later (interleaving for example).

PR: https://github.com/llvm/llvm-project/pull/82269
2024-02-28 13:19:43 +00:00
Florian Hahn
911055e34f
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformatState's 'vector' storage for this value. Those
values are effectively uniform-per-VF (or in some cases
uniform-across-VF-and-UF). Using the vector/per-part storage doesn't
interact well with other recipes, that more accurately using (Part,
Lane) to look up scalar values and prevents VPInstructions creating
scalars from interacting with other recipes working with scalars.

This PR tries to unify handling of scalars by using (Part, 0) for scalar
values where only the first lane is demanded. This allows using
VPInstructions with other recipes like VPScalarCastRecipe and is also
needed when using VPInstructions in more cases otuside the vector loop
region to generate scalars.

Depends on https://github.com/llvm/llvm-project/pull/80269
2024-02-26 19:06:43 +00:00
Florian Hahn
9923d29cfa
[VPlan] Merge main VPlan verifer with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for
various properties on blocks to the verifier used for VPlans generated
by the inner loop vectorizer. It also adds def-use checks for the
verifier used in the VPlan native path.

This drops the separate flag to enable HCFG verification. Instead, all
VPlans are verified once they have been created, if assertions are
enabled.

This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to
model any phi node in the native path.
2024-02-20 16:43:57 +00:00
Florian Hahn
35ee6de966
[VPlan] Simplify addCanonicalIVRecipes by using VPBuilder (NFC).
Use VPBuilder to construct VPInstructions, which means there's no need
to manually inserting recipes.
2024-02-17 18:30:01 +00:00
David Sherwood
1c10821022
[LoopVectorize] Fix divide-by-zero bug (#80836) (#81721)
When attempting to use the estimated trip count to refine the costs of
the runtime memory checks we should also check for sane trip counts to
prevent divide-by-zero faults on some platforms.

Fixes #80836
2024-02-14 16:07:51 +00:00
Florian Hahn
debca7ee43
[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC)
Move collectPoisonGeneratingFlags from InnerLoopVectorizer to
VPlanTransforms and also update its name. collectPoisonGeneratingFlags
already directly drops poison-generating flags, not only collecting it.
This means it is more appropriate to integerate it directly into the
VPlan transform pipeline.

The current implementation still calls back to legal to check if a block
needs predication, which should be improved in the future.
2024-02-14 12:28:58 +00:00
Nilanjana Basu
c1c5b854ad
[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725)
A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.
2024-02-05 17:23:58 -08:00
Florian Hahn
2906f3626b
[VPlan] Update ::onlyScalarsGenerated to take IsScalable bool (NFCI).
Instead of passing in a full VF, just pass IsScalable as bool.
2024-02-03 14:51:14 +00:00
Nilanjana Basu
c492eb6b28
[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651)
Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.
2024-01-29 13:41:15 -08:00
Florian Hahn
743946e8ef
[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090)
Move simplification of VPBlendRecipes from early VPlan construction to
VPlan-to-VPlan based recipe simplification. This simplifies initial
construction.

Note that some in-loop reduction tests are failing at the moment, due to
the reduction predicate being created after the reduction recipe. I will
provide a patch for that soon.

PR: https://github.com/llvm/llvm-project/pull/76090
2024-01-29 09:52:05 +00:00
Florian Hahn
2d0d65b3ba
[VPlan] Create edge masks all cases up front needed.(NFC)
Similarly to how block masks are created up front and later only
retrieved also make sure masks are created in cases where edge masks are
needed, i.e. blend recipes.

Creating block-in masks for all blocks in the loop also ensures edge
masks for all relevant edges have been created. Later, the new
getEdgeMask can be used to look up cached edge masks.

This makes sure edge masks are available in all cases for
https://github.com/llvm/llvm-project/pull/76090.
2024-01-28 21:20:18 +00:00
Florian Hahn
7c03d5d41d
[VPlan] Use unique_ptr to clean up duplicated plan. 2024-01-27 20:51:55 +00:00
Florian Hahn
ec402a2e53
[VPlan] Implement cloning of VPlans. (#73158)
This patch implements cloning for VPlans and recipes. Cloning is used in
the epilogue vectorization path, to clone the VPlan for the main vector
loop. This means we won't re-use a VPlan when executing the VPlan for
the epilogue vector loop, which in turn will enable us to perform
optimizations based on UF & VF.
2024-01-27 13:30:52 +00:00
David Sherwood
962fbafecf
[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.
2024-01-26 14:43:48 +00:00
Florian Hahn
0ab539fd67
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.

This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is 
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.

PR: https://github.com/llvm/llvm-project/pull/78113
2024-01-26 11:13:05 +00:00
Florian Hahn
a04f615291
[LV] Check for innermost loop instead of EnableVPlanNativePath in CM.
Replace EnableVPlanNativePath checks in the cost-model by assertions
that the code is only called for innermost loops. This ensures that the
cost model isn't used in the VPlanNativePath, which is only used for
outer-loop vectorization.

Even with EnableVPlanNativePath, inner loops are processed by the
inner loop vectorization path, not the native path, so checking for
EnableVPlanNativePath may impact decisions for inner loops and can
cause crashes, like in the attached test case.
2024-01-25 12:49:52 +00:00
Florian Hahn
42fb1fac9e
[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI).
Instead of using the debug location of the underlying instruction, use
the debug location from the recipe. This removes an unneeded dependency
of the underlying instruction.
2024-01-19 13:33:03 +00:00
Florian Hahn
abdb61f5fd
[VPlan] Introduce VPSingleDefRecipe. (#77023)
This patch introduces a new common base class for recipes defining a
single result VPValue. This has been discussed/mentioned at various
previous reviews as potential follow-up and helps to replace various
getVPSingleValue calls.

PR: https://github.com/llvm/llvm-project/pull/77023
2024-01-19 10:27:53 +00:00
Paschalis Mpeis
37c87d5689
[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247)
LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).

Add test that check costs on AArch64, when there is no vector library
available and when there is (with and without tail-folding).

NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.
2024-01-18 08:32:53 +00:00
Florian Hahn
e7671bc9d6
[LV] Fix indent for loop in adjustRecipesForReductions (NFC). 2024-01-16 15:28:46 +00:00
Nikita Popov
6c2fbc3a68
[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582)
This abstracts over the common pattern of creating a gep with i8 element
type.
2024-01-12 14:21:21 +01:00
Craig Topper
1c342571b8
[LV] Use value_or to simplify code. NFC (#77030) 2024-01-10 12:40:26 -08:00
Florian Hahn
8b7bbedec7
[LV] Re-add early exit in VPRecipeBuilder::createBlockInMask.
Re-add early exit that was accidentally dropped in  51afb10.
2024-01-10 15:02:14 +00:00
Florian Hahn
51afb10174
[LV] Create block in mask up-front if needed. (#76635)
At the moment, block and edge masks are created on demand, which means
that they are inserted at the point where they are demanded and then
cached. It is possible that the mask for a block is looked up later at a
point that's not dominated by the point where the mask has been
inserted.

To avoid this, create masks up front on entry to the corresponding basic
block and leave it to VPlan simplification to remove unneeded masks.

Note that we need to create masks for all blocks, if any of the blocks
in the loop needs predication, as computing the mask of a block depends
on the masks of its predecessor.

Needed for #76090.

https://github.com/llvm/llvm-project/pull/76635
2024-01-09 10:50:08 +00:00
Florian Hahn
18ec3304a9
[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe.
As suggested as follow-up in
https://github.com/llvm/llvm-project/pull/72164, manage inbounds via
VPRecipeWithIRFlags.

Note that in some cases we can now preserve inbounds in a few more
cases.
2024-01-07 13:58:05 +00:00
Florian Hahn
241fe83704
[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253)
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.

The recipe may be broken down further in the future.

Note that  the phi nodes to merge the reduction result from the trip 
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on 
ComputeReductionResult.

Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
2024-01-04 22:53:18 +00:00
Nilanjana Basu
cd28da390f
[LV] Change loops' interleave count computation (#73766)
[LV] Change loops' interleave count computation

A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose interleaving count (IC) between TC/VF & TC/2*VF (VF = vectorization factor), such that remainder TC for the epilogue loop is minimum while the IC is maximum in case the remainder TC is same for both.

The initial tests for this change were submitted in PRs:
https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.
2024-01-04 12:45:22 +05:30
Florian Hahn
6dda74cc51
[VPlan] Use createSelect in adjustRecipesForReductions (NFCI).
Simplify the code and rename Result->NewExitingVPV as suggested by
@ayalz in https://github.com/llvm/llvm-project/pull/70253.
2024-01-03 20:54:10 +00:00
Florian Hahn
f18536d642
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.

https://github.com/llvm/llvm-project/pull/72164
2024-01-01 19:51:15 +00:00
Florian Hahn
516cc98aff
[LV] Fix typo in comment (NFC). 2023-12-28 21:20:10 +00:00
Florian Hahn
a5891fa4d2
[VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)

This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.

The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.

Moved from Phabricator (https://reviews.llvm.org/D157322)
2023-12-08 18:30:30 +00:00
Graham Hunter
d0d5ef8133
[LV] Add support for linear arguments for vector function variants (#73941)
If we have vectorized variants of a function which take linear
parameters, we should be able to vectorize assuming the strides match.
2023-12-08 10:24:05 +00:00
Alexey Bataev
056367bb19
[LV]Support dropping of nneg flag for zext widencast recipes. (#74112)
Compiler crashes when the assertion triggered for zext nneg instruction,
that checks that the instruction cannot produce poison. Changed the base
class for widencast recipe to handle dropping nneg flag to avoid
compiler crash.
2023-12-05 09:17:23 -05:00
Florian Hahn
70535f5e60
[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
   special codegen for each supported instruction type like the IR based
   one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
   execution (See removed FIXME).

Depends on D149081.

Depends on D149079.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149903
2023-12-02 16:12:38 +00:00
Florian Hahn
cbf7b52a65
[VPlan] Properly update reduction live-out after placing select.
After inserting a select for the final value, update the VPlan def-use
chains. At the moment, the incorrect live-out doesn't cause a
mis-compile, as computing the final reduction value is not yet modeled
in VPlan.
2023-12-02 15:22:09 +00:00
Graham Hunter
104b7c624e
[LV] Add support for uniform parameters on vectorized function variants (#72891)
Parameters marked as uniform take a scalar value, assuming the value is
invariant in the scalar loop.
2023-11-28 15:01:32 +00:00
Florian Hahn
906f598263
[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC). 2023-11-23 16:59:50 +00:00
Matthias Braun
2404477219
LoopVectorize: Add better heuristic for vectorized epilogue skip test (#72589)
This is a follow-up to PR #72450 correcting the branch_weights used
for the test whether the vectorized epilogue loop should be skipped.
2023-11-20 11:02:27 -08:00
Graham Hunter
4d64a2bcd3
[LV] Refactor vector function variant selection to prepare for uniform args (#68879)
Parameters marked as uniform take a scalar value, assuming the value is
invariant in the scalar loop. In order to support this, we need to stop
asking for a vector function variant with a default shape assuming that all
arguments will become vector arguments, and instead consider all available
variants and their parameter types.
2023-11-20 13:30:03 +00:00
Florian Hahn
7fd021a092
[LV] Don't crash on vector masks during scalar VPReductionRecipe::exec.
VPReductionRecipe may be executed for scalar VFs. Make sure to access
part 0 of the condition, as it could be an active-lane-mask, which is a
vector <1 x i1>

Fixes https://github.com/llvm/llvm-project/issues/72720.
2023-11-18 21:52:22 +00:00
Florian Hahn
2a9aed1730
[LV] Retain mask-reversal comment as suggested after e5e71af.
Address post-commit comment to retain comment.
2023-11-18 20:53:23 +00:00
Florian Hahn
e5e71affb7
[LV] Reverse mask up front, not when creating vector pointer. (#72163)
Reverse mask early on when populating BlockInMask. This will enable
separating mask management and address computation from the memory
recipes in the future and is also needed to enable explicit unrolling in
VPlan.
2023-11-17 13:59:35 +00:00
Nikita Popov
de176d8c54
[SCEV][LV] Invalidate LCSSA exit phis more thoroughly (#69909)
This an alternative to #69886. The basic problem is that SCEV can look
through trivial LCSSA phis. When the phi node later becomes non-trivial,
we do invalidate it, but this doesn't catch uses that are not covered by
the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which
will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the
LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places
that add predecessors to exit blocks, such as loop unrolling and loop
peeling.

Fixes #69097.
Fixes #66616.
Fixes #63970.
2023-11-17 09:34:24 +01:00