172 Commits

Author SHA1 Message Date
Florian Hahn
dadf6f2c5a
[VPlan] Ignore incoming values with constant false mask. (#89384)
Ignore incoming values with constant false masks when trying to simplify
VPBlendRecipes.

As a follow-on optimization, we should also be able to drop all incoming
values with false masks by creating a new VPBlendRecipe with those
operands dropped.

PR: https://github.com/llvm/llvm-project/pull/89384
2024-04-23 13:59:01 +01:00
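Annotation for dadf6f2c5a: the simplification described above can be sketched with a toy model. The types and the function `simplifyBlend` below are hypothetical stand-ins, not the real VPBlendRecipe API; the sketch only assumes that a blend can be replaced by a single value when every incoming value that can actually be selected is the same.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not the real VPlan classes.
enum class MaskKind { ConstantFalse, Dynamic };

struct Incoming {
  int ValueId;   // identifies the incoming value (non-negative in this sketch)
  MaskKind Mask; // what is known about the mask guarding it
};

constexpr int KeepBlend = -1; // sentinel: the blend cannot be simplified

// Returns the id of the single value the blend can be replaced with, or
// KeepBlend. Incoming values guarded by a constant false mask are ignored,
// because they can never be selected.
int simplifyBlend(const std::vector<Incoming> &Ins) {
  int Unique = KeepBlend;
  for (const Incoming &In : Ins) {
    if (In.Mask == MaskKind::ConstantFalse)
      continue; // never selected, so it cannot affect the result
    if (Unique != KeepBlend && Unique != In.ValueId)
      return KeepBlend; // two different selectable values: keep the blend
    Unique = In.ValueId;
  }
  return Unique;
}
```

The follow-on optimization mentioned in the message (rebuilding the blend without the dead operands) is not modeled here.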
Florian Hahn
17fb3e82f6
[VPlan] Skip extending ICmp results in truncateToMinimalBitwidths.
Results of icmp don't need extending after truncating their operands, as
the result will always be i1. Skip them during extending.

Fixes https://github.com/llvm/llvm-project/issues/79742
Fixes https://github.com/llvm/llvm-project/issues/85185
2024-04-23 11:50:26 +01:00
Florian Hahn
e2a72fa583
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://github.com/llvm/llvm-project/pull/76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then check all users (recursively) of those header
masks.

Depends on https://github.com/llvm/llvm-project/pull/87411.

PR: https://github.com/llvm/llvm-project/pull/87816
2024-04-19 09:44:23 +01:00
Florian Hahn
5d314353fb
[VPlan] Check for VPWidenLoadRecipe directly in truncateToMinBW. (NFCI).
After a separate recipe has been introduced for wide loads in
a9bafe91dd0, we can directly check for load recipes in the early
bail-out and remove the redundant bail out for stores.
2024-04-17 15:53:32 +01:00
Florian Hahn
41b7341d6b
[VPlan] Factor out helper to recursively collect all users (NFCI).
Factor out logic to collect all users recursively to be re-used
in https://github.com/llvm/llvm-project/pull/87816.
2024-04-17 14:56:47 +01:00
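Annotation for 41b7341d6b: a minimal sketch of what "collect all users recursively" could look like, using a worklist and a visited set. `Node` and `collectUsersRecursively` are illustrative names chosen for this sketch; the actual helper in VPlanTransforms.cpp operates on VPValue/VPUser and may differ in detail.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical value node; the real VPValue/VPUser API differs.
struct Node {
  std::vector<Node *> Users;
};

// Collects every transitive user of Root. The visited set ensures each node
// is reported once, even if it is reachable through several paths.
std::vector<Node *> collectUsersRecursively(Node *Root) {
  std::vector<Node *> Worklist(Root->Users.begin(), Root->Users.end());
  std::unordered_set<Node *> Seen;
  std::vector<Node *> Result;
  while (!Worklist.empty()) {
    Node *Cur = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(Cur).second)
      continue; // already visited
    Result.push_back(Cur);
    for (Node *U : Cur->Users)
      Worklist.push_back(U);
  }
  return Result;
}
```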
Florian Hahn
a9bafe91dd
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411)
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code
generation for widened loads and stores and to enable adding further,
more specialized memory recipes.

PR: https://github.com/llvm/llvm-project/pull/87411
2024-04-17 11:00:58 +01:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. Giving the Loop
Vectorizer a way to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions, lets these architectures make better use of their
predication capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlan transform to replace the wide header predicate
compare with EVL and updates codegen for loads/stores to use VP
store/load with EVL.

Another important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.

Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
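Annotation for 413a66f339: the EVL-based tail-folding idea can be illustrated with a scalar simulation. This is only a sketch: `getVectorLength` stands in for the experimental `get_vector_length` intrinsic, and returning `min(Remaining, VF)` is an assumption made here for illustration (a real target may pick a different legal value). The inner loop models a VP load/store pair that touches only the first EVL lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for get_vector_length: number of lanes to process this iteration.
// Taking the minimum is an assumption of this sketch.
std::size_t getVectorLength(std::size_t Remaining, std::size_t VF) {
  return std::min(Remaining, VF);
}

// Adds 1 to each element, handling up to VF elements per "vector" iteration.
// Because only EVL lanes are processed, neither a scalar remainder loop nor a
// header mask is needed. VF must be non-zero.
// Returns the number of vector iterations executed.
std::size_t addOneWithEVL(std::vector<int> &Data, std::size_t VF) {
  std::size_t Iterations = 0;
  for (std::size_t I = 0; I < Data.size();) {
    std::size_t EVL = getVectorLength(Data.size() - I, VF);
    for (std::size_t Lane = 0; Lane < EVL; ++Lane)
      Data[I + Lane] += 1; // models vp.load + add + vp.store on EVL lanes
    I += EVL;              // the induction advances by EVL, not by VF
    ++Iterations;
  }
  return Iterations;
}
```

With 10 elements and VF = 4, the sketch runs three iterations with EVL values 4, 4 and 2.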
Florian Hahn
7bd163d0a4
[VPlan] Clean up dead recipes after UF & VF specific simplification.
Recursively remove dead recipes after simplifying vector loop exit
branch.
2024-04-04 12:05:08 +01:00
Florian Hahn
e329b68413
[VPlan] Factor out logic to check if recipe is dead (NFCI).
In preparation to use the helper in more places.
2024-04-03 14:22:41 +01:00
Florian Hahn
e701c1a653
[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI)
Now that VPRecipeBase manages debug locations for recipes, use it in
VPWidenMemoryInstructionRecipe.
2024-04-01 12:07:30 +01:00
Florian Hahn
8a614c1d31
[VPlan] Rename getVPValueOrAddLiveIn -> getOrAddLiveIn (NFCI).
The helper now only deals with live-ins, clarify the name.
2024-03-28 21:02:15 +00:00
Florian Hahn
6ef829941b
Recommit "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821)"
Recommit with a fix for the use-after-free causing the revert.
This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987.

Original commit message:

Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes https://github.com/llvm/llvm-project/issues/81872

PR: https://github.com/llvm/llvm-project/pull/83821
2024-03-27 19:11:18 +00:00
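Annotation for 6ef829941b: the replacement relies on the arithmetic fact that OR and ADD agree whenever the operands share no set bits. The helper names below are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// True if no bit is set in both operands, i.e. an OR of them could carry the
// `disjoint` flag.
bool isDisjoint(std::uint32_t A, std::uint32_t B) { return (A & B) == 0; }

// True if OR and ADD produce the same result for these operands.
// Since A + B == (A | B) + (A & B), this holds exactly when A & B == 0.
bool orEqualsAdd(std::uint32_t A, std::uint32_t B) {
  return (A | B) == A + B;
}
```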
Florian Hahn
06bb8c9f20
[VPlan] Explicitly handle scalar pointer inductions. (#83068)
Add a new PtrAdd opcode to VPInstruction that corresponds to
IRBuilder::CreatePtrAdd, which creates a GEP with source element type
i8.

This is then used to model scalarizing VPWidenPointerInductionRecipe by
introducing scalar-steps to model the index increment followed by a
PtrAdd.

Note that PtrAdd needs to be able to generate code for only the first
lane or for all lanes. This may warrant introducing a separate recipe
for scalarizing that can be created without relying on the underlying
IR.

Depends on https://github.com/llvm/llvm-project/pull/80271

PR: https://github.com/llvm/llvm-project/pull/83068
2024-03-26 16:01:57 +01:00
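Annotation for 06bb8c9f20: a rough model of "scalar steps followed by a PtrAdd". The function names and the exact formula for the scalar step are assumptions of this sketch; it only illustrates that a GEP with source element type i8 amounts to adding a byte offset to a pointer.

```cpp
#include <cassert>
#include <cstddef>

// Models PtrAdd: advance a pointer by a byte offset, which is what a GEP with
// source element type i8 computes.
const unsigned char *ptrAdd(const unsigned char *Base,
                            std::ptrdiff_t ByteOffset) {
  return Base + ByteOffset;
}

// Address of a scalarized pointer induction for one lane. The index increment
// is modeled as (Index + Lane) * StepInBytes, then applied with a PtrAdd.
const unsigned char *scalarPtrIV(const unsigned char *Start,
                                 std::ptrdiff_t Index, std::ptrdiff_t Lane,
                                 std::ptrdiff_t StepInBytes) {
  std::ptrdiff_t Offset = (Index + Lane) * StepInBytes; // scalar step
  return ptrAdd(Start, Offset);
}
```

Generating only the first lane corresponds to calling `scalarPtrIV` with `Lane == 0`; generating all lanes corresponds to one call per lane.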
Florian Hahn
39c8e87717
[VPlan] Move recording of Inst->VPValue to VPRecipeBuilder (NFCI). (#84464)
Instead of keeping a mapping of Inst->VPValues (of their corresponding
recipes) in VPlan's Value2VPValue mapping, keep it in VPRecipeBuilder
instead. After recently replacing the last user of this mapping after
initial construction, this mapping is only needed for recipe
construction (to map IR operands to VPValue operands).

By moving the mapping, VPlan's VPValue tracking can be simplified and
limited only to live-ins. It also allows removing disableValue2VPValue
and associated machinery & asserts.

PR: https://github.com/llvm/llvm-project/pull/84464
2024-03-23 18:43:14 +01:00
Benjamin Kramer
f872043e05
Revert "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821)"
This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates
a use after free.

==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
2024-03-20 15:14:58 +01:00
Florian Hahn
c2c1e6ee4c
[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821)
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.

This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.

Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.

There are other places that may drop poison-generating flags; those
likely need similar treatment.

Fixes https://github.com/llvm/llvm-project/issues/81872


PR: https://github.com/llvm/llvm-project/pull/83821
2024-03-19 20:16:18 +01:00
Florian Hahn
fd93a5e3c0
[VPlan] Support match unary and binary recipes in pattern matcher (NFC).
Generalize pattern matchers to take recipe types to match as template
arguments and use it to provide matchers for unary and binary recipes
with specific opcodes and a list of recipe types (VPWidenRecipe,
VPReplicateRecipe, VPWidenCastRecipe, VPInstruction)

The new matchers are used to simplify and generalize the code in
simplifyRecipes.
2024-03-18 14:24:52 +00:00
Florian Hahn
c07c1c47d3
[VPlan] Remove redundant cast (NFCI).
SinkCandidate is a VPSingleDefRecipe now, so no cast is needed to access
getUnderlyingInstr directly.
2024-03-18 08:58:23 +00:00
Florian Hahn
f1015d1701
[VPlan] Use VPBuilder to create ActiveLaneMask (NFC).
2024-03-13 16:08:02 +00:00
Philip Reames
df9ba13579
[LV] Handle scalable VFs in optimizeForVFAndUF (#82669)
Given a scalable VF of the form <NumElts * VScale>, this patch adds the
ability to discharge a backedge test for a loop whose trip count is
between (NumElts, MinVScale*NumElts).

A couple of notes on this:
* Annoyingly, I could not figure out how to write a test for this case. My
attempt is checked in as test32_i8 in f67ef1a, but LV uses a fixed
vector in that case, and ignores the force flags.
* This depends on 9eb5f94f to avoid appearing like a regression. Since
SCEV doesn't know any upper bound on vscale without the vscale_range
attribute (it doesn't query TTI), the ranges overflow on the multiply.
Arguably, this is fixing a bug in the current LV code since in theory
vscale can be large enough to overflow for real, but no actual target is
going to see that case.
2024-03-04 13:49:35 -08:00
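Annotation for df9ba13579: one way to read the condition is that the backedge test can be dropped when a single vector iteration is known to cover the whole trip count, even for the smallest possible vscale. The sketch below encodes that reading with plain integers; the function name, the parameters and the explicit overflow checks are assumptions made here, whereas the real code reasons with SCEV ranges.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if TripCount <= NumElts * MinVScale * UF, i.e. one vector
// iteration is guaranteed to process every element. Returns false
// conservatively if the product would overflow 64 bits.
bool singleVectorIterationCovers(std::uint64_t TripCount,
                                 std::uint64_t NumElts,
                                 std::uint64_t MinVScale, std::uint64_t UF) {
  if (TripCount == 0 || NumElts == 0 || MinVScale == 0 || UF == 0)
    return false;
  if (NumElts > UINT64_MAX / MinVScale)
    return false; // NumElts * MinVScale would overflow
  std::uint64_t Lanes = NumElts * MinVScale;
  if (Lanes > UINT64_MAX / UF)
    return false; // Lanes * UF would overflow
  return TripCount <= Lanes * UF;
}
```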
Florian Hahn
2435dcd83a
[VPlan] Add initial pattern match implementation for VPInstruction. (#80563)
Add an initial version of a pattern match for VPValues and recipes,
starting with VPInstruction.

PR: https://github.com/llvm/llvm-project/pull/80563
2024-03-03 21:48:58 +00:00
Florian Hahn
3d66d6932e
[VPlan] Support live-ins without underlying IR in type analysis. (#80723)
A VPlan contains multiple live-ins without underlying IR, like VFxUF or
VectorTripCount. Trying to infer the scalar type of those causes a crash
at the moment.

Update VPTypeAnalysis to take a VPlan in its constructor and assign
types to those live-ins up front. All those live-ins share the type of
the canonical IV.

PR: https://github.com/llvm/llvm-project/pull/80723
2024-02-21 19:37:15 +00:00
Florian Hahn
20177c45db
[VPlan] Turn private members of VPlanTransforms to static funcs (NFC)
Private members of VPlanTransforms are only used inside
VPlanTransforms.cpp, just make them static.
2024-02-17 13:45:23 +00:00
Florian Hahn
0dacba3ad1
[VPlan] Handle truncating ICMPs in truncateToMinimalBWs.
Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs,
the new target type will be the same as the original type. In that case,
only truncate the operands, but skip the extend. This is in line with
what the original truncateToMinimalBitwidths did for compares.

Fixes https://github.com/llvm/llvm-project/issues/81415.
2024-02-16 12:58:56 +00:00
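Annotation for 0dacba3ad1: the difference between compares and other narrowed operations can be shown with fixed-width integers. This sketch assumes 32-bit source values whose minimal bitwidth is 8; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned less-than on values known to need only 8 bits. The operands are
// truncated, but the result is already a 1-bit value, so there is nothing to
// extend afterwards.
bool icmpUltNarrow(std::uint32_t A, std::uint32_t B) {
  std::uint8_t NarrowA = static_cast<std::uint8_t>(A); // trunc i32 -> i8
  std::uint8_t NarrowB = static_cast<std::uint8_t>(B); // trunc i32 -> i8
  return NarrowA < NarrowB;                            // icmp ult i8
}

// An add, in contrast, is computed in the narrow type and then extended back
// to the original width for its users.
std::uint32_t addNarrow(std::uint32_t A, std::uint32_t B) {
  std::uint8_t Narrow = static_cast<std::uint8_t>(
      static_cast<std::uint8_t>(A) + static_cast<std::uint8_t>(B));
  return static_cast<std::uint32_t>(Narrow); // zext i8 -> i32
}
```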
Florian Hahn
debca7ee43
[VPlan] Move dropping of poison flags to VPlanTransforms. (NFC)
Move collectPoisonGeneratingFlags from InnerLoopVectorizer to
VPlanTransforms and also update its name. collectPoisonGeneratingFlags
already directly drops poison-generating flags, not only collecting them.
This means it is more appropriate to integrate it directly into the
VPlan transform pipeline.

The current implementation still calls back to legal to check if a block
needs predication, which should be improved in the future.
2024-02-14 12:28:58 +00:00
Florian Hahn
cec24f0d7e
[VPlan] Update stale test after 9536a6286, fix formatting.
2024-01-31 13:45:38 +00:00
Florian Hahn
9536a6286e
[VPlan] Preserve original induction order when creating scalar steps.
Update createScalarIVSteps to take an insert point as parameter. This
ensures that the inserted scalar steps are in the same order as the
recipes they replace (vs in reverse order as currently). This helps to
reduce the diff for follow-up changes.
2024-01-31 13:31:28 +00:00
Florian Hahn
743946e8ef
[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090)
Move simplification of VPBlendRecipes from early VPlan construction to
VPlan-to-VPlan based recipe simplification. This simplifies initial
construction.

Note that some in-loop reduction tests are failing at the moment, due to
the reduction predicate being created after the reduction recipe. I will
provide a patch for that soon.

PR: https://github.com/llvm/llvm-project/pull/76090
2024-01-29 09:52:05 +00:00
Florian Hahn
0ab539fd67
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.

This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is 
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.

PR: https://github.com/llvm/llvm-project/pull/78113
2024-01-26 11:13:05 +00:00
Florian Hahn
42fb1fac9e
[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI).
Instead of using the debug location of the underlying instruction, use
the debug location from the recipe. This removes an unneeded dependency
of the underlying instruction.
2024-01-19 13:33:03 +00:00
Florian Hahn
abdb61f5fd
[VPlan] Introduce VPSingleDefRecipe. (#77023)
This patch introduces a new common base class for recipes defining a
single result VPValue. This has been discussed/mentioned at various
previous reviews as potential follow-up and helps to replace various
getVPSingleValue calls.

PR: https://github.com/llvm/llvm-project/pull/77023
2024-01-19 10:27:53 +00:00
Florian Hahn
59d6f033a2
[VPlan] Support narrowing widened loads in truncateToMinimalBitwidths.
MinBWs may also contain widened load instructions, handle them by only
narrowing their result.

Fixes https://github.com/llvm/llvm-project/issues/77468
2024-01-12 13:14:13 +00:00
Florian Hahn
2ab5c47c87
[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe.
Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would
introduce wide (vectorizing) recipes when interleaving only.

Fixes https://github.com/llvm/llvm-project/issues/76986
2024-01-04 20:39:44 +00:00
Florian Hahn
cb56ba6350
[VPlan] Unswitch cond in replaceUsesWithIf in optimizeInductions (NFC)
As suggested post-commit for a00227197, unswitch the condition in
replaceUsesWithIf to simplify the check.
2023-12-15 20:26:36 +00:00
Florian Hahn
9277ef12c0
[VPlan] Remove stale comment from optimizeInductions (NFC).
As suggested post-commit for a00227197, remove the stale comment,
SetVector is no longer used here.
2023-12-15 17:35:13 +00:00
Alexey Bataev
056367bb19
[LV]Support dropping of nneg flag for zext widencast recipes. (#74112)
The compiler crashes when the assertion that checks that an instruction
cannot produce poison triggers for a zext nneg instruction. Change the
base class of the widen-cast recipe so that it can drop the nneg flag,
which avoids the crash.
2023-12-05 09:17:23 -05:00
Florian Hahn
cd4348349a
[VPlan] Skip cases where no truncate is needed in truncateMinimalBWs.
MinBWs contains entries that specify the minimum required bitwidth. In
some cases, the old and new bitwidths can be equal (see test case) and
in those cases no truncations are needed, so skip those cases.

Fixes #74307.
2023-12-04 15:35:54 +00:00
Florian Hahn
c890582912
[VPlan] Account for live-in entries in MinBW used by replicate recipes.
In some cases MinBWs may contain entries for live-ins that are not used
by VPWidenRecipe or VPWidenSelectRecipes. In those cases, the live-ins
won't get processed, so make sure we include them in the count when used
as operands in VPWidenCast and VPWidenSelectRecipe.

Fixes https://github.com/llvm/llvm-project/issues/74231
2023-12-03 11:15:29 +00:00
Kazu Hirata
0008b9c0ac
[Vectorize] Fix an unused variable warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:912:16: error:
  unused variable 'OldResSizeInBits' [-Werror,-Wunused-variable]
2023-12-02 11:20:57 -08:00
Florian Hahn
70535f5e60
[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
   special codegen for each supported instruction type like the IR based
   one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
   execution (See removed FIXME).

Depends on D149081.

Depends on D149079.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149903
2023-12-02 16:12:38 +00:00
Florian Hahn
6f3b88baa2
[VPlan] Move trunc ([s|z]ext A) simplifications to simplifyRecipe.
Split off simplification from D149903 as suggested.

This should be effectively NFC until D149903 lands.
2023-11-16 21:17:10 +00:00
Florian Hahn
097ba5366c
[VPlan] Use VPTypeInfo in simplifyRecipes.
Replace getTypeForVPValue with the recently added, more general
VPTypeAnalysis.
2023-11-15 15:28:51 +00:00
Florian Hahn
a002271972
[VPlan] Add VPValue::replaceUsesWithIf (NFCI).
Add replaceUsesWithIf helper and use it in a few places.
2023-11-06 16:08:22 +00:00
Florian Hahn
cff6652129
[VPlan] Handle VPValues without underlying values in getTypeForVPValue.
Fixes a crash after 0c8e5be6fa08.

Full type inference will be added in
https://github.com/llvm/llvm-project/pull/69013
2023-10-27 13:34:54 +01:00
Florian Hahn
0c8e5be6fa
[VPlan] Simplify redundant trunc (zext A) pairs to A.
Add simplification for redundant trunc(zext A) pairs. Generally apply a
transform from D149903.

Depends on D159200.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159202
2023-10-22 11:41:38 +01:00
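Annotation for 0c8e5be6fa: a toy version of the trunc(zext A) -> A fold on a tiny expression tree. `Expr`, `Op` and `simplifyTruncOfZExt` are hypothetical; the sketch only folds the case where the truncated width equals the width of A, which is the case where the pair is redundant.

```cpp
#include <cassert>

// Hypothetical mini-IR; not the real recipe classes.
enum class Op { Leaf, ZExt, Trunc };

struct Expr {
  Op Opcode;
  unsigned Bits; // bit width of the result
  Expr *Operand; // null for leaves
};

// If E is trunc(zext A) and the trunc produces A's original width, the pair
// is redundant and A is returned. Otherwise E is returned unchanged.
Expr *simplifyTruncOfZExt(Expr *E) {
  if (E->Opcode != Op::Trunc || !E->Operand)
    return E;
  Expr *Ext = E->Operand;
  if (Ext->Opcode != Op::ZExt || !Ext->Operand)
    return E;
  Expr *A = Ext->Operand;
  return A->Bits == E->Bits ? A : E;
}
```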
Mikael Holmen
9cecee97a0
[VPlan] Silence gcc Wparentheses warning [NFC]
Without the fix gcc warns about
../lib/Transforms/Vectorize/VPlanTransforms.cpp:968:42: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
  968 |          UseActiveLaneMaskForControlFlow &&
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
  969 |              "DataAndControlFlowWithoutRuntimeCheck implies "
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  970 |              "UseActiveLaneMaskForControlFlow");
      |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-09-28 12:04:26 +02:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduces active-lane-mask as a later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating the introduction of EVL
as a similar optimization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
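Annotation for 97687b7aea: the two mask forms can be compared lane by lane. The sketch below assumes that the backedge-taken count equals the trip count minus one and ignores integer overflow of `IV + Lane`; both function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Canonical tail-folding mask: lane L is active iff (IV + L) <= BTC, where
// BTC is the backedge-taken count.
std::vector<bool> headerMaskULE(std::uint64_t IV, std::uint64_t BTC,
                                unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned L = 0; L < VF; ++L)
    Mask[L] = IV + L <= BTC;
  return Mask;
}

// Active-lane-mask form: lane L is active iff (IV + L) < TripCount.
std::vector<bool> activeLaneMask(std::uint64_t IV, std::uint64_t TripCount,
                                 unsigned VF) {
  std::vector<bool> Mask(VF);
  for (unsigned L = 0; L < VF; ++L)
    Mask[L] = IV + L < TripCount;
  return Mask;
}
```

For a trip count of 10 (BTC = 9), VF = 4 and IV = 8, both forms enable only the first two lanes.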
Florian Hahn
f108c6cdc1
[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform.
Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A.
Among other things, this enables additional simplifications after
applying versioned strides, as follow up to D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159200
2023-09-18 21:45:34 +01:00
Florian Hahn
aacaf3d580
[VPlan] Simplify VPDerivedIV truncation handling (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
2023-08-14 17:33:10 +01:00
Florian Hahn
b223229e2c
[VPlan] Re-use existing step again after 34accad1feae.
This fixes a failing RISCV test case that was missed originally.
2023-08-08 21:42:56 +01:00