429 Commits

Author SHA1 Message Date
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00
LiqinWeng
b759020cc8
[LV][EVL] Support cast instruction with EVL-vectorization (#108351) 2024-12-11 10:01:41 +08:00
Florian Hahn
afef545efa
[VPlan] Address post-commit for #114305.
Apply suggested renaming and adjust placement as suggested in
https://github.com/llvm/llvm-project/pull/114305. Also drop unneeded
RPOT creation.
2024-12-08 21:24:19 +00:00
Florian Hahn
156da98683
[VPlan] Move printing final VPlan to ::execute (NFC).
This moves printing of the final VPlan to ::execute. This ensures the
final VPlan is printed, including recipes that get introduced by late,
lowering transforms and skeleton construction.

Split off from https://github.com/llvm/llvm-project/pull/114292, to
simplify the diff.
2024-12-07 09:39:10 +00:00
Florian Hahn
7f7f540a48
Reapply "[VPlan] Update scalar induction resume values in VPlan. (#110577)"
This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1.

The crash when building llvm-test-suite with stage2 should have been
fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.
2024-12-06 19:41:51 +00:00
Nikita Popov
f09b16e267 Revert "[VPlan] Update scalar induction resume values in VPlan. (#110577)"
This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287.
This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.

Causes crashes in llvm-test-suite when using stage 2 clang.
2024-12-06 18:01:42 +01:00
Florian Hahn
1091fad31a
[VPlan] Fix stack-use-after-scope in VPInstruction::generate (NFC).
Fix stack-use-after-scope introduced in 0678e2058364ec by pulling out
the vector to a dedicated variable.

Should fix ASan/MSan failures, including
https://lab.llvm.org/buildbot/#/builders/169/builds/6111.
2024-12-06 16:50:54 +00:00
Florian Hahn
0678e20583
[VPlan] Update scalar induction resume values in VPlan. (#110577)
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the 
created resume values.

This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.

Depends on https://github.com/llvm/llvm-project/pull/109975.

PR: https://github.com/llvm/llvm-project/pull/110577
2024-12-06 12:26:19 +00:00
Florian Hahn
a7fda0e1e4
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.

Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.

PR: https://github.com/llvm/llvm-project/pull/114305
2024-12-03 14:53:51 +00:00
Ramkumar Ramachandra
2a0ee090db
IVDesc: strip redundant arg in getOpcode call (NFC) (#118476) 2024-12-03 13:40:51 +00:00
Luke Lau
d9c269577e
[VPlan] Remove manual constant fold in VPWidenIntOrFpInductionRecipe. NFC (#118028)
This manual constant folding was added in 2017 in
https://reviews.llvm.org/D29956, but since then it looks like IRBuilder
has learnt to fold it away itself.
I'm not sure at what point this happened, I just verified this by
stepping through the call to CreateVectorSplat in the debugger.
2024-11-29 00:21:53 +01:00
LiqinWeng
4a3f46de50
[LV][EVL] Support call instruction with EVL-vectorization (#110412) 2024-11-28 10:05:08 +08:00
Florian Hahn
e2519b674c
[VPlan] Print incoming VPBB for Phi VPIRInstruction (NFC).
Print the incoming block for Phi VPIRInstructions, for better debugging
& testing.
2024-11-23 19:06:58 +00:00
Florian Hahn
320038579d
[VPlan] Return cost of PHI for scalar VFs in computeCost for FORs.
This fixes a crash when the VF is scalar.

Fixes https://github.com/llvm/llvm-project/issues/116375.
2024-11-21 21:11:21 +00:00
Finn Plummer
8663b8777e
[NFC][VectorUtils][TargetTransformInfo] Add isVectorIntrinsicWithOverloadTypeAtArg api (#114849)
This changes allows target intrinsics to specify and overwrite overloaded types.

- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI

This change will let us add scalarization for `asdouble`:  #114847
2024-11-21 11:04:25 -08:00
Noah Goldstein
8af5ae0648 [VPlan] Preserve IR flags when widening casts
We have `nneg` for both `sext` and `uitofp`.

Fixes #114856

Closes #115373
2024-11-08 17:21:05 -06:00
Kazu Hirata
aa825b74af
[Vectorize] Remove unused includes (NFC) (#114643)
Identified with misc-include-cleaner.
2024-11-03 08:58:51 -08:00
Florian Hahn
af6ebb70d2
[VPlan] Implement computeCost for remaining VPSingleDefRecipes.
Provide computeCost implementations for all remaining sub-classes of
VPSingleDefRecipe. This pushes one of the last uses of getLegacyCost
directly to its user: VPReplicateRecipe.
2024-11-02 19:38:31 +00:00
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
Florian Hahn
680901ed80
[VPlan] Implement VPHeaderPHIRecipe::computeCost.
Fill out computeCost implementations for various header PHI recipes,
matching the legacy cost model for now.
2024-10-29 21:04:31 +00:00
Shih-Po Hung
266ff98cba
[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974)
Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the
runtime VF, similar to #95305.

Since only reverse vector pointers require the runtime VF, the patch
sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for
reverse vector pointers. As a result, the generation of reverse vector
pointers is moved into a separate recipe.
2024-10-26 23:18:50 +08:00
Florian Hahn
e724226da7
[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value.
In some cases, VPWidenCastRecipes are created but not considered in the
legacy cost model, including truncates/extends when evaluating a reduction
in a smaller type. Return 0 for such casts for now, to avoid divergences
between VPlan and legacy cost models.

Fixes https://github.com/llvm/llvm-project/issues/113526.
2024-10-25 21:25:44 +02:00
Elvis Wang
b3edc764f7
[VPlan] Implement VPWidenCastRecipe::computeCost(). (NFCI) (#111339)
This patch implement `VPWidenCastRecipe::computeCost()` and skip cast
recipies in the in-loop reduction.
2024-10-22 12:23:49 +08:00
Florian Hahn
a4819bd46d
[VPlan] Restore case accidentially dropped in 34cdd67c85.
34cdd67c85 accidentially dropped the case for VPWidenIntrinsicSC. Add it
back again. Thanks @Mel-Chen for spotting this.
2024-10-21 20:33:58 -07:00
Florian Hahn
1d9b3222f3
[VPlan] Implement VPWidenSelectRecipe::computeCost.
Implement VPlan-based cost computation for VPWidenSelectRecipe.
2024-10-22 03:10:04 +01:00
Florian Hahn
2a6b09e0d3
[LV] Use type from InsertPos for cost computation of interleave groups.
Previously the legacy cost model would pick the type for the cost
computation depending on the order of the members in the input IR.
This is incompatible with the VPlan-based cost model (independent of
original IR order) and also doesn't match code-gen, which uses the type
of the insert position.

Update the legacy cost model to use the type (and address space) from
the Group's insert position.

This brings the legacy cost model in line with the legacy cost model and
fixes a divergence between both models.

Note that the X86 cost model seems to assign different costs to groups
with i64 and double types. Added a TODO to check.

Fixes https://github.com/llvm/llvm-project/issues/112922.
2024-10-18 19:12:40 -07:00
Alexey Bataev
f148d5791b
[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode.
Enabled initial support for max safe distance in DataWithEVL mode. If
max safe distance is required, need to emit special code:
CMP = icmp ult AVL, MAX_SAFE_DISTANCE
SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)

while vectorize the loop in DataWithEVL tail folding mode.

Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/102897
2024-10-18 15:51:49 -04:00
Florian Hahn
81bbe19383
[VPlan] Add VPSingleDefRecipe::dump() to resolve ambigous lookup (NFC).
This allows calling ::dump() on various sub-classes of VPSingleDefRecipe
directly, as it resolves an ambigous name lookup.

Previously, calling VPWidenRecipe::dump() (and others), would result in

the following errors:
llvm/unittests/Transforms/Vectorize/VPlanTest.cpp:1284:19: error: member 'dump' found in multiple base classes of different types
 1284 |           WidenR->dump();
      |                   ^
llvm/include/../lib/Transforms/Vectorize/VPlanValue.h:434:8: note: member found by ambiguous name lookup
  434 |   void dump() const;
      |        ^
llvm/include/../lib/Transforms/Vectorize/VPlanValue.h:108:8: note: member found by ambiguous name lookup
  108 |   void dump() const;
      |        ^
1 error generated.
2024-10-17 05:31:29 +01:00
Florian Hahn
3860e29e0e
[VPlan] Mark VPVectorPointerRecipe as not having sideeffects.
VectorPointer doesn't read from memory or have any sideeffects. Mark it
accordingly.
2024-10-16 06:10:19 +01:00
Florian Hahn
34cdd67c85
[VPlan] Use VPWidenIntrinsicRecipe to vp.select. (#110489)
Use VPWidenIntrinsicRecipe
(https://github.com/llvm/llvm-project/pull/110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.

There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.


PR: https://github.com/llvm/llvm-project/pull/110489
2024-10-15 21:48:15 +01:00
Florian Hahn
2a46e5d039
[VPlan] Implement VPInterleaveRecipe::computeCost. (#106067)
Implement computing costs for VPInterleaveRecipe.

PR: https://github.com/llvm/llvm-project/pull/106067
2024-10-15 20:50:28 +01:00
Elvis Wang
3c91a2f73e
[VPlan] Implement VPReductionRecipe::computeCost(). NFC (#107790)
Implementation of `computeCost()` function for `VPReductionRecipe`.

Note that `in-loop` and `any-of` reductions are not supported by
VPlan-based cost model currently.
2024-10-15 15:44:37 +08:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Florian Hahn
fa3258ecb8
[VPlan] Sink retrieving legacy costs to more specific computeCost impls. (#109708)
Make legacy cost retrieval independent of getInstructionForCost by
sinking it to more specific ::computeCost implementation (specifically
VPInterleaveRecipe::computeCost and VPSingleDefRecipe::computeCost).

Inline getInstructionForCost to VPRecipeBase::cost(), as it is now only
used to decide which recipes to skip during cost computation and when to
apply forced costs.

PR: https://github.com/llvm/llvm-project/pull/109708
2024-10-09 13:58:58 +01:00
Florian Hahn
01cbbc52dc
[VPlan] Request lane 0 for pointer arg in PtrAdd.
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.

Fixes https://github.com/llvm/llvm-project/issues/111606.
2024-10-09 13:18:54 +01:00
Florian Hahn
6fbbe152fa
[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486)
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
2024-10-08 22:37:20 +01:00
Florian Hahn
36fc291b6e
[VPlan] Implement VPBlendRecipe::computeCost.
Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also
used if only the first lane is used.

This also requires pre-computing costs for forced scalars and
instructions considered profitable to scalarize. For those, the cost
will be computed separately in the legacy cost model. This will also be
needed when implementing VPReplicateRecipe::computeCost.
2024-10-08 21:33:42 +01:00
Florian Hahn
3829fd75c8
[VPlan] Remove redundant getVPSingleValue for VPSingleDefRecipes (NFC). 2024-10-08 20:31:41 +01:00
Florian Hahn
3ec6f805c5
[VPlan] Don't created GEP x, 0 for interleave group pointers.
The GEP with offet 0 is redundant, remove it. This addresses a TODO
from 7f74651837b ((#106431).
2024-10-08 12:08:13 +01:00
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
68210c7c26
[VPlan] Only generate first lane for VPPredInstPHI if no others used.
IF only the first lane of the result is used, only generate the first
lane.

Fixes https://github.com/llvm/llvm-project/issues/111042.
2024-10-05 19:15:05 +01:00
Florian Hahn
0344123ffb
[VPlan] Manage FMFs for VPWidenCall via VPRecipeWithIRFlags. (NFC)
Update VPWidenCallRecipe to manage fast-math flags directly via
VPRecipeWithIRFlags. This addresses a TODO and allows adjusting the FMFs
directly on the recipe. Also fixes printing for flags for
VPWidenCallRecipe.
2024-10-01 13:20:34 +01:00
Mel Chen
f8373cb0f9
[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342)
This patch separates the computation of the final reduction result and
the intermediate stores of reduction.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-09-30 15:35:09 +08:00
Graham Hunter
6f1a8c2da2
[LV] Vectorize histogram operations (#99851)
This patch implements autovectorization support for the 'all-in-one'
histogram intrinsic, which seems to have more support than the
'standalone' intrinsic. See
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/
for an overview of the work and my notes on the tradeoffs between the
two approaches.
2024-09-27 13:08:55 +01:00
Youngsuk Kim
e177dd6fbb
[llvm] Replace uses of Type::getPointerTo() (NFC) (#110163)
Replace uses of `Type::getPointerTo()` which is to be removed.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-09-26 16:38:50 -04:00
Florian Hahn
68ed1728bf
[VPlan] Unify mayWriteToMemory and mayHaveSideEffects logic for VPInst.
Unify logic for mayWriteToMemory and mayHaveSideEffects for
VPInstruction, with the later relying on the former. Also extend to
handle binary operators.

Split off from https://github.com/llvm/llvm-project/pull/106441
2024-09-26 19:16:43 +01:00
Elvis Wang
a068b974b1
[VPlan] Implement VPWidenLoad/StoreEVLRecipe::computeCost(). (#109644)
Currently the EVL recipes transfer the tail masking to the EVL.
But in the legacy cost model, the mask exist and will calculate the
instruction cost of the mask.
To fix the difference between the VPlan-based cost model and the legacy
cost model, we always calculate the instruction cost for the mask in the
EVL recipes.

Note that we should remove the mask cost in the EVL recipes when we
don't need to compare to the legacy cost model.

This patch also fixes #109468.
2024-09-26 07:10:25 +08:00
Florian Hahn
aae7ac6685
[VPlan] Remove VPIteration, update to use directly VPLane instead (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
only the lane part of VPIteration is used.

Simplify the code by replacing remaining uses of VPIteration with VPLane directly.
2024-09-25 16:44:42 +01:00