186 Commits

Author SHA1 Message Date
Florian Hahn
0344123ffb
[VPlan] Manage FMFs for VPWidenCall via VPRecipeWithIRFlags. (NFC)
Update VPWidenCallRecipe to manage fast-math flags directly via
VPRecipeWithIRFlags. This addresses a TODO and allows adjusting the FMFs
directly on the recipe. Also fixes printing for flags for
VPWidenCallRecipe.
2024-10-01 13:20:34 +01:00
Mel Chen
f8373cb0f9
[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342)
This patch separates the computation of the final reduction result and
the intermediate stores of reduction.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-09-30 15:35:09 +08:00
Graham Hunter
6f1a8c2da2
[LV] Vectorize histogram operations (#99851)
This patch implements autovectorization support for the 'all-in-one'
histogram intrinsic, which seems to have more support than the
'standalone' intrinsic. See
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/
for an overview of the work and my notes on the tradeoffs between the
two approaches.
2024-09-27 13:08:55 +01:00
Youngsuk Kim
e177dd6fbb
[llvm] Replace uses of Type::getPointerTo() (NFC) (#110163)
Replace uses of `Type::getPointerTo()` which is to be removed.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-09-26 16:38:50 -04:00
Florian Hahn
68ed1728bf
[VPlan] Unify mayWriteToMemory and mayHaveSideEffects logic for VPInst.
Unify logic for mayWriteToMemory and mayHaveSideEffects for
VPInstruction, with the later relying on the former. Also extend to
handle binary operators.

Split off from https://github.com/llvm/llvm-project/pull/106441
2024-09-26 19:16:43 +01:00
Elvis Wang
a068b974b1
[VPlan] Implement VPWidenLoad/StoreEVLRecipe::computeCost(). (#109644)
Currently the EVL recipes transfer the tail masking to the EVL.
But in the legacy cost model, the mask exist and will calculate the
instruction cost of the mask.
To fix the difference between the VPlan-based cost model and the legacy
cost model, we always calculate the instruction cost for the mask in the
EVL recipes.

Note that we should remove the mask cost in the EVL recipes when we
don't need to compare to the legacy cost model.

This patch also fixes #109468.
2024-09-26 07:10:25 +08:00
Florian Hahn
aae7ac6685
[VPlan] Remove VPIteration, update to use directly VPLane instead (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
only the lane part of VPIteration is used.

Simplify the code by replacing remaining uses of VPIteration with VPLane directly.
2024-09-25 16:44:42 +01:00
Philip Reames
d288574363
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Alexey Bataev
60ed2361c0
[LV][EVL]Explicitly model AVL as sub, original TC, EVL_PHI.
Patch explicitly models AVL as sub original TC, EVL_PHI instead of
having it in EXPLICIT-VECTOR-LENGTH VPInstruction. Required for correct
safe dependence distance suport.

Reviewers: fhahn, ayalz

Reviewed By: ayalz

Pull Request: https://github.com/llvm/llvm-project/pull/108869
2024-09-25 08:58:29 -04:00
Florian Hahn
3fbf6f8bb1
[LV] Remove more references of unrolled parts after 57f5d8f2fe.
Continue to clean up some now stale references of unroll parts and
related terminology as pointed out post-commit for 06c3a7d.
2024-09-24 15:50:31 +01:00
Florian Hahn
040bb37195
[VPlan] Fix incorrect argument for CreateBinOp after 06c3a7d2d764.
06c3a7d2d764 incorrectly updated CreateBinOp to pass the debug location,
which gets interpreted as FPMath node. Remove the argument.
2024-09-24 11:18:50 +01:00
Florian Hahn
57f5d8f2fe
[VPlan] Only store single vector per VPValue in VPTransformState. (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
VPTransformState only stores a single vector value per VPValue.

Simplify the code by replacing the SmallVector in PerPartOutput with a
single Value * and rename to VPV2Vector for clarity.

Also remove the redundant Part argument from various accessors.
2024-09-23 11:28:24 +01:00
Florian Hahn
06c3a7d2d7
[VPlan] Remove unneeded State.UF after 8ec406757cb92 (NFC).
State.UF is not needed any longer after 8ec406757cb92
(https://github.com/llvm/llvm-project/pull/95842). Clean it up,
simplifying ::execute of existing recipes.
2024-09-22 20:42:37 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Florian Hahn
256100489d
[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC)
Clarify name of helper, split off from
https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.
2024-09-19 11:20:31 +01:00
Shih-Po Hung
ffcff2f465
[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544)
This patch passes the string `"vector.gep"` to CreateGEP instead of
CreateMul.
2024-09-18 19:22:36 +08:00
LiqinWeng
a2994b2999
[LV][NFC] Unify printing for WidenEVLReicpe with other EVL recipes (#108177) 2024-09-18 15:03:37 +08:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Kazu Hirata
ce192b87b2 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:1278:12: error:
  unused variable 'Op0' [-Werror,-Wunused-variable]
2024-09-06 09:12:06 -07:00
Kolya Panchenko
00e40c9b5b
[LV] Support binary and unary operations with EVL-vectorization (#93854)
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operations.
Follow up patches will extend support for remaining cases, like `FCmp`
and `ICmp`
2024-09-06 11:41:36 -04:00
Philip Reames
3d9abfc9f8 Consolidate all IR logic for getting the identity value of a reduction [nfc]
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy.  This
depends on several prior commits which fix ommissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.

As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor.  (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)

As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum which is why we still need manual logic (but at
least only one copy of manual logic) for those cases.
2024-09-04 08:23:21 -07:00
Elvis Wang
ed220e1571
[VPlan][NFC] Implement VPWidenMemoryRecipe::computeCost(). (#105614)
In this patch, we implement the `computeCost()` function in
`VPWidenMemoryRecipe`.
2024-09-04 09:46:02 +08:00
Philip Reames
3e8840ba71 Remove "Target" from createXReduction naming [nfc]
Despite the stale comments, none of these actually use TTI, and they're
solely generating standard LLVM IR.
2024-09-03 17:03:55 -07:00
Philip Reames
0b2f2537a5 [LV] Separate AnyOf recurrence from getRecurrenceIdentity [NFC]
These recurrence types don't have a meaningful identity, and the
routine was abused to return the start value instead.  Out of the
three callers to this routine, only one actually wants this
behavior.  This is a prep change for removing the routine entirely
and commoning it with other copies of the same logic.
2024-09-03 09:46:30 -07:00
Florian Hahn
50a02e7c68
[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost.
Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to
TTI if available, as there are multiple places in TTI which use the
IntrinsicInst.

Fixes https://github.com/llvm/llvm-project/issues/107016.
2024-09-02 20:47:37 +01:00
Florian Hahn
b0de7fa466
[VPlan] Use op from underlying call in computeCost if needed.
This fixes a divergence between legacy and VPlan-based cost model, e.g.
if one of the operands has an first-order recurrence phi as operand.
2024-09-02 14:00:10 +01:00
Florian Hahn
9ccf82543d
[VPlan] Implement VPWidenCallRecipe::computeCost (NFCI). (#106047)
Implement cost computation for VPWidenCallRecipe. In some cases, targets
use argument info to compute intrinsic costs. If all operands of the
call are VPValues with an underlying IR value, use the IR values as
arguments.

PR: https://github.com/llvm/llvm-project/pull/106731
2024-09-01 16:26:08 +01:00
Philip Reames
c53008de89 [VPlan] Manually jumpthread a bit of reduction code for readability [nfc] 2024-08-30 12:46:49 -07:00
Ramkumar Ramachandra
71ede8d831
VPlan: factor out VPlanUtils into its own file (NFC) (#105857) 2024-08-28 13:54:41 +01:00
Shao-Ce SUN
2f0d32692e
[NFC][VPlan] Trim extra spaces in VPDerivedIVRecipe::print during debugging (#106041)
before:
```
    EMIT vp<%3> = CANONICAL-INDUCTION ir<0>, vp<%8>
    vp<%4>    = DERIVED-IV ir<%n> + vp<%3> * ir<-1>
    vp<%5> = SCALAR-STEPS vp<%4>, ir<-1>
```

after:
```
    EMIT vp<%3> = CANONICAL-INDUCTION ir<0>, vp<%8>
    vp<%4> = DERIVED-IV ir<%n> + vp<%3> * ir<-1>
    vp<%5> = SCALAR-STEPS vp<%4>, ir<-1>
```
2024-08-26 21:23:05 +08:00
Florian Hahn
1fa6c99a09
[VPlan] Move EVL memory recipes to VPlanRecipes.cpp (NFC)
Move VPWiden[Load|Store]EVLRecipe::executeto VPlanRecipes.cpp in line
with other ::execute implementations that don't depend on anything
defined in LoopVectorization.cpp
2024-08-22 18:30:49 +01:00
Paul Walker
4f075086e7
[LLVM][VPlan] Keep all VPBlend masks until VPlan transformation. (#104015)
It's not possible to pick the best mask to remove when optimising
VPBlend at construction and so this patch refactors the code to move the
decision (and thus transformation) to VPlanTransforms.

NOTE: This patch does not change the decision of which mask to pick.
That will be done in a following PR to keep this patch as NFC from an
output point of view.
2024-08-21 12:51:40 +01:00
Florian Hahn
99741ac285
[VPlan] Introduce explicit ExtractFromEnd recipes for live-outs. (#100658)
Introduce explicit ExtractFromEnd recipes to extract the final values
for live-outs instead of implicitly extracting in VPLiveOut::fixPhi.

This is a follow-up to the recent changes of modeling extracts for
recurrences and consolidates live-out extract creation for fixed-order
recurrences at a single place: addLiveOutsForFirstOrderRecurrences.

It is also in preparation of replacing VPLiveOut with VPIRInstructions
wrapping the original scalar phis.

PR: https://github.com/llvm/llvm-project/pull/100658
2024-08-21 10:06:44 +02:00
Florian Hahn
1aa8a6f691
[VPlan] Compute cost for most opcodes in VPWidenRecipe (NFCI). (#98764)
Implement VPWidenRecipe::computeCost for most cases (except 
UDiv,SDiv,URem,SRem which require additional logic).

Note that this specializes `::computeCost` instead of `::cost`, as
`VPRecipeBase::cost` is responsible for skipping cost-computations
for pre-computed recipes for now.

The most recent version of the VPlan-based cost model introduction 
has been committed on Jul 10 (b841e2eca3b5c8b) and we should
probably give it at least a week in case additional mismatches surface.

PR: https://github.com/llvm/llvm-project/pull/98764
2024-08-16 21:20:23 +02:00
Florian Hahn
12763a0652
[VPlan] Move VPWidenStoreRecipe::execute to VPlanRecipes.cpp (NFC).
Move VPWidenStoreRecipe::execute to VPlanRecipes.cpp in line with
other ::execute implementations that don't depend on anything
defined in LoopVectorization.cpp
2024-08-15 08:04:22 +01:00
Paul Walker
b89853b504 [LLVM] Fix whitespace issues in VPBlendRecipe::execute. 2024-08-13 15:12:33 +00:00
Florian Hahn
35d3625a4d
[VPlan] Move VPWidenLoadRecipe::execute to VPlanRecipes.cpp (NFC).
Move VPWidenLoadRecipe::execute to VPlanRecipes.cpp in line with
other ::execute implementations that don't depend on anything
defined in LoopVectorization.cpp
2024-08-11 12:01:18 +01:00
Florian Hahn
241349fff2
[VPlan] Move VPWidenPointerInductionR::execute to VPlanRecipes. (NFC)
Move VPWidenPointerInductionRecipe::execute to VPlanRecipes.cpp in line
with other ::execute implementations that don't depend on anything
defined in LoopVectorization.cpp
2024-08-05 20:42:10 +01:00
Florian Hahn
bb60dd391f
[VPlan] Only use force-target-instruction-cost for recipes with insts.
To match the behavior of the legacy cost model, only apply
-force-target-instruction-cost to recipes with underlying instructions
for now, as only original IR instructions are considered by the legacy
cost model.

This fixes a difference between legacy and VPlan based cost model,
triggering the verification assertion, reported by @JonPsson1.
2024-07-23 21:05:10 +01:00
Florian Hahn
a23efcc703
[VPlan] Move VPInterleaveRecipe::execute to VPlanRecipes.cpp (NFC).
Move ::exeute and ::print to VPlanRecipes.cpp in line with other recipe
definitions.
2024-07-20 22:23:02 +01:00
Mel Chen
4eb30cfb34
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.
2024-07-16 16:15:24 +08:00
Graham Hunter
22a7f6dcc4
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
2024-07-11 16:39:30 +01:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Graham Hunter
1860fd049e
[LV] Autovectorization for the all-in-one histogram intrinsic (#91458)
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
2024-07-11 15:33:30 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Florian Hahn
c16e37867c
[VPlan] Clarify setting Lane in fixPhi (NFCI).
Split off from https://github.com/llvm/llvm-project/pull/94760, clarify
as suggested.
2024-07-09 13:09:40 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00