138 Commits

Author SHA1 Message Date
Andrew Rogers
4e2efa55c6
[llvm] export private symbols needed by unittests (#145767)
## Purpose
Export a small number of private LLVM symbols so that unit tests can
still build/run when LLVM is built as a Windows DLL or a shared library
with default hidden symbol visibility.

## Background
The effort to build LLVM as a WIndows DLL is tracked in #109483.
Additional context is provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307).

Some LLVM unit tests use internal/private symbols that are not part of
LLVM's public interface. When building LLVM as a DLL or shared library
with default hidden symbol visibility, the symbols are not available
when the unit test links against the DLL or shared library.

This problem can be solved in one of two ways:
1. Export the private symbols from the DLL.
2. Link the unit tests against the intermediate static libraries instead
of the final LLVM DLL.

This PR applies option 1. Based on the discussion of option 2 in
#145448, this option is preferable.

## Overview
* Adds a new `LLVM_ABI_FOR_TEST` export macro, which is currently just
an alias for `LLVM_ABI`.
* Annotates the sub-set of symbols under `llvm/lib` that are required to
get unit tests building using the new macro.
2025-07-10 08:20:09 -07:00
Florian Hahn
6b3d2b629c
[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281)
This patch adds a new recipe to combine multiple recipes into an
'expression' recipe, which should be considered as single entity for
cost-modeling and transforms. The recipe needs to be 'decomposed', i.e.
replaced by its individual recipes before execute.

This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.

It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe, but this specialized print could be
replaced with printing the bundled recipes directly.

PR: https://github.com/llvm/llvm-project/pull/144281
2025-07-01 20:44:50 +01:00
Florian Hahn
01b9828a66
[VPlan] Remove unneeded friend classes from VPValue (NFC).
None of the removed classes makes use of the friendship relationship.
2025-06-05 21:40:21 +01:00
Florian Hahn
0f00a96fed
[VPlan] Simplify branch on False in VPlan transform (NFC). (#140409)
Simplify branch on false, starting with the branch from the middle block
to the scalar preheader. Initially this helps simplifying the initial
VPlan construction.

Depends on https://github.com/llvm/llvm-project/pull/140405.

PR: https://github.com/llvm/llvm-project/pull/140409
2025-05-31 20:32:45 +01:00
Florian Hahn
c3cce7caf8
[VPlan] Remove unused VPUser constructors (NFC).
Now all users construct VPUsers using VPUser(ArraryRef<VPValue *>).
Remove the other unused constructors.
2025-05-31 12:20:32 +01:00
Florian Hahn
dcef154b5c
[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506)
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.

This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.

It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the 
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.

PR: https://github.com/llvm/llvm-project/pull/117506
2025-05-24 19:17:16 +01:00
Elvis Wang
664c937b43
[VPlan] Implement VPExtendedReduction, VPMulAccumulateReductionRecipe and corresponding vplan transformations. (#137746)
This patch introduce two new recipes.

* VPExtendedReductionRecipe
  - cast + reduction.

* VPMulAccumulateReductionRecipe
  - (cast) + mul + reduction.

This patch also implements the transformation that match following
patterns via vplan and converts to abstract recipes for better cost
estimation.

* VPExtendedReduction
  - reduce(cast(...))

* VPMulAccumulateReductionRecipe
  - reduce.add(mul(...))
  - reduce.add(mul(ext(...), ext(...))
  - reduce.add(ext(mul(ext(...), ext(...))))

The converted abstract recipes will be lower to the concrete recipes
(widen-cast + widen-mul + reduction) just before recipe execution.

Note that this patch still relies on legacy cost model the calculate the
cost for these patters.
Will enable vplan-based cost decision in #113903.

Split from #113903.
2025-05-16 10:25:38 +08:00
Florian Hahn
bc03d6cce2
[VPlan] Introduce all loop regions as VPlan transform. (NFC) (#129402)
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
https://github.com/llvm/llvm-project/pull/128419.

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.

Depends on https://github.com/llvm/llvm-project/pull/128419.

PR: https://github.com/llvm/llvm-project/pull/129402
2025-04-16 13:30:45 +02:00
Sam Tebbs
b658a2e74a
[LV] Reduce register usage for scaled reductions (#133090)
This PR accounts for scaled reductions in `calculateRegisterUsage` to
reflect the fact that the number of lanes in their output is smaller
than the VF.

Depends on https://github.com/llvm/llvm-project/pull/126437
2025-04-11 14:31:08 +01:00
Florian Hahn
6a9e8fc50c
[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706)
There are some opcodes that currently require specialized recipes, due
to their result type not being implied by their operands, including
casts.

This leads to duplication from defining multiple full recipes.

This patch introduces a new VPInstructionWithType subclass that also
stores the result type. The general idea is to have opcodes needing to
specify a result type to use this general recipe. The current patch
replaces VPScalarCastRecipe with VInstructionWithType, a similar patch
for VPWidenCastRecipe will follow soon.

There are a few proposed opcodes that should also benefit, without the
need of workarounds:
* https://github.com/llvm/llvm-project/pull/129508
* https://github.com/llvm/llvm-project/pull/119284

PR: https://github.com/llvm/llvm-project/pull/129706
2025-04-10 22:30:40 +01:00
Florian Hahn
ad9f15ab53
[VPlan] Introduce and use VPValue::replaceUsesOfWith (NFC).
Adds an API matching LLVM's IR Value, which simplifies some code a
bit.
2025-04-07 22:07:52 +01:00
Florian Hahn
5b38fb59df
[VPlan] Remove remaining references to VPScalarPHIRecipe (NFC).
VPScalarPHIRecipe has been replaced by VPInstructions with PHI opcodes.
Strip remaining dead references to VPScalarPHIRecipe.
2025-03-24 19:37:00 +00:00
Luke Lau
a4dc02c0e7
[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086)
After #128718 lands there will be two ways of performing a reversed
widened memory access, either by performing a consecutive unit-stride
access and a reverse, or a strided access with a negative stride.

Even though both produce a reversed vector, only the former needs
VPReverseVectorPointerRecipe which computes a pointer to the last
element of each part. A strided reverse still needs a pointer to the
first element of each part so it will use VPVectorPointerRecipe.

This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to
clarify that a reversed access may not necessarily need a pointer to the
last element.
2025-03-19 00:09:15 +08:00
Florian Hahn
c0bf4b2c57
[VPlan] Remove unneeded VPValue::getLiveInIRValue() const (NFC).
The accessor is not needed/used.
2025-02-28 17:01:19 +00:00
Luke Lau
e23ab73335
[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180)
This is a copy of #126177, since it was automatically and permanently
closed because I messed up the source branch on my remote

This patch proposes to avoid converting widening recipes to VP
intrinsics during the EVL transform.

IIUC we initially did this to avoid `vl` toggles on RISC-V. However we
now have the RISCVVLOptimizer pass which mostly makes this redundant.

Emitting regular IR instead of VP intrinsics allows more generic
optimisations, both in the middle end and DAGCombiner, and we generally
have better patterns in the RISC-V backend for non-VP nodes. Sticking to
regular IR instructions is likely a lot less work than reimplementing
all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get
noticeably better code generation.
2025-02-22 19:38:11 +08:00
Florian Hahn
5008277322
[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104)
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(https://github.com/llvm/llvm-project/pull/117138).

PR: https://github.com/llvm/llvm-project/pull/124104
2025-02-02 13:44:07 +00:00
Florian Hahn
65cd9e4c2f
[VPlan] Make VPValue constructors protected. (NFC)
Tighten access to constructors similar to ef1260acc0. VPValues should
either be constructed by constructors of recipes defining them or should
be live-ins created by VPlan (via getOrAddLiveIn).
2025-01-17 22:17:12 +00:00
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
Zequan Wu
4d8f9594b2 Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)"
This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it casuse optimization crash on -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057
2024-12-27 11:51:54 -08:00
Sam Tebbs
c858bf620c
Reland "[LoopVectorizer] Add support for partial reductions" (#120721)
This re-lands the reverted #92418 

When the VF is small enough so that dividing the VF by the scaling
factor results in 1, the reduction phi execution thinks the VF is scalar
and sets the reduction's output as a scalar value, tripping assertions
expecting a vector value. The latest commit in this PR fixes that by
using `State.VF` in the scalar check, rather than the divided VF.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2024-12-24 12:08:17 +00:00
Florian Hahn
5f096fd221
Revert "[LoopVectorizer] Add support for partial reductions (#92418)"
This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701.

It looks like this is triggering an assertion when build llvm-test-suite
on ARM64 macOS.

Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
    target triple = "arm64-apple-macosx15.0.0"

    define void @test(i64 %idx.neg, i8 %0) #0 {
    entry:
      br label %while.body

    while.body:                                       ; preds = %while.body, %entry
      %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
      %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
      %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
      %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
      %conv = sext i8 %0 to i64
      %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
      %1 = load i8, ptr null, align 1
      %conv97 = sext i8 %1 to i64
      %mul = mul i64 %conv97, %conv
      %add99 = add i64 %mul, %sum.1129
      %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
      %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
      %2 = and i1 %cmp94, %cmp95
      br i1 %2, label %while.body, label %while.end.loopexit

    while.end.loopexit:                               ; preds = %while.body
      %add99.lcssa = phi i64 [ %add99, %while.body ]
      ret void
    }

    attributes #0 = { "target-cpu"="apple-m1" }

> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.
2024-12-19 21:46:51 +00:00
Nicholas Guy
060d62b48a
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from https://github.com/llvm/llvm-project/pull/94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.

---------

Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>
2024-12-19 11:42:40 +00:00
Florian Hahn
a7fda0e1e4
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.

Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.

PR: https://github.com/llvm/llvm-project/pull/114305
2024-12-03 14:53:51 +00:00
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
Shih-Po Hung
266ff98cba
[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974)
Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the
runtime VF, similar to #95305.

Since only reverse vector pointers require the runtime VF, the patch
sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for
reverse vector pointers. As a result, the generation of reverse vector
pointers is moved into a separate recipe.
2024-10-26 23:18:50 +08:00
Elvis Wang
b3edc764f7
[VPlan] Implement VPWidenCastRecipe::computeCost(). (NFCI) (#111339)
This patch implement `VPWidenCastRecipe::computeCost()` and skip cast
recipies in the in-loop reduction.
2024-10-22 12:23:49 +08:00
Florian Hahn
6fbbe152fa
[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486)
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
2024-10-08 22:37:20 +01:00
Graham Hunter
6f1a8c2da2
[LV] Vectorize histogram operations (#99851)
This patch implements autovectorization support for the 'all-in-one'
histogram intrinsic, which seems to have more support than the
'standalone' intrinsic. See
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/
for an overview of the work and my notes on the tradeoffs between the
two approaches.
2024-09-27 13:08:55 +01:00
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
256100489d
[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC)
Clarify name of helper, split off from
https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.
2024-09-19 11:20:31 +01:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
Kolya Panchenko
00e40c9b5b
[LV] Support binary and unary operations with EVL-vectorization (#93854)
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operations.
Follow up patches will extend support for remaining cases, like `FCmp`
and `ICmp`
2024-09-06 11:41:36 -04:00
Mel Chen
4eb30cfb34
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.
2024-07-16 16:15:24 +08:00
Graham Hunter
22a7f6dcc4
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
2024-07-11 16:39:30 +01:00
Graham Hunter
1860fd049e
[LV] Autovectorization for the all-in-one histogram intrinsic (#91458)
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
2024-07-11 15:33:30 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Florian Hahn
f1f3c34b47
Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and
eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot
failure and there have been a number of crashes reported at
https://github.com/llvm/llvm-project/pull/92555
2024-06-21 19:54:21 +01:00
Florian Hahn
242cc200cc
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

Extra tests for crashes discovered when building Chromium have been
added in fb86cb7ec157689e, 3be7312f81ad2.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-20 17:32:52 +01:00
Arthur Eubanks
6f538f6a2d Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264.
This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44.

Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
2024-06-14 17:47:08 +00:00
Florian Hahn
90fd99c079
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c.

Extra tests have been added in 52d29eb287.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-14 12:33:48 +01:00
Arthur Eubanks
46080abe9b Revert "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 00798354c553d48d27006a2b06a904bd6013e31b.

Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
2024-06-13 16:37:21 +00:00
Florian Hahn
00798354c5
[VPlan] First step towards VPlan cost modeling. (#92555)
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to 
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and 
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.


PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-13 14:26:18 +01:00
Florian Hahn
577785c5ca
[VPlan] Remove unused removeLastOperand (NFC).
The last use of the function has been removed a while ago. Remove the
unused function.
2024-05-18 18:43:20 +01:00
Florian Hahn
e2a72fa583
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://github.com/llvm/llvm-project/pull/76172

Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.

In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then checking all users (recursively) of those header
masks.

Depends on https://github.com/llvm/llvm-project/pull/87411.

PR: https://github.com/llvm/llvm-project/pull/87816
2024-04-19 09:44:23 +01:00
Florian Hahn
a9bafe91dd
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411)
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.

This is a first step in an effort to simplify and modularize code
generation for widened loads and stores and enable adding further more
specialized memory recipes.

PR: https://github.com/llvm/llvm-project/pull/87411
2024-04-17 11:00:58 +01:00
Florian Hahn
34777c238b
[VPlan] Don't mark VPBlendRecipe as phi-like.
VPBlendRecipes don't get lowered to phis and usually do not appear at
the beginning of blocks, due to their masks appearing before them.

This effectively relaxes an over-eager verifier message.

Fixes https://github.com/llvm/llvm-project/issues/88297.
Fixes https://github.com/llvm/llvm-project/issues/88804.
2024-04-16 21:24:25 +01:00
Florian Hahn
6254b6dd89
[VPlan] Version VPValue names in VPSlotTracker. (#81411)
This patch restructures the way names for printing VPValues are handled.
It moves the logic to generate names for printing to VPSlotTracker.

VPSlotTracker will now version names of the same underlying value if it
is used by multiple VPValues, by adding a .V suffix to the name.

This fixes cases where at the moment the same name is printed for
different VPValues.

PR: https://github.com/llvm/llvm-project/pull/81411
2024-04-15 12:27:45 +01:00
Alexey Bataev
413a66f339
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. By having a way for the
Loop Vectorizer to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions. These architectures can make better use of their
predication capabilities.

Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for loads/stores instructions. The
patch adds a new VPlanTransforms to replace the wide header predicate
compare with EVL and updates codegen for load/stores to use VP
store/load with EVL.

Other important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.

Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.

Differential Revision: https://reviews.llvm.org/D99750
2024-04-04 18:30:17 -04:00
Florian Hahn
911055e34f
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformatState's 'vector' storage for this value. Those
values are effectively uniform-per-VF (or in some cases
uniform-across-VF-and-UF). Using the vector/per-part storage doesn't
interact well with other recipes, that more accurately using (Part,
Lane) to look up scalar values and prevents VPInstructions creating
scalars from interacting with other recipes working with scalars.

This PR tries to unify handling of scalars by using (Part, 0) for scalar
values where only the first lane is demanded. This allows using
VPInstructions with other recipes like VPScalarCastRecipe and is also
needed when using VPInstructions in more cases otuside the vector loop
region to generate scalars.

Depends on https://github.com/llvm/llvm-project/pull/80269
2024-02-26 19:06:43 +00:00
Florian Hahn
9923d29cfa
[VPlan] Merge main VPlan verifer with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for
various properties on blocks to the verifier used for VPlans generated
by the inner loop vectorizer. It also adds def-use checks for the
verifier used in the VPlan native path.

This drops the separate flag to enable HCFG verification. Instead, all
VPlans are verified once they have been created, if assertions are
enabled.

This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to
model any phi node in the native path.
2024-02-20 16:43:57 +00:00