2244 Commits

Author SHA1 Message Date
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
45b526afa2
[LV] Honor uniform-after-vectorization in setVectorizedCallDecision.
The legacy cost model always computes the cost for uniforms as cost of
VF = 1, but VPWidenCallRecipes would be created, as
setVectorizedCallDecisions would not consider uniform calls.

Fix setVectorizedCallDecision to set to Scalarize, if the call is
uniform-after-vectorization.

This fixes a bug in VPlan construction uncovered by the VPlan-based
cost model.

Fixes https://github.com/llvm/llvm-project/issues/111040.
2024-10-06 10:35:06 +01:00
Florian Hahn
9de327c94d
[LV] Generalize predication checks from 2c8836c899 for operands.
This fixes another case where the VPlan-based and legacy cost models
disagree. If any of the operands is predicated, it can't be trivially
hoisted and we should consider the cost for evaluating it each loop
iteration.

Fixes https://github.com/llvm/llvm-project/issues/108697.
2024-10-02 20:16:41 +01:00
Florian Hahn
6d6eea92e3
[LV] Use SCEV to simplify wide binop operand to constant.
The legacy cost model uses SCEV to determine if the second operand of a
binary op is a constant. Update the VPlan construction logic to mirror
the current legacy behavior, to fix a difference in the cost models.

Fixes https://github.com/llvm/llvm-project/issues/109528.
Fixes https://github.com/llvm/llvm-project/issues/110440.
2024-10-02 13:45:49 +01:00
David Sherwood
0b2403197f
[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833)
LoopVectorizationLegality currently only treats a loop as legal to vectorise
if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid
SCEV, or more precisely that the loop must have an exact backedge taken
count. Therefore, in LoopVectorize.cpp we can safely replace all calls to
getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount,
since the result is the same.

This also helps prepare the loop vectoriser for PR #88385.
2024-10-02 10:28:54 +01:00
Jeremy Morse
96f37ae453
[NFC] Use initial-stack-allocations for more data structures (#110544)
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.

Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
2024-09-30 23:15:18 +01:00
Ramkumar Ramachandra
1e0d3c68c7
LV: reuse getSmallBestKnownTC in a TC estimation (NFC) (#105834)
GeneratedRTChecks::getCost duplicates getSmallBestKnownTC partially,
when attempting to get the best trip-count estimate. Since the intent of
this code is to get the best trip-count estimate, and
getSmallBestKnownTC is written for exactly this purpose, replace the
partial code-duplication with a call to this function.
2024-09-30 15:46:02 +01:00
Mel Chen
f8373cb0f9
[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342)
This patch separates the computation of the final reduction result and
the intermediate stores of reduction.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-09-30 15:35:09 +08:00
Florian Hahn
2c8836c899
[LV] Don't consider predicated insts as invariant unconditionally in CM.
Predicated instructions cannot hoisted trivially, so don't treat them as
uniform value in the cost model.

This fixes a difference between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/110295.
2024-09-29 20:31:24 +01:00
Florian Hahn
1edd22030c
[LV] Retrieve reduction resume values directly for epilogue vec. (NFC)
Use the reduction resume values from the phis in the scalar header,
instead of collecting them in a map. This removes some complexity from
the general executePlan code paths and pushes it to only the epilogue
vectorization part.
2024-09-29 10:37:58 +01:00
Florian Hahn
deda2f03f8
[LV] Remove redundant InnerLoopUnroller class (NFCI).
All members of InnerLoopUnroller were removed previously and it now just
forwards to InnerLoopVectorizer's constructor. Remove it.
2024-09-28 20:49:14 +01:00
Florian Hahn
a4b27e7f27
[LV] Remove noalias intrinsics handling from scalarizeInstruction (NFC).
This is now handled by the explicit unroller.
2024-09-27 19:38:38 +01:00
Graham Hunter
6f1a8c2da2
[LV] Vectorize histogram operations (#99851)
This patch implements autovectorization support for the 'all-in-one'
histogram intrinsic, which seems to have more support than the
'standalone' intrinsic. See
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/
for an overview of the work and my notes on the tradeoffs between the
two approaches.
2024-09-27 13:08:55 +01:00
Florian Hahn
2b125e899b
[LV] Don't pass loop preheader to getOrCreateVectorTripCount (NFCI).
The vector trip count must already be created when fixupIVUsers is
called. Don't pass the vector preheader there and delay retrieving the
vector loop header. This ensures we are re-using the already computed
trip count. Computing the trip count from scratch would not be correct,
as the IR may not be in a valid state yet.
2024-09-25 20:39:05 +01:00
Florian Hahn
aae7ac6685
[VPlan] Remove VPIteration, update to use directly VPLane instead (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
only the lane part of VPIteration is used.

Simplify the code by replacing remaining uses of VPIteration with VPLane directly.
2024-09-25 16:44:42 +01:00
Philip Reames
d288574363
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Florian Hahn
4be1c19a9f
[VPlan] Adjust AnyOf after creating ComputeReductionResult (NFC).
Prepares for a follow-up change to use VPInstruction::ResumePhi to
create the resume phi for reductions.
2024-09-25 14:13:50 +01:00
David Sherwood
cda0cb3939
[NFC][LoopVectorize] Remove unused argument from fixupIVUsers (#109789)
The VectorHeader argument passed to fixupIVUsers is unused
and can be removed.
2024-09-25 08:56:37 +01:00
Florian Hahn
31ac400e45
[LV] Remove another reference of unrolled parts after 57f5d8f2fe.
Continue to clean up some now stale references of unroll parts and
related terminology as pointed out post-commit for 06c3a7d.
2024-09-24 16:22:02 +01:00
Florian Hahn
3fbf6f8bb1
[LV] Remove more references of unrolled parts after 57f5d8f2fe.
Continue to clean up some now stale references of unroll parts and
related terminology as pointed out post-commit for 06c3a7d.
2024-09-24 15:50:31 +01:00
David Sherwood
f4eeae1244
[LoopVectorize] Address comments on PR #107004 left post-commit (#109300)
* Rename Speculative -> Uncountable and update tests.
* Add comments explaining why it's safe to ignore the predicates when
building up a list of exiting blocks.
* Reshuffle some code to do (hopefully) cheaper checks first.
2024-09-23 13:36:25 +01:00
Graham Hunter
785337e2d9
[LV][AArch64] Don't query registers for illegal scalable vector elts (#109411)
When trying to maximize vector bandwidth we ask TTI for the number of
registers required for a given operation. If the type of that operation
happens to be something illegal for scalable vectors (e.g.
<vscale x 4 x fp128>) then we would see a crash.

Instead, just return a default value and let the cost model reject the
invalid operation later.
2024-09-23 13:35:23 +01:00
Florian Hahn
57f5d8f2fe
[VPlan] Only store single vector per VPValue in VPTransformState. (NFC)
After 8ec406757cb92 (https://github.com/llvm/llvm-project/pull/95842),
VPTransformState only stores a single vector value per VPValue.

Simplify the code by replacing the SmallVector in PerPartOutput with a
single Value * and rename to VPV2Vector for clarity.

Also remove the redundant Part argument from various accessors.
2024-09-23 11:28:24 +01:00
Florian Hahn
06c3a7d2d7
[VPlan] Remove unneeded State.UF after 8ec406757cb92 (NFC).
State.UF is not needed any longer after 8ec406757cb92
(https://github.com/llvm/llvm-project/pull/95842). Clean it up,
simplifying ::execute of existing recipes.
2024-09-22 20:42:37 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Youngsuk Kim
d31e314131 [llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
2024-09-20 12:19:59 -05:00
Florian Hahn
a861ed411a
[VPlan] Add initial loop-invariant code motion transform. (#107894)
Add initial transform to move out loop-invariant recipes.

This also helps to fix a divergence between legacy and VPlan-based cost
model due to legacy using ScalarEvolution::isLoopInvariant in some
cases.

Fixes https://github.com/llvm/llvm-project/issues/107501.

PR: https://github.com/llvm/llvm-project/pull/107894
2024-09-20 11:22:03 +01:00
Florian Hahn
96ba9d372d
[VPlan] Only consider recipes in loop region in planContainsSimp. (NFCI)
Limit checks in planContainsAdditionalSimplifications to recipes in the
vector loop region.

Preparation for https://github.com/llvm/llvm-project/pull/107894.
2024-09-19 20:11:13 +01:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Florian Hahn
256100489d
[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC)
Clarify name of helper, split off from
https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.
2024-09-19 11:20:31 +01:00
David Sherwood
e762d4dac7
[LoopVectorize] Teach LoopVectorizationLegality about more early exits (#107004)
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:

1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/simple_early_exit.ll
2024-09-19 09:41:25 +01:00
Florian Hahn
0d736e296c
[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464)
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.

It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.

PR: https://github.com/llvm/llvm-project/pull/94464
2024-09-18 14:41:56 +01:00
David Sherwood
b29c5b66fd
[NFC][LoopVectorize] Dont pass LLVMContext to VPTypeAnalysis constructor (#108540)
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
2024-09-16 09:12:11 +01:00
Florian Hahn
012dbec604
[VPlan] Handle ForceTargetInstructionCost in during precomputeCosts.
Make sure ForceTargetInstruction is respected in precomputeCosts.
2024-09-15 10:53:43 +01:00
Florian Hahn
cfe3f5fa61
[VPlan] Remove unneeded ExitBB variable after f0c5caa814.
Fix buildbot failures due to an unused variable, e.g.
https://lab.llvm.org/buildbot/#/builders/186/builds/2329
2024-09-14 21:35:45 +01:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
Ramkumar Ramachandra
75a57edadc
VPlan/Builder: inline VPBuilder::createICmp (NFC) (#105650)
Inline VPBuilder::createICmp in the header, in line with the other
VPBuilder functions.
2024-09-13 20:08:11 +01:00
Florian Hahn
76fd69be74
[VPlan] Simplify VPBuilder insert point when live outs for FORs.
Simplifies setting the insert point, addressing a TODO.
2024-09-13 13:21:23 +01:00
David Sherwood
f3029b330a
[NFC][LoopVectorize] Avoid passing ScalarEvolution to VPlanTransforms::optimize (#108380)
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
2024-09-13 12:09:00 +01:00
Florian Hahn
08d294df55
[VPlan] Simplify VPBuilder insert point when adding users in exit block.
Simplifies setting the insert point, addressing a TODO.
2024-09-12 22:47:03 +01:00
Florian Hahn
71cb7811bb
[LV] Remove stale completeLoopSkeleton (NFCI).
The function has been removed a while ago, also remove the stable
declaration.
2024-09-12 21:55:43 +01:00
Florian Hahn
ea83e1c05a
[LV] Assign cost to all interleave members when not interleaving.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.

This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.

This fixes a divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/108098.
2024-09-11 21:04:34 +01:00
Hari Limaye
7858e14547
[LV] Amend check for IV increments in collectUsersInEntryBlock (#108020)
The check for IV increments in collectUsersInEntryBlock currently
triggers for exit-block PHIs which use the IV start value, resulting in
us failing to add the input value for the middle block to these PHIs.

Fix this by amending the check for IV increments to only include
incoming values that are instructions inside the loop.

Fixes #108004
2024-09-11 16:43:34 +01:00
Florian Hahn
e3c537ff90
[VPlan] Consider non-header phis in planContainsAdditionalSimp.
Update planContainsAdditionalSimplifications to also check phis not in
the loop header. This ensures we don't miss cases where VPBlendRecipes
(which correspond to such phis) have been simplified.

Fixes https://github.com/llvm/llvm-project/issues/107473.
2024-09-10 21:37:14 +01:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Simon Pilgrim
97e6f92d31 Fix GCC Wparentheses warning. NFC. 2024-09-08 13:34:34 +01:00
Ramkumar Ramachandra
a6577791d4
LV: fix style after cursory reading (NFC) (#105830) 2024-09-06 18:41:56 +01:00
Florian Hahn
cf2ecc7c1c
[LV] Remove over-aggressive assert from 3fe6a064f15c.
There are some cases where only the first operand is marked for
truncation. In that case, the compare won't be truncated which would
incorrectly trigger the assertion.

It also shows that the check pre 3fe6a064f15c also considered compares
truncated that cannot be truncated.
2024-09-05 18:20:16 +01:00
Florian Hahn
3fe6a064f1
[LV] Check if compare is truncated directly in getInstructionCost.
The current check for truncated compares in getInstructionCost misses
cases where either the first or both operands are constants.
Check directly if the compare is marked for truncation. In that case,
the minimum bitwidth is that of the operands.

The patch also adds asserts to ensure that.

This fixes a divergence between legacy and VPlan-based cost model, where
the legacy cost model incorrectly estimated the cost of compares with
truncated operands.

Fixes https://github.com/llvm/llvm-project/issues/107171.
2024-09-04 20:50:06 +01:00
Florian Hahn
3bd161e98d
[LV] Honor forced scalars in setVectorizedCallDecision.
Similarly to dd94537b4, setVectorizedCallDecision also did not consider
ForcedScalars. This lead to VPlans not reflecting the decision by the
legacy cost model (cost computation would use scalar cost, VPlan would
have VPWidenCallRecipe).

To fix this, check if the call has been forced to scalar in
setVectorizedCallDecision.

Note that this requires moving setVectorizedCallDecision after
collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does
not depend on call decisions and can safely be moved.

Fixes https://github.com/llvm/llvm-project/issues/107051.
2024-09-03 21:06:32 +01:00