231 Commits

Author SHA1 Message Date
Florian Hahn
53266f73f0
[VPlan] Run DCE after unrolling.
This cleans up a number of dead recipes after unrolling if only their
first or last parts are used. This simplifies a number of tests.

Fixes https://github.com/llvm/llvm-project/issues/109581.
2024-09-22 22:08:46 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
a861ed411a
[VPlan] Add initial loop-invariant code motion transform. (#107894)
Add initial transform to move out loop-invariant recipes.

This also helps to fix a divergence between legacy and VPlan-based cost
model due to legacy using ScalarEvolution::isLoopInvariant in some
cases.

Fixes https://github.com/llvm/llvm-project/issues/107501.

PR: https://github.com/llvm/llvm-project/pull/107894
2024-09-20 11:22:03 +01:00
Shih-Po Hung
ffcff2f465
[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544)
This patch passes the string `"vector.gep"` to CreateGEP instead of
CreateMul.
2024-09-18 19:22:36 +08:00
LiqinWeng
a2994b2999
[LV][NFC] Unify printing for WidenEVLReicpe with other EVL recipes (#108177) 2024-09-18 15:03:37 +08:00
Luke Lau
30d7dcc1db [RISCV] Add asserts requirement to loop vectorizer tests
Hopefully this fixes a buildbot failure on fuchsia where opt doesn't
have -debug-only
2024-09-17 14:18:36 +08:00
Luke Lau
41f1b467a2
[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage (#108370)
A half with only zvfhmin or bfloat will end up getting promoted to a f32
for most instructions.

Unless the loop consists only of memory ops and permutation instructions
which don't need promoted (is this common?), we'll end up using double
the LMUL than what's currently being returned by getRegUsageForType.

Since this is used by the loop vectorizer, it seems better to be
conservative and assume that any usage of a zvfhmin half/bfloat will end
up being widened to a f32
2024-09-17 13:50:19 +08:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
Florian Hahn
ea83e1c05a
[LV] Assign cost to all interleave members when not interleaving.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.

This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.

This fixes a divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/108098.
2024-09-11 21:04:34 +01:00
Florian Hahn
e3c537ff90
[VPlan] Consider non-header phis in planContainsAdditionalSimp.
Update planContainsAdditionalSimplifications to also check phis not in
the loop header. This ensures we don't miss cases where VPBlendRecipes
(which correspond to such phis) have been simplified.

Fixes https://github.com/llvm/llvm-project/issues/107473.
2024-09-10 21:37:14 +01:00
Florian Hahn
a794ee4559
[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305)
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.

PR: https://github.com/llvm/llvm-project/pull/95305
2024-09-10 10:41:35 +01:00
Kolya Panchenko
00e40c9b5b
[LV] Support binary and unary operations with EVL-vectorization (#93854)
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operations.
Follow up patches will extend support for remaining cases, like `FCmp`
and `ICmp`
2024-09-06 11:41:36 -04:00
Florian Hahn
cf2ecc7c1c
[LV] Remove over-aggressive assert from 3fe6a064f15c.
There are some cases where only the first operand is marked for
truncation. In that case, the compare won't be truncated which would
incorrectly trigger the assertion.

It also shows that the check pre 3fe6a064f15c also considered compares
truncated that cannot be truncated.
2024-09-05 18:20:16 +01:00
Florian Hahn
3fe6a064f1
[LV] Check if compare is truncated directly in getInstructionCost.
The current check for truncated compares in getInstructionCost misses
cases where either the first or both operands are constants.
Check directly if the compare is marked for truncation. In that case,
the minimum bitwidth is that of the operands.

The patch also adds asserts to ensure that.

This fixes a divergence between legacy and VPlan-based cost model, where
the legacy cost model incorrectly estimated the cost of compares with
truncated operands.

Fixes https://github.com/llvm/llvm-project/issues/107171.
2024-09-04 20:50:06 +01:00
Philip Reames
1fbb6b4efc
[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141)
Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case
where the vectorizer is emitting a non-canonical identity value given
the available flags. We use largest/smallest value during ISEL, and VP
expansion, but not during vectorization.

Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start
value, this difference is only visible when masking of inactive lanes is
required.

Primary motivation of this change is simply to remove a difference
between version of code which reason about the identity value of a
reduction so I can kill all but one off.

In review, it was pointed out that this is actually a functional fix as well. 
The old code used inf on a noinf reduction instruction - whose
result is poison!  That wasn't the intent of the code.
2024-09-03 12:21:54 -07:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
Florian Hahn
654bb4e9f2
[LV] Don't consider branches leaving loop in collectValuesToIgnore.
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.

This fixes another divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/106780.
2024-09-01 20:35:36 +01:00
Florian Hahn
f0e34f3818
[VPlan] Don't skip optimizable truncs in planContainsAdditionalSimps.
A optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.

Also adds debug output for printing additional cost-precomputations.

Fixes https://github.com/llvm/llvm-project/issues/106641.
2024-08-30 11:29:30 +01:00
Florian Hahn
c4906588ce
[VPlan] Use skipCostComputation when pre-computing induction costs.
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/106417.
2024-08-29 21:20:00 +01:00
Maciej Gabka
95d2d1cba0
Move stepvector intrinsic out of experimental namespace (#98043)
This patch is moving out stepvector intrinsic from the experimental
namespace.

This intrinsic exists in LLVM for several years now, and is widely used.
2024-08-28 12:48:20 +01:00
Mel Chen
dfde1a7232
[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907) 2024-08-28 17:46:58 +08:00
Florian Hahn
cb4efe1d07
[VPlan] Don't trigger VF assertion if VPlan has extra simplifications.
There are cases where VPlans contain some simplifications that are very
hard to accurately account for up-front in the legacy cost model. Those
cases are caused by un-simplified inputs, which trigger the assert
ensuring both the legacy and VPlan-based cost model agree on the VF.

To avoid false positives due to missed simplifications in general, only
trigger the assert if the chosen VPlan doesn't contain any additional
simplifications.

Fixes https://github.com/llvm/llvm-project/issues/104714.
Fixes https://github.com/llvm/llvm-project/issues/105713.
2024-08-22 21:38:06 +01:00
Florian Hahn
4e04286d61
[VPlan] Only use selectVectorizationFactor for cross-check (NFCI). (#103033)
Use getBestVF to select VF up-front and only use
selectVectorizationFactor to get the VF legacy VF to check the
vectorization decision matches the VPlan-based cost model.

PR: https://github.com/llvm/llvm-project/pull/103033
2024-08-21 13:09:01 +02:00
Florian Hahn
99741ac285
[VPlan] Introduce explicit ExtractFromEnd recipes for live-outs. (#100658)
Introduce explicit ExtractFromEnd recipes to extract the final values
for live-outs instead of implicitly extracting in VPLiveOut::fixPhi.

This is a follow-up to the recent changes of modeling extracts for
recurrences and consolidates live-out extract creation for fixed-order
recurrences at a single place: addLiveOutsForFirstOrderRecurrences.

It is also in preparation of replacing VPLiveOut with VPIRInstructions
wrapping the original scalar phis.

PR: https://github.com/llvm/llvm-project/pull/100658
2024-08-21 10:06:44 +02:00
Florian Hahn
e9e3a183d6
[LV] Don't cost branches and conditions to empty blocks.
Update the legacy cost model skip branches with successors blocks
that are empty or only contain dead instructions, together with their
conditions. Such branches and conditions won't result in any
generated code and will be cleaned up by VPlan transforms.

This fixes a difference between the legacy and VPlan-based cost model.

When running LV in its usual pipeline position, such dead blocks should
already have been cleaned up, but they might be generated manually or by
fuzzers.

Fixes https://github.com/llvm/llvm-project/issues/100591.
2024-08-18 12:51:17 +01:00
Florian Hahn
c7a44ec031
[VPlan] Check successors in VPlan to check if scalar epi required (NFC)
Now that the branches to the scalar epilogue are modeled in VPlan
directly, check the VPlan to see if a scalar epilogue is required.

Preparation for https://github.com/llvm/llvm-project/pull/100658.
2024-08-12 15:33:52 +01:00
Craig Topper
59728193a6
[RISCV] Disable fixed length vectors with Zve32* without Zvl64b. (#102405)
Fixed length vectors use scalable vector containers. With Zve32* and not
Zvl64b, vscale is a 0.5 due RVVBitsPerBlock being 64.

To support this correctly we need to lower RVVBitsPerBlock to 32 and
change our type mapping. But we need to RVVBitsPerBlock to alway be
>= ELEN.  This means we need two different mapping depending on ELEN.

That is a non-trivial amount of work so disable fixed lenght vectors
without Zvl64b for now.

We had almost no tests for Zve32x without Zvl64b which is probably why
we never realized that it was broken.

Fixes #102352.
2024-08-08 09:17:43 -07:00
Philip Reames
d3fd28a134
[RISCV][TTI] Properly model odd vector sized LD/ST operations (#100436)
The motivation for this change is the costing of a LD or ST with nearly
power of 2 vectors (e.g. <3 x i32> or <7 x i32>) on V. There's an
experimental option in SLP to allow emitting these if the cost model
says they're profitable. This really helps with e.g. RGB vectors.

Our actual lowering for these depends on whether a wider container type
is known available. If so, we use a vle or vse on the wider type with a
restricted VL. If not, we split until a legal type is found, and then
apply the vle/vse on the sub-pieces.

This change is intentionally restricted to only the case where promotion
(widening w/VL predication) is involved. We appear to have at least one
bug in our splitting lowering (see discussion on review), and to avoid
exposing this more widely, I chose to not adjust costs for the splitting
case. The current splitting costing assumes scalarization (which is not
true of the actual lowering), but that has the effect of biasing
vectorization away from such cases strongly.

For the widening case, the true cost scales with the next largest legal
type. The default implementation assumes that such a type is scalarized.
Changing that brings our cost in line with our actual lowering decision.
Note that since scalarization is not possible for scalable types, the
prior costing falsely returned Invalid for that case.
2024-07-26 12:52:20 -07:00
Florian Hahn
67a55e01e3
[VPlan] Replace getBestPlan by getBestVF use also for epilogue vec. (#98821)
Replace getBestPlan by getBestVF which simply finds the best
VF out of the VFs for the available VPlans.

Then use getBestPlan to retrieve the corresponding VPlan.

This allows using getBestVF & getBestPlan for epilogue vectorization
as well. As the same plan may be used to vectorize both the main
and epilogue loop, restricting the VF of the best plan would cause
issues.

PR: https://github.com/llvm/llvm-project/pull/98821
2024-07-26 14:06:46 +01:00
Alexey Bataev
7432ad6af5
[LV][VP][NFC]Add tests for safe store/load forwarding/dependence distance.
Reviewers: fhahn

Reviewed By: fhahn

Pull Request: https://github.com/llvm/llvm-project/pull/100635
2024-07-25 19:55:37 -04:00
Philip Reames
ea202f9f2e [LV,RISCV] Regenerate a test to reduce spurious deltas in upcoming change 2024-07-25 12:22:59 -07:00
Florian Hahn
b72689a5cb
[LV] Ignore live-out users in cost model if scalar epilogue is required.
Follow-up to ba8126b6fef79.

If a scalar epilogue is required, users outside the loop won't use
live-outs from the vector loop but from the scalar epilogue. Ignore them if
that is the case.

This fixes another case where the VPlan-based cost-model more accurately
computes cost.

Fixes https://github.com/llvm/llvm-project/issues/100464.
2024-07-25 11:16:18 +01:00
Florian Hahn
ba8126b6fe
[LV] Mark dead instructions in loop as free.
Update collectValuesToIgnore to also ignore dead instructions in the
loop. Such instructions will be removed by VPlan-based DCE and won't be
considered by the VPlan-based cost model.

This closes a gap between the legacy and VPlan-based cost model. In
practice with the default pipelines, there shouldn't be any dead
instructions in loops reaching LoopVectorize, but it is easy to generate
such cases by hand or automatically via fuzzers.

Fixes https://github.com/llvm/llvm-project/issues/99701.
2024-07-24 09:31:32 +01:00
Luke Lau
58854facb3
[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594)
I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and
rva22u64_v, and noticed that in a few cases that rva22u64_v was
considerably slower.

One of them was 519.lbm_r, which has a large loop that was being
unprofitably vectorized. It has an if/else in the loop which requires
large amounts of predication when vectorized, but despite the loop
vectorizer taking this into account the vector cost came out as cheaper
than the scalar.

It looks like the reason for this is because we cost scalar floating
point ops as 2, but their vector equivalents as 1 (for LMUL 1). This
comes from how we use BasicTTIImpl for scalars which treats floats as
twice as expensive as integers.

This patch doubles the cost of vector floating point arithmetic ops so
that they're at least as expensive as their scalar counterparts, which
gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60.

Fixes #62576 (the last point there about scalar fsub/fmul)
2024-07-22 13:56:10 +08:00
Mel Chen
4eb30cfb34
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.

Address one of TODOs from #76172.
2024-07-16 16:15:24 +08:00
Mel Chen
a00754bb2a
[LV] Fix the cost of min/max reductions. (#98453)
This patch updates the function `getReductionPatternCost` to handle the
cost of min/max reductions by `TTI.getMinMaxReductionCost`.
2024-07-12 13:47:33 +08:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Florian Hahn
67f4968a57
[LV] Skip cost for ZExt/SExts that will be removed by truncating ops.
If an extend is truncated, it will be removed if the result type is <=
the source type, as there is nothing to extend. Return a cost of 0.

This was caught by the first step to perform cost-modeling based on
VPlan (b841e2e), as the legacy cost model would query the cost of an
invalid extend, while the extend has been folded away by VPlan
transforms.

Fixes https://github.com/llvm/llvm-project/issues/98413.
2024-07-11 11:40:14 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Florian Hahn
0577cdaa32
[LV] Split checking if tail-folding is possible, collecting masked ops. (#77612)
Introduce new canFoldTail helper which only checks if tail-folding is
possible, but without modifying MaskedOps.

Just because tail-folding is possible doesn't mean the tail will be
folded; that's up to the cost-model to decide. Separating the check if
tail-folding is possible and preparing for tail-folding makes sure that
MaskedOps is only populated when tail-folding is actually selected.

PR: https://github.com/llvm/llvm-project/pull/77612
2024-07-08 16:34:42 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Florian Hahn
2b3b405b09
[LV] Don't vectorize first-order recurrence with VF <vscale x 1 x ..>
The assertion added as part of https://github.com/llvm/llvm-project/pull/93395
surfaced cases where first-order recurrences are vectorized with
<vscale x 1 x ..>. If vscale is 1, then we are unable to extract the
penultimate value (second to last lane). Previously this case got
mis-compiled, trying to extract from an invalid lane (-1)
https://llvm.godbolt.org/z/3adzYYcf9.

Fixes https://github.com/llvm/llvm-project/issues/97452.
2024-07-04 11:44:51 +01:00
Kolya Panchenko
49e5cd2acc
[LV][NFC] Marked functions as const. Added LLVM_DEBUG. (#96681) 2024-06-26 17:38:18 -04:00
Florian Hahn
8681bb8bed
[LV] Add additional test coverage for cost modeling.
Add missing tests uncovered by
https://github.com/llvm/llvm-project/pull/92555.

Includes test for https://github.com/llvm/llvm-project/issues/96294 and
https://github.com/llvm/llvm-project/issues/96328
2024-06-26 10:18:01 +01:00
Florian Hahn
abf5969f76
[VPlan] Don't compute costs if there are no vector VPlans.
In some cases, no vector VPlans can be constructed due to failing VPlan
legality checks (e.g. unable to perform sinking for first order
recurrences or plans being incompatible with EVL).

There's no need to compute costs in those cases, so check directly if
there are no vector plans.
2024-06-24 08:38:31 +01:00
Florian Hahn
f0c674f680
[LV] Add test showing cost is computed when there are no vector plans.
Add test showing unnecessary cost computations, as no vector VPlans are
generated.
2024-06-24 08:08:56 +01:00
Florian Hahn
f1f3c34b47
Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)""
This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and
eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot
failure and there have been a number of crashes reported at
https://github.com/llvm/llvm-project/pull/92555
2024-06-21 19:54:21 +01:00
Florian Hahn
242cc200cc
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

Extra tests for crashes discovered when building Chromium have been
added in fb86cb7ec157689e, 3be7312f81ad2.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-06-20 17:32:52 +01:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00