4613 Commits

Author SHA1 Message Date
Florian Hahn
88e9c56990
[LV] Don't adjust name of recurrence phi in scalar loop (NFC).
Adjusting the name of the recurrence phi in the scalar loop is a bit
inconsistent, as we do not adjust any other names in the scalar loops
(including other phis).

Remove this adjustment in preparation for
https://github.com/llvm/llvm-project/pull/94760/ and as discussed there.
2024-07-10 18:37:35 +01:00
Florian Hahn
b841e2eca3
Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)"
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.

A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.

Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.

It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.

The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.

Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.

PR: https://github.com/llvm/llvm-project/pull/92555
2024-07-10 14:22:21 +01:00
Alexey Bataev
3742c2a83c [SLP]Use stored signedness after minbitwidth analysis.
Need to used stored signedness info for the root node instead of
recalculating it after the vectorization, which may lead to a compiler
crash.
2024-07-10 03:58:00 -07:00
Han-Kuan Chen
ac299ed2c7
[SLP] Provide an universal interface for FixedVectorType::get. NFC. (#96845)
SLP vectorizes scalar type to vector type. In the future, we will try to
make SLP vectorizes vector type to vector type. We add a getWidenedType
as a helper function. For example, SLP will make the following code

%v0 = load i32, ptr %in0, align 4
%v1 = load i32, ptr %in1, align 4
%v2 = load i32, ptr %in2, align 4
%v3 = load i32, ptr %in3, align 4

into a load <4 x i32>. The ScalarTy is i32 and VF is 4. In the future,
SLP will make the following code

%v0 = load <4 x i32>, ptr %in0, align 4
%v1 = load <4 x i32>, ptr %in1, align 4
%v2 = load <4 x i32>, ptr %in2, align 4
%v3 = load <4 x i32>, ptr %in3, align 4

into a load <16 x i32>. The ScalarTy is <4 x i32> and VF is 4.

reference:
https://discourse.llvm.org/t/rfc-make-slp-vectorizer-revectorize-vector-instructions/79436
2024-07-10 11:50:35 +08:00
Alexey Bataev
af21bc1917 [SLP]Fix a crash on attempt to revectorize vectorized phi.
If the PHI node is vectorized during vectorization of its operands, no
need to try to vectorize its operands once again.
2024-07-09 14:11:08 -07:00
Florian Hahn
ef89e3efa9
[VPlan] Collect ephemeral values for VPlan.
Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan,
use it in willGenerateVectors. This fixes a regression caused by
29b8b72117 for loops where the only vector values are ephemeral.
2024-07-09 21:34:49 +01:00
Alexey Bataev
822a818786 [SLP][NFC]Add comments for the code, NFC. 2024-07-09 10:06:34 -07:00
Alexey Bataev
a988821123
[SLP]Keep the original order in the reductions.
The patch tries to keep the original order of the instruction in the
reductions. Previously, two first instructions were switched, giving
reverse order.
The first step to support of the ordered reductions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/98025
2024-07-09 12:26:42 -04:00
Florian Hahn
7346e7cc47
[VPlan] Update HCFG builder after 72937203dd3b to fix leak.
Update buildPlainCFG to re-use the vector and latch VPBBs created as
part of the initial skeleton in 72937203dd3b.

This should fix the leak sanitizer failure discovered by
https://lab.llvm.org/buildbot/#/builders/52/builds/619.
2024-07-09 15:28:43 +01:00
Alexey Bataev
2cba218ca5 [SLP]Fix PR98133: Inserting PHI after debug-records!
The phi-node-to-be-deleted still should be inserted as the first
instruction in the block to avoid random compiler crashes.

Fixes https://github.com/llvm/llvm-project/issues/98133
2024-07-09 05:44:45 -07:00
Simon Pilgrim
ef5b1ec0dd [VectorCombine] foldShuffleToIdentity - ensure casts have the same source type 2024-07-09 13:10:11 +01:00
Florian Hahn
c16e37867c
[VPlan] Clarify setting Lane in fixPhi (NFCI).
Split off from https://github.com/llvm/llvm-project/pull/94760, clarify
as suggested.
2024-07-09 13:09:40 +01:00
Florian Hahn
72937203dd
[VPlan] Create vector header and latch VPBBs in createInitialVPlan (NFC)
The empty header and latch blocks can be created together with the
vector loop region.

This is in preparation for splitting up the very large
tryToBuildVPlanWithVPRecipes into several distinct functions, as
suggested multiple times, including in
https://github.com/llvm/llvm-project/pull/94760
2024-07-09 12:41:12 +01:00
Florian Hahn
a2a0ef567c
[VPlan] Retrieve LatchVPBB from region in adjustRecipesForRed (NFC)
The HeaderVPBB is retrieved in a similar fashion already.

This is in preparation for splitting up the very large
tryToBuildVPlanWithVPRecipes into several distinct functions, as
suggested multiple times, including in
https://github.com/llvm/llvm-project/pull/94760
2024-07-09 10:48:43 +01:00
Alexey Bataev
f5ee07a1b5
[SLP]Improve instruction reordering mode detection.
The "instruction" reordering mode should be selected only if there are
compatible instructions in other operands, which can be reordered.
Otherwise, better to select splat reordering mode.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff

test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12383340.00 12383324.00 -0.0%

Some 4x operations get replaced by 8x.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97485
2024-07-08 16:01:55 -04:00
Florian Hahn
0577cdaa32
[LV] Split checking if tail-folding is possible, collecting masked ops. (#77612)
Introduce new canFoldTail helper which only checks if tail-folding is
possible, but without modifying MaskedOps.

Just because tail-folding is possible doesn't mean the tail will be
folded; that's up to the cost-model to decide. Separating the check if
tail-folding is possible and preparing for tail-folding makes sure that
MaskedOps is only populated when tail-folding is actually selected.

PR: https://github.com/llvm/llvm-project/pull/77612
2024-07-08 16:34:42 +01:00
Alexey Bataev
385118644c [SLP]Remove operands upon marking instruction for deletion.
If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have
more vectorizable patterns and generate less useless extractelement
instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97409
2024-07-08 07:56:48 -07:00
Simon Pilgrim
7e054c33d4 [VectorCombine] foldShuffleOfCastops - don't restrict to oneuse but compare total costs instead
Some casts (especially bitcasts but others as well) are incredibly cheap (or free), so don't limit the shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) to oneuse cases, but instead compare the total before/after costs of possibly repeating some casts.
2024-07-08 14:57:51 +01:00
Alexey Bataev
4c47b41771
[SLP]Allow matching and shuffling of extractelement vector operands with different VF.
Allows better codegen with the free resizing of small VF vector operands
and then regular shuffling of the operands of the same size and
simplifies the code.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97414
2024-07-08 09:27:08 -04:00
tcwzxx
c2fe75f99c
Make the logic for checking scatter vectorized nodes of GEP clearer (#97826)
There is no functional change.

Authored-by: zhizhixu <zhizhixu@tencent.com>
2024-07-08 06:08:04 -04:00
Florian Hahn
29b8b72117
[LV] Move check if any vector insts will be generated to VPlan. (#96622)
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.

The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, this fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.

PR: https://github.com/llvm/llvm-project/pull/96622
2024-07-07 20:08:01 +01:00
Kazu Hirata
75bc20ff89
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914) 2024-07-07 08:23:41 +09:00
Florian Hahn
ac03ae30cf
[LV] Preserve LAA in LoopVectorize (NFCI).
LoopVectorize already always preserves DT, LI and SCEV. If any changes
get made to the CFG, cached LAA info for loops are cleared.

LoopAccessAnalysis also implements ::invalidate to clear the analysis if
SE, DT or LI gets invalidated. Hence it should be safe to preserve LAA
and save a small amount of compile-time.
2024-07-05 21:41:31 +01:00
Florian Hahn
eedc2c8cb2
[LV] Remove now obsolete DT updates of scalar exit block.
Remove manual DT updates of scalar exit blocks during legacy skeleton
creation, as they are not needed after 99d6c6d9365.

This fixes DT verification failures with expensive checks, including
https://lab.llvm.org/buildbot/#/builders/16/builds/1270.
2024-07-05 11:20:44 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Simon Pilgrim
b546096d94
[VectorCombine] foldShuffleToIdentity - handle bitcasts with equal element counts (#97731)
Basic initial patch for #96884 that just handles case where we bitcast between float/integers of the same element width
2024-07-05 09:47:42 +01:00
Florian Hahn
8299bfaf29
[VPlan] Extract reduction result insertion point to variable (NFCI).
Split off from https://github.com/llvm/llvm-project/pull/92651 as
suggested.
2024-07-04 16:25:49 +01:00
Florian Hahn
2b3b405b09
[LV] Don't vectorize first-order recurrence with VF <vscale x 1 x ..>
The assertion added as part of https://github.com/llvm/llvm-project/pull/93395
surfaced cases where first-order recurrences are vectorized with
<vscale x 1 x ..>. If vscale is 1, then we are unable to extract the
penultimate value (second to last lane). Previously this case got
mis-compiled, trying to extract from an invalid lane (-1)
https://llvm.godbolt.org/z/3adzYYcf9.

Fixes https://github.com/llvm/llvm-project/issues/97452.
2024-07-04 11:44:51 +01:00
Jon Roelofs
d3a76b03d8
[llvm][SLPVectorizer] Fix a bad cast assertion (#97621)
Fixes: rdar://128092379
2024-07-03 16:25:32 -07:00
Alexey Bataev
873c3f7e78 Revert "[SLP]Remove operands upon marking instruction for deletion."
This reverts commit bbd52dd44ceee80e3b6ba6a9b2bd8ee9a9713833 to fix
a crash revealed in https://lab.llvm.org/buildbot/#/builders/4/builds/505
2024-07-03 13:05:17 -07:00
Alexey Bataev
bbd52dd44c
[SLP]Remove operands upon marking instruction for deletion.
If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have
more vectorizable patterns and generate less useless extractelement
instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97409
2024-07-03 15:11:18 -04:00
Alexey Bataev
4eecf3c650
[SLP]Reorder buildvector/reduction vectorization and fuse the loops.
Currently SLP vectorizer tries at first to find reduction nodes, and
then vectorize buildvector sequences. Need to try to vectorize wide
buildvector sequences at first and only then try to vectorize
reductions, and then smaller buildvector sequences.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/96943
2024-07-03 14:36:30 -04:00
Min-Yih Hsu
8b55d342b6
[RISCV][LoopIdiomVectorize] Support VP intrinsics in LoopIdiomVectorize (#94082)
Teach LoopIdiomVectorize to use VP intrinsics to replace the byte
compare loops. Right now only RISC-V uses LoopIdiomVectorize of this
style.
2024-07-02 18:48:28 -07:00
Min-Yih Hsu
de5ff38a0d
[LoopIdiomVectorize][NFC] Factoring out the part that handles vectorization strategy (#94682)
To pave the way for porting LIV to RISC-V, which uses VP intrinsics for
vectors.

NFC.
2024-07-02 18:21:41 -07:00
Gabriel Baraldi
380beaec86
Fix potential crash in SLPVectorizer caused by missing check (#95937)
I'm not super familiar with this code, but it seems that we were just
missing a check.

The original code that triggered this did not have uselistorders but
llvm-reduce created them and it reproduces the same issue in a way more
compact way.

Fixes https://github.com/llvm/llvm-project/issues/95016
2024-07-02 08:15:51 -04:00
Youngsuk Kim
2051736f7b [llvm][Transforms] Avoid 'raw_string_ostream::str' (NFC)
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.

Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-06-30 09:03:29 -05:00
Alexey Bataev
d70963a762 [SLP]Fix the cost of the adjusted extracts in per-register analysis.
Previous patch did not pass the list of the extract indices by
reference, so the compiler just ignored them. Pass indices by reference
and fix the per-register analysis.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/96808
2024-06-28 14:33:08 -07:00
Alexey Bataev
a9c12e481b Revert "[SLP]Fix the cost of the adjusted extracts in per-register analysis."
This reverts commit 784152056ea40a800a8fd9f4157a428dfb7a6de8 to fix
buildbots issues reported in
https://lab.llvm.org/buildbot/#/builders/4/builds/315 and https://lab.llvm.org/buildbot/#/builders/35/builds/481
2024-06-28 13:41:51 -07:00
Alexey Bataev
784152056e
[SLP]Fix the cost of the adjusted extracts in per-register analysis.
Previous patch did not pass the list of the extract indices by
reference, so the compiler just ignored them. Pass indices by reference
and fix the per-register analysis.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/96808
2024-06-28 15:49:47 -04:00
David Green
76c8e1d857 [VectorCombine] Guard against the lane zero select predicate being scalar
All but the first lane was being checked, but this could leave the first lane
with a scalar select predicate. This just extends the check to make sure the
types are all the same
2024-06-28 17:27:16 +01:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Florian Hahn
06079233f8
[VPlan] Return std::nullopt early if plans are empty.
Fixes a crash caused by abf5969.
2024-06-27 12:25:59 +01:00
Kolya Panchenko
49e5cd2acc
[LV][NFC] Marked functions as const. Added LLVM_DEBUG. (#96681) 2024-06-26 17:38:18 -04:00
Alexey Bataev
6f582b7ed3 [SLP][NFC]Remove extra check for VU. 2024-06-26 05:39:37 -07:00
Alexey Bataev
0280f97b36 [SLP]Fix PR95925: extract vectorized index of the potential buildvector sequence.
If the vectorized scalar is not the insert value in the buildvector
sequence but the index, it should be always extracted.
2024-06-25 14:07:51 -07:00
Alexey Bataev
228c2e1473 [SLP]Fix incorrect promotion of nodes before shuffling.
If the base node is signed, but some values are unsigned, still the
whole node should be considered signed. Also, an extra bitwidth analysis
should be performed, when estimating the minimal bitwidth.
2024-06-25 13:39:28 -07:00
Han-Kuan Chen
de7c1396f2
[SLP] NFC. Refactor and add getAltInstrMask help function. (#94709)
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-06-26 00:42:38 +08:00
Nikita Popov
8263bec533
[SLP] Use poison instead of undef in reorderScalars() (#96619)
-1 mask elements are specified to return poison rather than undef
nowadays , so update the reorderScalars() implementation to match.
2024-06-25 14:23:40 +02:00
Ramkumar Ramachandra
0f111ba790
LoopInfo: introduce Loop::getLocStr; unify debug output (#93051)
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
2024-06-25 13:12:15 +01:00