4829 Commits

Author SHA1 Message Date
Jie Fu
20fa37bbfa [Vectorize] Fix -Wunused-variable in SLPVectorizer.cpp (NFC)
/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10310:26:
error: unused variable 'isExtractSubvectorMask' [-Werror,-Wunused-variable]
                    bool isExtractSubvectorMask =
                         ^
1 error generated.
2024-09-03 21:46:42 +08:00
Han-Kuan Chen
ce8ec31298
[SLP][REVEC] Support more mask pattern usage in shufflevector. (#106212) 2024-09-03 21:30:40 +08:00
Alexey Bataev
b74e09cb20 [SLP]Check for the whole vector vectorization in unique scalars analysis
Need to check that a whole number of registers is being vectorized
before actually trying to build the node, to avoid a compiler
crash.
2024-09-03 06:19:21 -07:00
Florian Hahn
dd94537b40
[LV] Update call widening decision when scalarizing calls.
collectInstsToScalarize may decide to scalarize a call. If so, we have
to update the widening decision for the call, otherwise the call won't
be scalarized as expected during VPlan construction.

This issue was uncovered by f82543d509.
2024-09-03 14:12:41 +01:00
Alexey Bataev
f381cd0699 [SLP]Fix PR107036: Check if the type of the user is sizable before requesting its size.
Only some instructions should be considered as potentially reducing the
size of the operands types, not all instructions should be considered.

Fixes https://github.com/llvm/llvm-project/issues/107036
2024-09-03 05:29:59 -07:00
Florian Hahn
954ed05c10
[VPlan] Simplify MUL operands at recipe construction.
This moves the logic to create simplified operands using SCEV to MUL
recipe creation. This is needed to match the behavior of the legacy cost
model. TODOs are to extend to other opcodes and move to a transform.

Note that this also restricts the number of SCEV simplifications we
apply to more precisely match the cases handled by the legacy cost
model.

Fixes https://github.com/llvm/llvm-project/issues/107015.
2024-09-02 21:25:31 +01:00
Florian Hahn
50a02e7c68
[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost.
Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to
TTI if available, as there are multiple places in TTI which use the
IntrinsicInst.

Fixes https://github.com/llvm/llvm-project/issues/107016.
2024-09-02 20:47:37 +01:00
Florian Hahn
b0de7fa466
[VPlan] Use op from underlying call in computeCost if needed.
This fixes a divergence between legacy and VPlan-based cost model, e.g.
if one of the operands has a first-order recurrence phi as operand.
2024-09-02 14:00:10 +01:00
David Sherwood
dc6c3ba4c4
[NFC][IR] Add CreateCountTrailingZeroElems helper (#106711)
The LoopIdiomVectorize pass already creates calls to the intrinsic
experimental_cttz_elts, but PR #88385 will start calling this more
too so I've created a helper for it.
2024-09-02 13:40:14 +01:00
Florian Hahn
654bb4e9f2
[LV] Don't consider branches leaving loop in collectValuesToIgnore.
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.

This fixes another divergence between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/106780.
2024-09-01 20:35:36 +01:00
Florian Hahn
9ccf82543d
[VPlan] Implement VPWidenCallRecipe::computeCost (NFCI). (#106047)
Implement cost computation for VPWidenCallRecipe. In some cases, targets
use argument info to compute intrinsic costs. If all operands of the
call are VPValues with an underlying IR value, use the IR values as
arguments.

PR: https://github.com/llvm/llvm-project/pull/106731
2024-09-01 16:26:08 +01:00
Alexey Bataev
6e68fa921b [SLP]Fix PR106909: add a check for unsafe FP operations.
NEON has non-IEEE compliant denormal flushing and the compiler should
check if it is safe to vectorize instructions for NEON in non-fast math
mode.

Fixes https://github.com/llvm/llvm-project/issues/106909
2024-09-01 07:10:09 -07:00
tcwzxx
24a043a6ff
[SLP] Fix crash of shuffle poison (#106857)
When the shuffle masks are `PoisonMaskElem`, there is no need to check
the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause
the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) &&
"SK_ExtractSubvector index out of range"' failed.
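
The early-out described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `isAllPoisonMask` and `extractSubvectorCost` are hypothetical names, `kPoisonMaskElem` assumes the usual encoding of a poison lane as -1, and the `Query` callable stands in for the target cost query that asserted.

```cpp
#include <cassert>
#include <vector>

// Assumption: a poison lane is encoded as -1 in the shuffle mask.
constexpr int kPoisonMaskElem = -1;

// Returns true when every lane of the mask is poison (hypothetical helper).
inline bool isAllPoisonMask(const std::vector<int> &Mask) {
  for (int Elem : Mask)
    if (Elem != kPoisonMaskElem)
      return false;
  return true;
}

// Hypothetical wrapper: an all-poison mask is treated as free, so the
// cost query is never issued for it and cannot hit the range assertion.
template <typename CostFn>
int extractSubvectorCost(const std::vector<int> &Mask, CostFn Query) {
  if (isAllPoisonMask(Mask))
    return 0;
  return Query();
}
```

The point of taking a callable rather than a precomputed cost is that the query is skipped entirely for all-poison masks.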
2024-09-01 20:24:09 +08:00
Alexey Bataev
a3ea90ffbb [SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/106449
2024-08-31 08:14:49 -07:00
Martin Storsjö
9e86d4f2ed Revert "[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands."
This reverts commit 6ab07d71174982e5cb95420ee4df01347333c342.

This commit caused failed asserts, see
https://github.com/llvm/llvm-project/pull/106449.
2024-08-31 14:53:08 +03:00
Philip Reames
c53008de89 [VPlan] Manually jumpthread a bit of reduction code for readability [nfc] 2024-08-30 12:46:49 -07:00
Alexey Bataev
6ab07d7117
[SLP]Initial support for non-power-of-2 (but still whole register) number of elements in operands.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/106449
2024-08-30 14:50:34 -04:00
Alexey Bataev
8a267b7211 [SLP][NFC]Remove unused variable 2024-08-30 11:44:29 -07:00
Alexey Bataev
079746d2c0
[SLP]Better cost estimation for masked gather or "clustered" loads.
After landing support for actual vectorization of the "clustered" loads,
we need to better estimate the cost between the masked gather and clustered
loads. This includes estimation of the address calculation and better
estimation of the gathered loads. Also, this estimation now relies on the
SLPCostThreshold option, allowing the behavior of the compiler to be modified.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/105858
2024-08-30 14:27:51 -04:00
Alexey Bataev
6023d17e6b [SLP][NFC]Add a function description, NFC. 2024-08-30 10:35:10 -07:00
Alexey Bataev
a4aa6bc8fc [SLP]Fix PR106667: carefully look for operand nodes.
If the operand node has the same scalars as one of the vectorized nodes,
the compiler could miss this and incorrectly request minbitwidth data
for the wrong node. It may lead to a compiler crash, because the
  vectorized node might have different minbw result.

Fixes https://github.com/llvm/llvm-project/issues/106667
2024-08-30 10:19:27 -07:00
Simon Pilgrim
b719c92551 [SLP] findBestRootPair - fix incorrect argument name comment. NFC. 2024-08-30 14:45:48 +01:00
Simon Pilgrim
96ad495289 [SLP] vectorizeChainsInBlock - remove superfluous continue at the end of for loop. NFC. 2024-08-30 14:45:48 +01:00
Paul Walker
ce5620ba9a
[LLVM][VPlan] Pick more optimal initial value for VPBlend. (#104019)
By choosing an initial value whose mask is only used by the blend we can
remove the need for the mask entirely.
2024-08-30 13:30:23 +01:00
Alexey Bataev
87a988e881 [SLP]Fix PR106655: Use FinalShuffle for alternate cast nodes.
Need to use FinalShuffle function for all vectorized results to
correctly produce vectorized value.

Fixes https://github.com/llvm/llvm-project/issues/106655
2024-08-30 05:18:21 -07:00
Florian Hahn
f0e34f3818
[VPlan] Don't skip optimizable truncs in planContainsAdditionalSimps.
An optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.

Also adds debug output for printing additional cost-precomputations.

Fixes https://github.com/llvm/llvm-project/issues/106641.
2024-08-30 11:29:30 +01:00
Alexey Bataev
cc943a67d1 [SLP]Fix PR106626: try several attempts for lookup values, if not found.
If the value is used in Scalars several times, the first attempt to find
its position in the node (if ReuseShuffleIndices and ReorderIndices are not
empty) may fail. In this case, we need to find another copy of the same
value and try again.
Fixes https://github.com/llvm/llvm-project/issues/106626
2024-08-29 15:07:20 -07:00
Florian Hahn
c4906588ce
[VPlan] Use skipCostComputation when pre-computing induction costs.
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/106417.
2024-08-29 21:20:00 +01:00
Alexey Bataev
aeedab77b5 [SLP]Correctly decide if the non-power-of-2 number of stores can be vectorized.
Need to consider the maximum type size in the graph before attempting
to vectorize a non-power-of-2 number of elements, which may be
less than MinVF.
2024-08-29 12:40:31 -07:00
Philip Reames
4bc7c74240
[SLP] Extract isIdentityOrder to common routine [probably NFC] (#106582)
This isn't quite just code motion as the four different versions we had
of this routine differed in whether they ignored the "size" marker used
to represent undef. I doubt this matters in practice, but it is a
functional change.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-08-29 11:00:31 -07:00
Philip Reames
b5a1b45fe3 [SLP] Early return in getReorderingData [nfc] 2024-08-29 08:58:27 -07:00
Alexey Bataev
50515db57f [SLP][NFC]Format canVectorizeLoads after previous NFC patches. 2024-08-29 04:31:13 -07:00
Florian Hahn
0a272d3a17
[LV] Use SCEV to analyze second operand for cost query.
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.

Fixes https://github.com/llvm/llvm-project/issues/106248.
2024-08-29 12:08:27 +01:00
Alexey Bataev
fdf72c992b [SLP]Fix a crash when requesting the cost for buildvector cmp nodes types.
Need to use original cmp type i1 when estimating the cost for the
buildvector node, not its operand types to prevent compiler crash upon
TTI cost estimation.
2024-08-29 03:53:28 -07:00
tcwzxx
121fb2c2cc
[SLP] Fix the Vec lane overridden by the shuffle mask (#106341)
Currently, SLP uses shuffle for the external user of `InsertElementInst`
and iterates through the `InsertElementInst` chain to fill the mask with
constant indices. However, it may override the original Vec lane. Using
the original Vec lane is sufficient.
2024-08-29 11:18:26 +08:00
Michael Maitland
18c79ca360 [LV][NFC] Remove unnecessary space in comment 2024-08-28 14:23:44 -07:00
Alexey Bataev
ec360d6523 [SLP][NFC]Add getValueType function and use instead of complex scalar type analysis 2024-08-28 13:02:59 -07:00
Florian Hahn
4b84288f00
[VPlan] Pass live-ins used as exit values straight to live-out.
Live-ins that are used as exit values don't need to be extracted, they
can be passed through directly. This fixes a crash when trying to
extract from a live-in.

Fixes https://github.com/llvm/llvm-project/issues/106257.
2024-08-28 19:12:05 +01:00
Florian Hahn
16910a21ee
[VPlan] Move logic to create interleave groups to VPlanTransforms (NFC).
This is a step towards further breaking up the rather large
tryToBuildVPlanWithVPRecipes. It moves the logic to create interleave groups
to VPlanTransforms.cpp, where similar replacements for other recipes are
defined as well (e.g. EVL-based ones).
2024-08-28 15:56:09 +01:00
Florian Hahn
96e1320a9a
[VPlan] Move properlyDominates to VPDominatorTree (NFCI).
This allows for easier re-use in additional places in the future. Also
move code to VPlanAnalysis.cpp
2024-08-28 13:58:12 +01:00
Ramkumar Ramachandra
71ede8d831
VPlan: factor out VPlanUtils into its own file (NFC) (#105857) 2024-08-28 13:54:41 +01:00
Philip Reames
ee764a2603
[SLP] Remove -slp-optimize-identity-hor-reduction-ops option (#106238)
This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.
2024-08-27 13:21:57 -07:00
Philip Reames
6a74b0ee59 [SLP] Use early-return in canVectorizeLoads [nfc] 2024-08-27 12:30:15 -07:00
Philip Reames
ed03070eb3
[SLP] Support vectorizing 2^N-1 reductions (#106266)
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vectors.

Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
   width. If the cost model does not select that VL, return to power-of-two
   boundaries when halving the search VL. The latter is mostly for simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
   non-power-of-two vectors. This is required to support <3 x Ty> cases.

One thing which isn't directly related to this change, but which I want to
note for clarity, is that the non-power-of-two vectorization appears to
be sensitive to the operand order of the reduction. I haven't yet fully
figured out why, but I suspect this is non-power-of-two specific.
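
The search order from points 1) and 2) above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: `candidateReductionWidths` and the small power-of-two helpers are hypothetical names, and the sketch assumes the 2^N-1 starting width is only used when the number of reduced values is itself one short of a power of two.

```cpp
#include <cassert>
#include <vector>

// Small portable helpers (hypothetical; they avoid requiring C++20 <bit>).
static bool isPowerOfTwo(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }
static unsigned floorPowerOfTwo(unsigned X) { // requires X >= 1
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}
static unsigned ceilPowerOfTwo(unsigned X) {
  unsigned P = 1;
  while (P < X)
    P *= 2;
  return P;
}

// Lists the widths the search would try, largest first, assuming the cost
// model rejects each one in turn.
std::vector<unsigned> candidateReductionWidths(unsigned NumReducedVals,
                                               bool AllowNonPow2) {
  // Point 2: the minimum width drops from 4 to 3 for non-power-of-two vectors.
  const unsigned MinWidth = AllowNonPow2 ? 3 : 4;
  std::vector<unsigned> Widths;
  if (NumReducedVals < MinWidth)
    return Widths;
  // Point 1: start at 2^N-1 when allowed, otherwise at a power of two.
  unsigned Width = floorPowerOfTwo(NumReducedVals);
  if (AllowNonPow2 && isPowerOfTwo(NumReducedVals + 1))
    Width = NumReducedVals;
  while (Width >= MinWidth) {
    assert(Widths.empty() || Width < Widths.back());
    Widths.push_back(Width);
    // Halving returns to power-of-two boundaries, e.g. 7 -> 4 -> 2.
    Width = ceilPowerOfTwo(Width) / 2;
  }
  return Widths;
}
```

With seven reduced values and non-power-of-two support enabled, the sketch tries width 7 and then width 4; with three values it tries only width 3.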
2024-08-27 12:27:03 -07:00
Alexey Bataev
2dbc6d4d4b [SLP][NFC]Assert total number of scalar uses not less than number of scalar uses, NFC. 2024-08-27 09:57:08 -07:00
Danial Klimkin
9671ed1afc
Revert "LSV: forbid load-cycles when vectorizing; fix bug (#104815)" (#106245)
This reverts commit c46b41aaa6eaa787f808738d14c61a2f8b6d839f.

Multiple tests time out, either due to performance hit (see comment) or
a cycle.
2024-08-27 18:45:22 +02:00
Philip Reames
d0a6434e86 [SLP] Reduce scope of variable using if clause [NFC]
This particular variable name is shadowed by another lower in the
function, so reducing its scope to its single use removes the
shadowing and makes the code much less error prone.
2024-08-27 09:14:30 -07:00
Alexey Bataev
9b408961eb [SLP][NFC]Use has_single_bit instead of isPowerOf2 functions, NFC. 2024-08-27 08:21:19 -07:00
Alexey Bataev
9b4a8f44ed [SLP][NFC]Improve auto types, NFC. 2024-08-27 06:11:08 -07:00
Han-Kuan Chen
3d1c63ee2c
[SLP][REVEC] Expand getelementptr into vector form. (#103704) 2024-08-27 16:11:52 +08:00