2074 Commits

Author SHA1 Message Date
Simon Pilgrim
d8cd8d56ea
[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129)
We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.)

Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can.

Noticed while having yet another attempt at #63980
2025-01-23 16:57:13 +00:00
Alexey Bataev
ccd77953d0 [SLP][NFC]Add a test with potential alternate node, marked for minbitwidth size 2025-01-22 06:48:34 -08:00
Sushant Gokhale
c6c647588f
[SLP][NFC] Update test for PR #118055 (#122696)
This patch updates the motivating test for the above PR so that it does
not conflict with urem PR #122236
2025-01-22 03:28:34 -08:00
Alexey Bataev
184c056e35 [SLP][NFC]Update the test by replacing undefs with constant values, NFC 2025-01-21 08:43:33 -08:00
Alexey Bataev
5deb4ef9ab
[SLP]Initial non-power-of-2 (but still whole register) for remaining nodes
Added non-power-of-2 (but still whole registers) vectorization support
for nodes other than stores and reductions.

Reviewers: preames, RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/113356
2025-01-21 10:33:03 -05:00
Alexey Bataev
7d01a8f2b9 [SLP]Fix vector factor for repeated node for bv
When adding a node vector, when it is used already in the shuffle for
buildvector, need to calculate vector factor from all vector, not only
this single vector, to avoid incorrect result. Also, need to increase
stability of the reused entries detection to avoid mismatch in cost
estimation/codegen.

Fixes #123639
2025-01-20 14:22:20 -08:00
Alexey Bataev
5e4c34a9b6 [SLP][NFC]Add a test with incorrect length and cost for repeated matching node 2025-01-20 14:17:15 -08:00
Alexey Bataev
2b1e037adb [SLP]Fix createInsertVector mask emission 2025-01-18 11:48:53 -08:00
Alexey Bataev
92a6eff62b [SLP][NFC]Fix the test to use poison and update to show the error 2025-01-18 11:46:04 -08:00
Alexey Bataev
55f7491dde [SLP][NFC]Add a test with incomplete insertion mask, NFC 2025-01-18 08:13:54 -08:00
Han-Kuan Chen
07d496538f
[SLP] Replace MainOp and AltOp in TreeEntry with InstructionsState. (#122443)
Add TreeEntry::hasState.
Add assert for getTreeEntry.
Remove the OpValue parameter from the canReuseExtract function.
Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.
2025-01-18 10:23:20 +08:00
Alexey Bataev
ebfdd38228 [SLP][NFC]Replace undef with constant zero in tests, NFC 2025-01-17 09:48:03 -08:00
Alexey Bataev
98e2328451 [SLP][NFC]Add a test with non-power-of-2 gathered consecutive loads, NFC 2025-01-13 12:50:11 -08:00
Alexey Bataev
066b88879a [SLP]Correctly set vector operand for extracts with poisons
When extracts are vectorized and it has some poison values instead of
instructions, need to correctly set the vectorized operand not as
poison, but as a main vector operand of the main extract instruction.

Fixes #122583
2025-01-13 10:57:07 -08:00
Alexey Bataev
ae54617523 [SLP][NFC]Add a test with incorrect extractelement parameter after extending with poison 2025-01-13 10:32:11 -08:00
Alexey Bataev
092d628383 [SLP]Check for div/rem instructions before extending with poisons
Need to check if the instructions can be safely extended with poison
before actually doing this to avoid incorrect transformations.

Fixes #122691
2025-01-13 09:28:27 -08:00
Alexey Bataev
af524de1fa [SLP]Do not include subvectors for fully matched buildvectors
If the buildvector node fully matched another node, need to exclude
subvectors, when building final shuffle, just a shuffle of the original
node must be emitted.

Fixes #122584
2025-01-13 07:24:16 -08:00
Alexey Bataev
681c83a2f9 [SLP]Fix mask generation after cost estimation
When estimating the cost of entries shuffles for buildvectors, need to
rebuild original mask, not a generated submask, used for subregisters
analysis.

Fixes #122430
2025-01-10 09:32:35 -08:00
Alexey Bataev
3c9c94a24f Revert "[SLP]Fix mask generation after cost estimation"
This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix
buildbots reported in
https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492
2025-01-10 08:46:42 -08:00
Alexey Bataev
547ba9730b [SLP]Fix mask generation after cost estimation
When estimating the cost of entries shuffles for buildvectors, need to
rebuild original mask, not a generated submask, used for subregisters
analysis.

Fixes #122430
2025-01-10 08:17:56 -08:00
Alexey Bataev
920c58916a [SLP][NFC]Add a test with the mask translate after buildvector shuffle cost estimation 2025-01-10 08:12:03 -08:00
Alexey Bataev
5ff36748cf [SLP]Fix mask processing for reused gathered scalars
Need to sync the mask between cost and actual emission to avoid bugs in
mask calculation

Fixes #122324
2025-01-09 11:24:48 -08:00
Alexey Bataev
1160994602 [SLP]Fix a crash for very long GEP chains
Need to check if the GEP bases are equal and return false early. Also,
need to return false if the lookup is too deep, considering bases equal
too. Fixes a crash in the assertion.
2025-01-08 06:47:41 -08:00
David Green
a8dab1aa03
[AArch64] Add a subvector extract cost. (#121472)
These can generally be emitted using an ext instruction or mov from the
high half. The half half extracts can be free depending on the users,
but that is not handled here, just the basic costs. It originally
included all subvector extracts, but that was toned-down to just
half-vector extracts to try and help the mid end not breakup high/low
extracts without having the SLP vectorizer create a mess using other
shuffles.
2025-01-08 08:13:07 +00:00
Alexey Bataev
889215a30e [SLP]Followup fix for the poisonous logical op in reductions
If the VectorizedTree still may generate poisonous value, but it is not
the original operand of the reduction op, need to check if Res still the
operand, to generate correct code.

Fixes #114905
2024-12-26 05:11:26 -08:00
Alexey Bataev
07d284d4eb
[SLP]Add cost estimation for gather node reshuffling
Adds cost estimation for the variants of the permutations of the scalar
values, used in gather nodes. Currently, SLP just unconditionally emits
shuffles for the reused buildvectors, but in some cases better to leave
them as buildvectors rather than shuffles, if the cost of such
buildvectors is better.

X86, AVX512, -O3+LTO
Metric: size..text

Program                                                                        size..text
                                                                               results     results0    diff
                 test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   912998.00   913238.00  0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   203070.00   203102.00  0.0%
     test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1396320.00  1396448.00  0.0%
      test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1396320.00  1396448.00  0.0%
                       test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   309790.00   309678.00 -0.0%
      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1%

CINT2006/445.gobmk - extra code vectorized
MiBench/consumer-lame - small variations
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorized code
Benchmarks/Bullet - extra code vectorized
CFP2017rate/526.blender_r - extra vector code

RISC-V, sifive-p670, -O3+LTO
CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173
CFP2006/453.povray - extra vectorized code
CFP2017rate/508.namd_r - better vector code
CFP2017rate/510.parest_r - extra vectorized code
SPEC/CFP2017rate - extra/better vector code
CFP2017rate/526.blender_r - extra vectorized code
CFP2017rate/538.imagick_r - extra vectorized code
CINT2006/403.gcc - extra vectorized code
CINT2006/445.gobmk - extra vectorized code
CINT2006/464.h264ref - extra vectorized code
CINT2006/483.xalancbmk - small variations
CINT2017rate/525.x264_r - better vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115201
2024-12-24 15:35:29 -05:00
Alexey Bataev
f0f8dab712 [SLP]Check if the first reduced value requires freeze/swap, if it may be too poisonous
If several reduced values are combined and the first reduced value is
just the original reduced value of the bool logical op, need to freeze
it to prevent the propagation of the poison value.

Fixes #114905
2024-12-24 07:40:35 -08:00
Alexey Bataev
6bafbc99b0 [SLP][NFC]Add a test with incorrect (more poisnous) reduction chain 2024-12-24 07:33:40 -08:00
Alexey Bataev
030829a7e5 [SLP]Drop samesign flag if the vector node has reduced bitwidth
If the operands of the icmp instructions has reduced bitwidth after
MinBitwidth analysis, need to drop samesign flag to preserve correctness
of the transformation.

Fixes #120823
2024-12-23 16:55:11 -08:00
Simon Pilgrim
611401c115 [CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599)
processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.
2024-12-20 10:39:45 +00:00
Simon Pilgrim
091448e3c1
Revert "[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types" (#120707)
Reverts llvm/llvm-project#120599 - some recent tests are currently failing
2024-12-20 10:06:03 +00:00
Simon Pilgrim
81e63f9e0c
[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599)
processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.
2024-12-20 09:55:11 +00:00
DianQK
e7a4d78ad3
[SLP] Check if instructions exist after vectorization (#120434)
Fixes #120433.
2024-12-19 06:21:57 +08:00
Alexander Kornienko
23a239267e
Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)
Reverts llvm/llvm-project#119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822
https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255
2024-12-18 19:06:34 +01:00
Alexey Bataev
0e11e19416 [SLP][NFC]Remove undef and update tests 2024-12-17 11:45:20 -08:00
Alexey Bataev
d1a7225076 [SLP]Check if the node must keep its original bitwidth
Need to check if during previous analysis the node has requested to keep
its original bitwidth to avoid incorrect codegen.

Fixes #120076
2024-12-16 08:01:22 -08:00
Alexey Bataev
c53901405a [SLP][NFC]Add a test with incorrect bitwidth for the node, previously identified as non-shrinkable 2024-12-16 07:50:49 -08:00
Han-Kuan Chen
3133acf1fb Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)"
This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.
2024-12-12 20:38:31 -08:00
Han-Kuan Chen
82204154b7
[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181) 2024-12-13 12:06:10 +08:00
Han-Kuan Chen
2546ae4ed0
[SLP][REVEC] Fix the number of elements in the mask of a ShuffleVectorInst is not a power of 2. (#119689)
The following shufflevector should not be vectorized when
slp-vectorize-non-power-of-2 is enabled.

shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 0, i32
1, i32 2>
shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 4, i32
5, i32 6>
2024-12-13 02:22:41 +08:00
Han-Kuan Chen
51a0c1bf25
[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#118949)
To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced
by VLOperands.
Arg_size will be provided to make sure other operands will not be
reorderd when VL[0] is IntrinsicInst (because APO is a boolean value).
In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be
removed since it is simple.
2024-12-11 10:09:23 +08:00
Alexey Bataev
a42aa8f265 [SLP]Fix adjusting of the mask for the fully matched nodes.
When checking for the poison elements in the matches node, need to
consider the register number, when clearing the corresponding mask
element.

Fixes #119393
2024-12-10 09:47:16 -08:00
Nikita Popov
e21ab4d16b
[InstCombine] Infer nuw for gep inbounds from base of object (#119225)
When we have a gep inbounds from the base of an object (e.g. alloca or
global), we know that the index cannot be negative, as this would go out
of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also
accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently
incorrectly doesn't require the inbounds for the alloca case, see
https://github.com/AliveToolkit/alive2/issues/1138).
2024-12-10 10:00:50 +01:00
Nikita Popov
10f315dc9c
[ConstantFolding] Infer getelementptr nuw flag (#119214)
Infer nuw from nusw and nneg. This is the constant expression variant of
https://github.com/llvm/llvm-project/pull/111144.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-09 16:44:05 +01:00
Alexey Bataev
376dad72ab [SLP]Move resulting vector before inert point, if the late generated buildvector fully matched
If the perfect diamond match was detected for the postponed buildvectors
and the vector for the previous node comes after the current node, need
to move the vector register before the current inserting point to
prevent compiler crash.

Fixes #119002
2024-12-06 13:54:48 -08:00
Nikita Popov
f7685af4a5 [InstCombine] Move gep of phi fold into separate function
This makes sure that an early return during this fold doesn't end
up skipping later gep folds.
2024-12-05 15:20:56 +01:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-05 14:36:40 +01:00
Simon Pilgrim
85d15bd130
[TTI][X86] getMemoryOpCost - reduced costs when loading uniform values due to value reuse (#118642)
Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers.

Fixes #111126
2024-12-04 16:36:00 +00:00
Simon Pilgrim
140df02aa2 [SLP][X86] Update test coverage for #111126
I'd copied the test case from #118016 instead of the original #111126 test case
2024-12-04 12:28:55 +00:00
Simon Pilgrim
2202f0e093 [SLP][X86] Add test coverage for #111126
This needs to be expanded to a wider range of tests but for now just focus on #111126
2024-12-04 10:03:43 +00:00