5358 Commits

Author SHA1 Message Date
Florian Hahn
b06a45c66f
[VPlan] Add all blocks to outer loop if present during ::execute (NFCI).
This ensures that all blocks created during VPlan execution are properly
added to an enclosing loop, if present.

Split off from https://github.com/llvm/llvm-project/pull/108378 and also
needed once more of the skeleton blocks are created directly via VPlan.

This also allows removing the custom logic for early-exit loop
vectorization added as part of
https://github.com/llvm/llvm-project/pull/117008.
2024-12-31 19:34:34 +00:00
Simon Pilgrim
b195bb87e1 [VectorCombine] scalarizeLoadExtract - consistently use LoadInst and ExtractElementInst specific operand getters. NFC
Noticed while investigating the hung builds reported after af83093933ca73bc82c33130f8bda9f1ae54aae2
2024-12-31 14:42:39 +00:00
Florian Hahn
ddef380cd6
[VPlan] Move simplifyRecipe(s) definitions up to allow re-use (NFC)
Move definitions to allow easy reuse in
https://github.com/llvm/llvm-project/pull/108378.
2024-12-31 13:23:19 +00:00
Muhammad Omair Javaid
332d2647ff Revert "[LV]: Teach LV to recursively (de)interleave. (#89018)"
This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a.

This breaks LLVM build on AArch64 SVE Linux buildbots
https://lab.llvm.org/buildbot/#/builders/143/builds/4462
https://lab.llvm.org/buildbot/#/builders/17/builds/4902
https://lab.llvm.org/buildbot/#/builders/4/builds/4399
https://lab.llvm.org/buildbot/#/builders/41/builds/4299
2024-12-31 03:12:24 +05:00
Simon Pilgrim
d5a96eb125 Revert af83093933ca73bc82c33130f8bda9f1ae54aae2 "[VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands"
Reports of hung builds, but I don't have time to investigate at the moment.
2024-12-30 21:20:56 +00:00
Simon Pilgrim
af83093933 [VectorCombine] eraseInstruction - ensure we reattempt to fold other users of an erased instruction's operands
As we're reducing the use count of the operands its more likely that they will now fold, as they were previously being prevented by a m_OneUse check, or the cost of retaining the extra instruction had been too high.

This is necessary for some upcoming patches, although the only change so far is instruction ordering as it allows some SSE folds of 256/512-bit with 128-bit subvectors to occur earlier in foldShuffleToIdentity as the subvector concats are free.

Pulled out of #120984
2024-12-30 17:52:42 +00:00
Florian Hahn
16d19aaedf
[VPlan] Manage created blocks directly in VPlan. (NFC) (#120918)
This patch changes the way blocks are managed by VPlan. Previously all
blocks reachable from entry would be cleaned up when a VPlan is
destroyed. With this patch, each VPlan keeps track of blocks created for
it in a list and this list is then used to delete all blocks in the list
when the VPlan is destroyed. To do so, block creation is funneled
through helpers in directly in VPlan.

The main advantage of doing so is it simplifies CFG transformations, as
those do not have to take care of deleting any blocks, just adjusting
the CFG. This helps to simplify
https://github.com/llvm/llvm-project/pull/108378 and
https://github.com/llvm/llvm-project/pull/106748.

This also simplifies handling of 'immutable' blocks a VPlan holds
references to, which at the moment only include the scalar header block.

PR: https://github.com/llvm/llvm-project/pull/120918
2024-12-30 12:08:12 +00:00
Florian Hahn
7f3428d3ed
[VPlan] Compute induction end values in VPlan. (#112145)
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on https://github.com/llvm/llvm-project/pull/110004 and
https://github.com/llvm/llvm-project/pull/109975.

PR: https://github.com/llvm/llvm-project/pull/112145
2024-12-29 19:05:08 +00:00
Simon Pilgrim
f2f02b21cd [VectorCombine] foldShuffleOfBinops - only accept exact matching cmp predicates
m_SpecificCmp allowed equivalent predicate+flags which don't necessarily work after being folded from "shuffle (cmpop), (cmpop)" into "cmpop (shuffle), (shuffle)"

Fixes #121110
2024-12-28 09:21:31 +00:00
Fangrui Song
edc42b2dc1 [SLP] Migrate away from PointerUnion::get 2024-12-27 21:01:09 -08:00
Zequan Wu
4d8f9594b2 Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)"
This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it casuse optimization crash on -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057
2024-12-27 11:51:54 -08:00
Florian Hahn
8caeb2e0c2
[VPlan] Always create initial blocks in constructor (NFC).
Update C++ unit tests to use VPlanTestBase to construct initial VPlan,
using a constructor that creates the VP blocks directly in the
constructor.

Split off from and in preparation for
https://github.com/llvm/llvm-project/pull/120918.
2024-12-27 17:43:22 +00:00
Alexey Bataev
07ba457525 [SLP][NFC]Add dump of combined entries, where applicable 2024-12-27 07:56:10 -08:00
Hassnaa Hamdi
ccfe0de0e1
[LV]: Teach LV to recursively (de)interleave. (#89018)
Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2.
This patch teaches the LV to use ld2/st2 recursively to support high
interleaving factors.
2024-12-27 12:42:07 +00:00
Elvis Wang
47e1c87a61
[VPlan] Set debug location for VPReduction/VPWidenIntrinsicRecipe. (#120054)
This patch add missing debug location for
VPReduction/VPWidenIntrinsicRecipe.
2024-12-27 10:37:21 +08:00
Florian Hahn
2dfe1b4042
[VPlan] Remove stray space when printing reverse vector pointer.
printFlags() takes care of printing the required space, remove the extra
printed space between flags and operands.
2024-12-26 21:26:17 +00:00
Alexey Bataev
889215a30e [SLP]Followup fix for the poisonous logical op in reductions
If the VectorizedTree still may generate poisonous value, but it is not
the original operand of the reduction op, need to check if Res still the
operand, to generate correct code.

Fixes #114905
2024-12-26 05:11:26 -08:00
LiqinWeng
b5f0ec80d5
[VPlan] Remove redundant printing final in VPlan::execute (#121048)
Multiple prints will cause problems when testing ir-bb
2024-12-25 10:11:02 +08:00
Alexey Bataev
07d284d4eb
[SLP]Add cost estimation for gather node reshuffling
Adds cost estimation for the variants of the permutations of the scalar
values, used in gather nodes. Currently, SLP just unconditionally emits
shuffles for the reused buildvectors, but in some cases better to leave
them as buildvectors rather than shuffles, if the cost of such
buildvectors is better.

X86, AVX512, -O3+LTO
Metric: size..text

Program                                                                        size..text
                                                                               results     results0    diff
                 test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   912998.00   913238.00  0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   203070.00   203102.00  0.0%
     test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1396320.00  1396448.00  0.0%
      test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1396320.00  1396448.00  0.0%
                       test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   309790.00   309678.00 -0.0%
      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1%

CINT2006/445.gobmk - extra code vectorized
MiBench/consumer-lame - small variations
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorized code
Benchmarks/Bullet - extra code vectorized
CFP2017rate/526.blender_r - extra vector code

RISC-V, sifive-p670, -O3+LTO
CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173
CFP2006/453.povray - extra vectorized code
CFP2017rate/508.namd_r - better vector code
CFP2017rate/510.parest_r - extra vectorized code
SPEC/CFP2017rate - extra/better vector code
CFP2017rate/526.blender_r - extra vectorized code
CFP2017rate/538.imagick_r - extra vectorized code
CINT2006/403.gcc - extra vectorized code
CINT2006/445.gobmk - extra vectorized code
CINT2006/464.h264ref - extra vectorized code
CINT2006/483.xalancbmk - small variations
CINT2017rate/525.x264_r - better vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115201
2024-12-24 15:35:29 -05:00
Florian Hahn
2d038caeeb
[VPlan] Remove stray space when printing VPWidenCastRecipe.
printFlags() already takes care of printing a single space if there are
no flags. Remove the extra space when printing a recipe without flags.
2024-12-24 20:23:48 +00:00
Alexey Bataev
852feea820 [SLP]Propagate AssumptionCache where possible 2024-12-24 09:20:26 -08:00
Alexey Bataev
0d6cb0ae9d [SLP]Fix strict weak ordering criterion in comparators
Fixes #121019
2024-12-24 08:13:57 -08:00
Alexey Bataev
f0f8dab712 [SLP]Check if the first reduced value requires freeze/swap, if it may be too poisonous
If several reduced values are combined and the first reduced value is
just the original reduced value of the bool logical op, need to freeze
it to prevent the propagation of the poison value.

Fixes #114905
2024-12-24 07:40:35 -08:00
Sam Tebbs
c858bf620c
Reland "[LoopVectorizer] Add support for partial reductions" (#120721)
This re-lands the reverted #92418 

When the VF is small enough so that dividing the VF by the scaling
factor results in 1, the reduction phi execution thinks the VF is scalar
and sets the reduction's output as a scalar value, tripping assertions
expecting a vector value. The latest commit in this PR fixes that by
using `State.VF` in the scalar check, rather than the divided VF.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2024-12-24 12:08:17 +00:00
Alexey Bataev
030829a7e5 [SLP]Drop samesign flag if the vector node has reduced bitwidth
If the operands of the icmp instructions has reduced bitwidth after
MinBitwidth analysis, need to drop samesign flag to preserve correctness
of the transformation.

Fixes #120823
2024-12-23 16:55:11 -08:00
Benjamin Maxwell
9ab5474e56
[LV] Rename ToVectorTy to toVectorTy (NFC) (#120404)
This is for consistency with other helpers (and also follows the LLVM
naming conventions).
2024-12-23 23:33:44 +00:00
Florian Hahn
c7a777322d
[VPlan] Replace else-if dyn_cast with cast (NFC).
The recipes handled here are either VPWidenIntrinsic or VPWidenCast, so
replace the else-if dyn_cast with a single else + cast.
2024-12-23 19:46:22 +00:00
Simon Pilgrim
e3f8c229f5 [VectorCombine] foldInsExtVectorToShuffle - inserting into a poison base vector can be modelled as a single src shuffle
We already canonicalized an undef base vector to the RHS to improve further folding, this extends this to improve the shuffle cost estimate of the single src shuffle
2024-12-23 15:49:17 +00:00
Simon Pilgrim
29c89d7265
[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, y, m1), (shuffle y, x, m2)" -> "shuffle x, y, m3" (#120959)
foldShuffleOfShuffles currently only folds unary shuffles to ensure we don't end up with a merged shuffle with more than 2 sources, but this prevented cases where both shuffles were sharing sources.

This patch generalizes the merge process to find up to 2 sources as it merges with the inner shuffles, it also moves the undef/poison handling stages into the merge loop as well.

Fixes #120764
2024-12-23 14:56:15 +00:00
Han-Kuan Chen
11676da808
[SLP] Normalize debug messages for newTreeEntry. (#119514)
A debug message should follow after newTreeEntry.
Make ExtractValueInst and ExtractElementInst use setOperand directly.
2024-12-23 21:42:02 +08:00
LiqinWeng
b1fab4f849
[LV][VPlan] Initialize the variable 'VPID' of the createEVLRecipe (#120926)
Resolve the compilation error caused by the merge issue: #119510
2024-12-23 09:23:22 +08:00
LiqinWeng
8a51471d83
[LV][VPlan] Extract the implementation of transform Recipe to EVLRecipe into a small function. NFC (#119510) 2024-12-23 08:28:19 +08:00
Simon Pilgrim
bf873aa3ec [VectorCombine] foldShuffleToIdentity - add debug message for match
Helps with debugging to show to that the fold found the match.
2024-12-22 17:21:44 +00:00
Simon Pilgrim
f96337e04e [VectorCombine] foldConcatOfBoolMasks - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2024-12-22 16:21:02 +00:00
Florian Hahn
e1833e3a7e
[VPlan] Simplify redundant VPDerivedIVRecipe (NFC).
Split DerivedIV simplification off from
https://github.com/llvm/llvm-project/pull/112145 and use to remove the
need for extra checks in createScalarIVSteps. Required an extra
simplification run after IV transforms.
2024-12-22 09:39:19 +00:00
LiqinWeng
86fa35ce7e
[LV][VPlan] Use opcode to retrieve the VPID of the CallRecipe, rather than underlying instruction (#120816)
This patch may cause the flags in the CallRecipe to be lost after EVL
transformation, and it has been addressed in the patch: #119847
2024-12-22 10:28:20 +08:00
Florian Hahn
9b496deb90
[VPlan] Set and use debug location for VPPredInstPHIRecipe.
Update the recipe it always set its debug location and use it during IR
generation.
2024-12-21 21:57:47 +00:00
Florian Hahn
bb86c5dd4d
[VPlan] Use inferScalarType in VPInstruction::ResumePhi codegen (NFC).
Use VPlan-based type analysis to retrieve type of phi node. Also adds
missing type inference for ResumePhi and ComputeReductionResult opcodes.
2024-12-21 15:55:21 +00:00
vporpo
7a38445ee2
[SandboxVec][DAG] Register move instr callback (#120146)
This patch implements the move instruction notifier for the DAG.
Whenever an instruction moves the notifier will maintain the DAG.
2024-12-20 23:10:24 -08:00
Simon Pilgrim
82b5bda42c [VectorCombine] Add "VC: Erasing" debug message to help the log show when dead WorkList instructions are erased. 2024-12-20 17:59:14 +00:00
Simon Pilgrim
e3157d3f0d [VectorCombine] foldBitcastShuffle - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2024-12-20 17:59:13 +00:00
David Green
70eac255b8
[VectorCombine] Add fp cast handling for shuffletoidentity (#120641)
This fixes some regressions from recent changes to vector combine in
#120216. It allows shuffleToIdentity to look through fp casts as other
casts, and makes sure mismatching vector types in splats and casts do
not block the transform, as only the lanes should matter.
2024-12-20 15:05:08 +00:00
Simon Pilgrim
b87a5fb9fd [VectorCombine] Add "VC: Visiting" debug message to help the log show the instruction folding order. 2024-12-20 14:57:58 +00:00
Simon Pilgrim
5f0db7c112 [VectorCombine] Add "VECTORCOMBINE on <FUNCTION_NAME>" title debug message to help finding vectorcombine stages in the debug log 2024-12-20 13:32:49 +00:00
Simon Pilgrim
c5434804ee [VectorCombine] foldInsExtVectorToShuffle - add debug message for match + cost-comparison
Helps with debugging to show to that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2024-12-20 13:32:49 +00:00
hanbeom
ff93ca7d6c
[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#120461)
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

Original combining left the combine between vectors of different lengths as a TODO. this commit do that. (see
#[baab4aa1ba])
2024-12-20 10:44:49 +00:00
Florian Hahn
5f096fd221
Revert "[LoopVectorizer] Add support for partial reductions (#92418)"
This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701.

It looks like this is triggering an assertion when build llvm-test-suite
on ARM64 macOS.

Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
    target triple = "arm64-apple-macosx15.0.0"

    define void @test(i64 %idx.neg, i8 %0) #0 {
    entry:
      br label %while.body

    while.body:                                       ; preds = %while.body, %entry
      %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
      %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
      %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
      %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
      %conv = sext i8 %0 to i64
      %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
      %1 = load i8, ptr null, align 1
      %conv97 = sext i8 %1 to i64
      %mul = mul i64 %conv97, %conv
      %add99 = add i64 %mul, %sum.1129
      %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
      %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
      %2 = and i1 %cmp94, %cmp95
      br i1 %2, label %while.body, label %while.end.loopexit

    while.end.loopexit:                               ; preds = %while.body
      %add99.lcssa = phi i64 [ %add99, %while.body ]
      ret void
    }

    attributes #0 = { "target-cpu"="apple-m1" }

> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.
2024-12-19 21:46:51 +00:00
Finn Plummer
45c01e8a33
[NFC][TargetTransformInfo][VectorUtils] Consolidate isVectorIntrinsic... api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specifiction of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
  consistently name them
- move `isTriviallyScalarizable` to VectorUtils
  
- update all uses of the api and provide the TTI parameter

Resolves #117030
2024-12-19 11:54:26 -08:00
Nicholas Guy
060d62b48a
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from https://github.com/llvm/llvm-project/pull/94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.

---------

Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>
2024-12-19 11:42:40 +00:00
David Sherwood
c18fda02e1
[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414) 2024-12-19 10:07:13 +00:00