5312 Commits

Author SHA1 Message Date
Florian Hahn
5f096fd221
Revert "[LoopVectorizer] Add support for partial reductions (#92418)"
This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701.

It looks like this is triggering an assertion when build llvm-test-suite
on ARM64 macOS.

Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
    target triple = "arm64-apple-macosx15.0.0"

    define void @test(i64 %idx.neg, i8 %0) #0 {
    entry:
      br label %while.body

    while.body:                                       ; preds = %while.body, %entry
      %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
      %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
      %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
      %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
      %conv = sext i8 %0 to i64
      %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
      %1 = load i8, ptr null, align 1
      %conv97 = sext i8 %1 to i64
      %mul = mul i64 %conv97, %conv
      %add99 = add i64 %mul, %sum.1129
      %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
      %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
      %2 = and i1 %cmp94, %cmp95
      br i1 %2, label %while.body, label %while.end.loopexit

    while.end.loopexit:                               ; preds = %while.body
      %add99.lcssa = phi i64 [ %add99, %while.body ]
      ret void
    }

    attributes #0 = { "target-cpu"="apple-m1" }

> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.
2024-12-19 21:46:51 +00:00
Finn Plummer
45c01e8a33
[NFC][TargetTransformInfo][VectorUtils] Consolidate isVectorIntrinsic... api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specifiction of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
  consistently name them
- move `isTriviallyScalarizable` to VectorUtils
  
- update all uses of the api and provide the TTI parameter

Resolves #117030
2024-12-19 11:54:26 -08:00
Nicholas Guy
060d62b48a
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from https://github.com/llvm/llvm-project/pull/94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.

---------

Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>
2024-12-19 11:42:40 +00:00
David Sherwood
c18fda02e1
[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414) 2024-12-19 10:07:13 +00:00
DianQK
e7a4d78ad3
[SLP] Check if instructions exist after vectorization (#120434)
Fixes #120433.
2024-12-19 06:21:57 +08:00
Florian Hahn
5ca3794e82
[VPlan] Move initial VPlan block creation to constructor. (NFC)
This sets up the initial blocks needed to initialize a VPlan directly
in the constructor. This will allow tracking of all created blocks
directly in VPlan, simplifying block deletion.
2024-12-18 22:00:30 +00:00
Florian Hahn
6910aec097
[VPlan] Don't use VPlan ctor taking trip count in most unit tests (NFC).
Update tests to use constructor not passing a trip count VPValue. The
tests don't need that and are simpler as a result.
2024-12-18 19:57:09 +00:00
Florian Hahn
0e8d022ffe
[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260)
Currently the addUsersInExitBlocks incorrectly assumes exit phis only
have a single operand, which may not be the case for loops with early
exits when they share a common exit block.

Also further relax the assertion in fixupIVUsers to allow exit values if
they come from theloop latch/middle.block.

PR: https://github.com/llvm/llvm-project/pull/120260
2024-12-18 14:47:16 +00:00
Simon Pilgrim
fbc18b85d6
Revert "[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different" (#120422)
Reverts llvm/llvm-project#115209 - investigating a reported regression
2024-12-18 13:32:53 +00:00
David Sherwood
13107cb094
[LoopVectorize] Enable more early exit vectorisation tests (#117008)
PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:

1. The code to bail out for any loop live-out values happens
too late. This is because collectUsersInExitBlocks ignores
induction variables, which get dealt with in fixupIVUsers.
I've moved the check much earlier in processLoop by looking
for outside users of loop-defined values.
2. We shouldn't yet be interleaving when vectorising loops
with uncountable early exits, since we've not added support
for this yet.
3. Similarly, we also shouldn't be creating vector epilogues.
4. Similarly, we shouldn't enable tail-folding.
5. The existing implementation doesn't yet support loops
that require scalar epilogues, although I plan to add that
as part of PR #88385.
6. The new split middle blocks weren't being added to the
parent loop.
2024-12-18 09:25:45 +00:00
hanbeom
b7a8d9584c
[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#115209)
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index
-> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask

Original combining left the combine between vectors of different lengths as a TODO.
2024-12-18 07:47:42 +00:00
Luke Lau
c2a879ecaa
[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform (#120252)
When building SPEC CPU 2017 with RISC-V and EVL tail folding, this
assertion in VPTypeAnalysis would trigger during the transformation to
EVL recipes:


d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)

It was caused by this recipe:

```
WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>
```

Having its type inferred as i16, when ir<%add33> and ir<0> had inferred
types of i32 somehow.

The cause of this turned out to be because the VPTypeAnalysis cache was
getting clobbered: In this transform we were erasing recipes but keeping
around the same mapping from VPValue* to Type*. In the meantime, new
recipes would be created which would have the same address as the old
value. They would then incorrectly get the old erased VPValue*'s cached
type:

```
  --- before ---
  0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
  0x600001ec5450: <badref> <- some value that was erased
  --- after ---
  0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
  0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>  <- a new value that happens to have the same address
```

This fixes this by deferring the erasing of recipes till after the
transformation.

The test case might be a bit flakey since it just happens to have the
right conditions to recreate this. I tried to add an assert in
inferScalarType that every VPValue in the cache was valid, but couldn't
find a way of telling if a VPValue had been erased.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-12-18 11:28:28 +08:00
Luke Lau
4a7f60d328
[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform (#120194)
This fixes a crash that shows up when building SPEC CPU 2017 with EVL
tail folding on RISC-V.

A VPWidenCastRecipe doesn't always have an underlying value, and in the
case of this crash this happens whenever a widened cast is created via
truncateToMinimalBitwidths.

Fix this by just using the opcode stored in the recipe itself.

I think a similar issue exists with VPWidenIntrinsicRecipe and how it's
widened, but I haven't run into any crashes with it just yet.
2024-12-18 11:28:07 +08:00
Florian Hahn
eb59fe8d04
[VPlan] Remove redundant assignment in VPReductionPHIRecipe (NFC)
Suggested post-commit for 0e528ac404e13ed2d952a2d83aaf8383293c851e.
2024-12-17 21:32:40 +00:00
Florian Hahn
4ad0fdd163
[VPlan] Remove reverse() of predecessors from VPInstruction::generate.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.

Clean up after https://github.com/llvm/llvm-project/pull/114292.
2024-12-17 20:44:32 +00:00
Simon Pilgrim
5287299f88
[VectorCombine] foldShuffleOfBinops - prefer same cost fold if it reduces instruction count (#120216)
We don't fold "shuffle (binop), (binop)" -> "binop (shuffle), (shuffle)" if the old/new costs are equal, but we can relax this if either new shuffle will constant fold as it will reduce instruction count.
2024-12-17 18:10:20 +00:00
Nikita Popov
1157187496
[VPlan] Propagate all GEP flags (#119899)
Store GEPNoWrapFlags instead of only InBounds and propagate them.
2024-12-17 13:48:50 +01:00
Florian Hahn
58cfa39861
[VPlan] Remove legacy VPlan() constructors (NFC).
The constructors were retained to reduce the diff during transition.

Remove them now.
2024-12-17 08:22:22 +00:00
Luke Lau
fba3e069b4
[VPlan] Remove overlapping VPInstruction::mayWriteToMemory. NFCI (#120039)
VPInstruction has a definition of mayWriteToMemory, which seems to only
be used by VPlanSLP. However VPInstructions are already handled in
VPRecipeBase::mayWriteToMemory, and everywhere else seems to use this
definition. I think these should be the same for all intents and
purposes. The VPRecipeBase definition is more conservative but returns
true for stores/calls/invokes/SLPStores.
2024-12-17 11:02:55 +08:00
Florian Hahn
0e528ac404
[VPlan] Use start value operand for FindLastIV reduction phis.
Update VPReductionPHIRecipe::execute to use the start value from the
start value operand of the recipe. This is needed to make sure we resume
from the correct value during epilogue vectorization.

At the moment, the start value is set to the sentinel value in
adjustRecipesForReductions, as the original start value needs to be used
when creating ResumePhi recipes.

Fixes a mis-compile introduced by b3cba9be41bfa8 in SPEC2017 on AArch64.
2024-12-16 23:29:49 +00:00
Florian Hahn
f9120dc2a6
[VPlan] Make sure vector trip count is ready for prepareToExecute (NFC)
Split off from https://github.com/llvm/llvm-project/pull/112145. This
ensures that getOrCreateVectorTripCount creates the trip count as needed
when induction resume value creation is moved to VPlan and no longer
creates the vector trip count early.
2024-12-16 20:44:20 +00:00
Florian Hahn
89d5272841
[VPlan] Remove getPreheader(). (NFC)
The preheader is now the entry block, connected to the vector.ph.

Clean up after https://github.com/llvm/llvm-project/pull/114292.
2024-12-16 19:48:02 +00:00
Simon Pilgrim
8217c2eaef
[VectorCombine] foldShuffleOfBinops - extend to handle icmp/fcmp ops as well (#120075)
Extend binary instructions matching to match compare instructions + predicate as well.
2024-12-16 17:23:04 +00:00
Alexey Bataev
d1a7225076 [SLP]Check if the node must keep its original bitwidth
Need to check if during previous analysis the node has requested to keep
its original bitwidth to avoid incorrect codegen.

Fixes #120076
2024-12-16 08:01:22 -08:00
Florian Hahn
95e509a989
[VPlan] Add VPWidenInduction recipe as common base class (NFC). (#120008)
This helps to simplify some existing code and new code
(https://github.com/llvm/llvm-project/pull/112145)

PR: https://github.com/llvm/llvm-project/pull/120008
2024-12-16 09:40:03 +00:00
Luke Lau
4746395bd7
[VPlan] Omit zero add in VPWidenIntOrFpInductionRecipe (#119668)
I'm not sure if getStepVector was used for other things in the past
where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe
is the only user of it, and just passes zero to it. I presume
InstCombine was already catching this so hopefully removing this won't
affect codegen.
2024-12-16 11:55:48 +08:00
Florian Hahn
43045051d4
[VPlan] Modernize VPWidenIntOrFpInductionRecipe printing (NFC).
Modernize VPWidenIntOrFpInductionRecipe printing by including the result
VPValue and all operand VPValues, similar to VPScalarIVStepsRecipe and
VPDerivedIVRecipe.
2024-12-15 20:46:52 +00:00
Florian Hahn
e64650d702
[VPlan] Get types and step from VPWidenPointerInductionRecipe (NFC).
Use information directly from operands instead of going through
IVDescriptor.
2024-12-15 18:52:10 +00:00
Florian Hahn
2067e604a4
[VPlan] Manage VPWidenPointerInduction debug location via recipe.
Update VPWidenPointerInduction to manage its debug location via recipe.
This makes sure we emit a proper debug location for
VPWidenPointerInductionRecipes.
2024-12-15 14:41:07 +00:00
Florian Hahn
734a204fbd
[VPlan] Manage VPWidenIntOrFPInduction debug location via recipe (NFC).
Properly set VPWidenIntOrFpInductionRecipe's debug location in the
recipe and use it, instead of using the debug location of the underlying
IR instruction.
2024-12-15 13:45:28 +00:00
Simon Pilgrim
916bae2d92 [VectorCombine] foldShuffleOfBinops - refactor to make it easier to match icmp/fcmp patterns
NFC refactor to make it easier to also use the fold for icmp/fcmp patterns in a future patch - match the Shuffle with general Instruction operands and avoid explicit use of the BinaryOperator matches as much as possible for the general costing / fold.
2024-12-15 12:49:24 +00:00
Florian Hahn
2564f1e199
[VPlan] Simplify Not(Not(A)) -> A.
Follow-up simplification to 5fae408d3a4c073ee4.
2024-12-14 20:08:26 +00:00
Simon Pilgrim
cc54a0ce56
[VectorCombine] vectorizeLoadInsert - only fold when inserting into a poison vector (#119906)
We have corresponding poison tests in the "-inseltpoison.ll" sibling test files.

Fixes #119900
2024-12-14 11:56:12 +00:00
Han-Kuan Chen
da439d3af4
[SLP] NFC. Refactor getEntryCost and isReverseOrder usage. (#119680)
Users should check whether an input is empty before using
isReverseOrder.
2024-12-14 02:01:25 +08:00
Ramkumar Ramachandra
4a0d53a0b0
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
2024-12-13 14:18:33 +00:00
Han-Kuan Chen
3133acf1fb Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)"
This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.
2024-12-12 20:38:31 -08:00
Han-Kuan Chen
82204154b7
[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181) 2024-12-13 12:06:10 +08:00
Florian Hahn
4e828f8d74
[VPlan] Perform DT expensive input DT verification earlier (NFC).
After 6c8f41d33674, DT adjustments for the skeleton are applied as VPBBs
are executed. Move input DT verification up before starting to execute
any VPBBs to avoid checking DT while the CFG and DT are in an incomplete
state.

This fixes a number of verification failures with expensive checks
enabled, including
https://lab.llvm.org/buildbot/#/builders/16/builds/10584
2024-12-12 20:06:20 +00:00
Han-Kuan Chen
2546ae4ed0
[SLP][REVEC] Fix the number of elements in the mask of a ShuffleVectorInst is not a power of 2. (#119689)
The following shufflevector should not be vectorized when
slp-vectorize-non-power-of-2 is enabled.

shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 0, i32
1, i32 2>
shufflevector <8 x float> %1, <8 x float> poison, <3 x i32> <i32 4, i32
5, i32 6>
2024-12-13 02:22:41 +08:00
Florian Hahn
c95af0844d
[VPlan] Move ::getVectorLoopRegion out of ifdef (NFC).
Fixes a build failure with assertions disabled after
6c8f41d336747.
2024-12-12 16:21:21 +00:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Kazu Hirata
2f8238f849
[llvm] Migrate away from PointerUnion::{is,get} (NFC) (#119679)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2024-12-12 07:54:48 -08:00
Simon Pilgrim
86779da52b
[VectorCombine] Fold "(or (zext (bitcast X)), (shl (zext (bitcast Y)), C))" -> "(bitcast (concat X, Y))" MOVMSK bool mask style patterns (#119695)
Mask/Bool vectors are often bitcast to/from scalar integers, in particular when concatenating mask results, often this is due to the difficulties of working with vector of bools on C/C++. On x86 this typically involves the MOVMSK/KMOV instructions.

To concatenate bool masks, these are typically cast to scalars, which are then zero-extended, shifted and OR'd together.

This patch attempts to match these scalar concatenation patterns and convert them to vector shuffles instead. This in turn often assists with further vector combines, depending on the cost model.

Reapplied patch from #119559 - fixed use after free issue.

Fixes #111431
2024-12-12 13:45:10 +00:00
Florian Hahn
a480d51722
[VPlan] Use existing vector trip count VPValue for resume phi (NFC)
Instead of going through getOrAddLiveIn to get a VPValue for the vector
trip count retrieve it directly from VPlan via getVectorTripCount.

Small simplification following 0e70289f373.
2024-12-12 11:03:47 +00:00
Simon Pilgrim
b604d23feb [VectorCombine] Pull out isa<VectorType> check.
Noticed while investigating a crash in #119559 - we don't account for I being replaced and its Type being reallocated. So hoist the checks to the start of the loop.
2024-12-12 11:02:01 +00:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
Luke Lau
b26fe5b7e9
[VPlan] Use variadic isa<> in a few more places. NFC (#119538) 2024-12-12 13:26:39 +08:00
Michal Paszkowski
04313b86a5
Revert "[LoadStoreVectorizer] Postprocess and merge equivalence classes" (#119657)
Reverts llvm/llvm-project#114501, due to the following failure:
https://lab.llvm.org/buildbot/#/builders/55/builds/4171
2024-12-11 20:36:23 -08:00
Vyacheslav Klochkov
fd2f8d485d
[LoadStoreVectorizer] Postprocess and merge equivalence classes (#114501)
This patch introduces a new method:
void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses)
const

The method is called at the end of
Vectorizer::collectEquivalenceClasses() and is needed to merge
equivalence classes that differ only by their underlying objects (UO1
and UO2), where UO1 is 1-level-indirection underlying base for UO2. This
situation arises due to the limited lookup depth used during the search
of underlying bases with llvm::getUnderlyingObject(ptr).

Using any fixed lookup depth can result into creation of multiple
equivalence classes that only differ by 1-level indirection bases.

The new approach merges equivalence classes if they have adjacent bases
(1-level indirection). If a series of equivalence classes form ladder
formed of 1-step/level indirections, they are all merged into a single
equivalence class. This provides more opportunities for the load-store
vectorizer to generate better vectors.

---------

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
2024-12-11 19:01:35 -08:00
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00