4261 Commits

Author SHA1 Message Date
Florian Hahn
cec24f0d7e
[VPlan] Update stale test after 9536a6286, fix formatting. 2024-01-31 13:45:38 +00:00
Florian Hahn
9536a6286e
[VPlan] Preserve original induction order when creating scalar steps.
Update createScalarIVSteps to take an insert point as parameter. This
ensures that the inserted scalar steps are in the same order as the
recipes they replace (vs in reverse order as currently). This helps to
reduce the diff for follow-up changes.
2024-01-31 13:31:28 +00:00
Alexey Bataev
285bc69846 [SLP]Fix PR80027: Fix costs processing for minbitwidth types.
Need to switch the types, the destination is first in getCastInstrCost
function.
2024-01-30 10:32:55 -08:00
Alexey Bataev
976374d982 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, NFC. 2024-01-30 06:21:47 -08:00
Nilanjana Basu
c492eb6b28
[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651)
Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.
2024-01-29 13:41:15 -08:00
Alexey Bataev
8d89dd4a58 [SLP]Fix PR79743: Check that all users are demoted before trying to
demote the tree entry.

Need to check if all user nodes are marked for demotion before demoting
the node. Otherwise, some data info might be lost after vectorization.
2024-01-29 10:51:20 -08:00
Florian Hahn
743946e8ef
[VPlan] Replace VPRecipeOrVPValue with VP2VP recipe simplification. (#76090)
Move simplification of VPBlendRecipes from early VPlan construction to
VPlan-to-VPlan based recipe simplification. This simplifies initial
construction.

Note that some in-loop reduction tests are failing at the moment, due to
the reduction predicate being created after the reduction recipe. I will
provide a patch for that soon.

PR: https://github.com/llvm/llvm-project/pull/76090
2024-01-29 09:52:05 +00:00
Florian Hahn
2d0d65b3ba
[VPlan] Create edge masks all cases up front needed.(NFC)
Similarly to how block masks are created up front and later only
retrieved also make sure masks are created in cases where edge masks are
needed, i.e. blend recipes.

Creating block-in masks for all blocks in the loop also ensures edge
masks for all relevant edges have been created. Later, the new
getEdgeMask can be used to look up cached edge masks.

This makes sure edge masks are available in all cases for
https://github.com/llvm/llvm-project/pull/76090.
2024-01-28 21:20:18 +00:00
Florian Hahn
1b37e8087e
[VPlan] use getVPValueOrAddLiveIn in VPlan::duplicate.
Instead of creating live-ins manually, use getOrAddLiveIn which
automatically takes care of adding them to VPLiveInsToFree. Also use it
to create the VPValue for the trip-count. This fixes a leak:
https://lab.llvm.org/buildbot/#/builders/168/builds/18308/steps/10/logs/stdio
2024-01-28 12:39:39 +00:00
Florian Hahn
7c03d5d41d
[VPlan] Use unique_ptr to clean up duplicated plan. 2024-01-27 20:51:55 +00:00
Florian Hahn
ec402a2e53
[VPlan] Implement cloning of VPlans. (#73158)
This patch implements cloning for VPlans and recipes. Cloning is used in
the epilogue vectorization path, to clone the VPlan for the main vector
loop. This means we won't re-use a VPlan when executing the VPlan for
the epilogue vector loop, which in turn will enable us to perform
optimizations based on UF & VF.
2024-01-27 13:30:52 +00:00
David Sherwood
962fbafecf
[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc296792b577ae8f0f674e61a0929b99d

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.
2024-01-26 14:43:48 +00:00
Florian Hahn
731c2049a4
[VPlan] Relax IV user assertion after 0ab539f for epilogue vec.
After 0ab539fd6748adf2f638e10514dd9419597d8863, the canonical IV in the
epilogue vector loop may be used by a trunc. Relax the corresponding
assert.

This should fix some build-bot failures, including
    https://lab.llvm.org/buildbot/#/builders/187/builds/14113
    https://lab.llvm.org/buildbot/#/builders/98/builds/32350
    https://lab.llvm.org/buildbot/#/builders/239/builds/5473
2024-01-26 13:19:25 +00:00
Graham Hunter
d4c0171423
[LV] Fix handling of interleaving linear args (#78725)
Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.

Fix this to extract from the correct part of the linear vector.
2024-01-26 11:30:35 +00:00
Florian Hahn
0ab539fd67
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.

This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is 
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.

Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.

PR: https://github.com/llvm/llvm-project/pull/78113
2024-01-26 11:13:05 +00:00
Alexey Bataev
92ae2ca12b [SLP][NFC]Improve BottomTopTop reordering of orders for multi-iterations
attempts, NFC.

If several iterations of reodering of orders is required, need to use
different algorithm.
2024-01-25 13:04:01 -08:00
Alexey Bataev
6fe21bc1da [SLP]Fix PR79229: Do not erase extractelement, if it used in
multiregister node.

If the node can be span between several registers and same
extractelement instruction is used in several parts, it may be required
to keep such extractelement instruction to avoid compiler crash.
2024-01-25 06:20:53 -08:00
Florian Hahn
a04f615291
[LV] Check for innermost loop instead of EnableVPlanNativePath in CM.
Replace EnableVPlanNativePath checks in the cost-model by assertions
that the code is only called for innermost loops. This ensures that the
cost model isn't used in the VPlanNativePath, which is only used for
outer-loop vectorization.

Even with EnableVPlanNativePath, inner loops are processed by the
inner loop vectorization path, not the native path, so checking for
EnableVPlanNativePath may impact decisions for inner loops and can
cause crashes, like in the attached test case.
2024-01-25 12:49:52 +00:00
Alexey Bataev
36e4a7ecca [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.
Try to make PHICompare to meat strict weak ordering criteria.
2024-01-24 13:46:05 -08:00
Alexey Bataev
48bbd76587 [SLP]Fix PR79229: Check that extractelement is used only in a single node
before erasing.

Before trying to erase the extractelement instruction, not enough to
check for single use, need to check that it is not used in several nodes
because of the preliminary nodes reordering.
2024-01-24 11:22:22 -08:00
Alexey Bataev
ca654acc16 [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.

Compared NumUses to meet the reaquirements of the strict weak ordering.
2024-01-24 09:36:25 -08:00
Alexey Bataev
bb3e0d7fc3 [SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth.
No need in trying to analyze small graphs with gather node only to avoid
crash.
2024-01-23 12:44:49 -08:00
Stephen Tozer
632f44e5ed
[RemoveDIs][DebugInfo] Handle DPVAssign in most transforms (#78986)
This patch trivially updates various opt passes to handle DPVAssigns. In
all cases, this means some combination of generifying existing code to
handle DPValues and DbgAssignIntrinsics, iterating over DPValues where
previously we did not, or duplicating code for DbgAssignIntrinsics to
the equivalent DPValue function (in inlining and salvageDebugInfo).
2024-01-23 16:16:59 +00:00
Florian Hahn
3683852d49
[VPlan] Use replaceUsesWithIf in replaceAllUseswith and add comment (NFCI).
Follow-up to post-commit commens for b1bfe221e6.
2024-01-21 12:56:16 +00:00
Florian Hahn
42fb1fac9e
[VPlan] Use DebugLoc from recipe in VPWidenCallRecipe (NFCI).
Instead of using the debug location of the underlying instruction, use
the debug location from the recipe. This removes an unneeded dependency
of the underlying instruction.
2024-01-19 13:33:03 +00:00
Florian Hahn
abdb61f5fd
[VPlan] Introduce VPSingleDefRecipe. (#77023)
This patch introduces a new common base class for recipes defining a
single result VPValue. This has been discussed/mentioned at various
previous reviews as potential follow-up and helps to replace various
getVPSingleValue calls.

PR: https://github.com/llvm/llvm-project/pull/77023
2024-01-19 10:27:53 +00:00
Paschalis Mpeis
37c87d5689
[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247)
LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).

Add test that check costs on AArch64, when there is no vector library
available and when there is (with and without tail-folding).

NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.
2024-01-18 08:32:53 +00:00
Alexey Bataev
093206bb7e [SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 &&
!isa<Constant>(GEPIdx)' failed.

The non-constant index might be folded to constant during earlier stages
of vectorization. Need to consider this option and filter out out GEP
with the constant indices from the candidates list.
2024-01-16 09:17:35 -08:00
Florian Hahn
9a402d6fbb
[LV] Make DL optional argument for VPBuilder member functions (NFCI). 2024-01-16 15:50:09 +00:00
Florian Hahn
e7671bc9d6
[LV] Fix indent for loop in adjustRecipesForReductions (NFC). 2024-01-16 15:28:46 +00:00
Alexey Bataev
d79fdb2749 [SLP]Fix PR78236: correctly track external values, replaced several
times during reduction vectorization.

If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.
2024-01-16 06:52:43 -08:00
Florian Hahn
6011d6b2cc
[VPlan] Use start value of reduction phi to determine type (NFCI).
Instead of accessing the underlying original IR value, check the type of
the start value from the recipe directly.
2024-01-16 14:39:51 +00:00
Mel Chen
b6e8f6604c
[LV] Skipping all debug instructions when native vplan is enabled (#77413)
The following internal error occurred when using native vplan to
vectorize the program with the debug info generation.

Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed.

This patch ignored all debug instructions to fix the error when native
vplan is enabled.
2024-01-16 11:08:10 +08:00
Alexey Bataev
6fdc2ce8c5 [SLP]Fix PR77916: transform the whole mask, not only the elements for
the second vector.

Need to transform all elements in the long mask, if we decided to
produce shorter version, some elements may still have incorrect inifices
after transformation for the first vector in the permutation.
2024-01-12 07:07:43 -08:00
Nikita Popov
6c2fbc3a68
[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582)
This abstracts over the common pattern of creating a gep with i8 element
type.
2024-01-12 14:21:21 +01:00
Florian Hahn
59d6f033a2
[VPlan] Support narrowing widened loads in truncateToMinimimalBitwidths.
MinBWs may also contain widened load instructions, handle them by only
narrowing their result.

Fixes https://github.com/llvm/llvm-project/issues/77468
2024-01-12 13:14:13 +00:00
Alexey Bataev
39b2104b4a [SLP]Fix a crash for reduced values with minbitwidth, which are reused.
If the reduced values are additionally affected by minbitwidth analysis,
need to cast them to a proper type before doing any math, if they are
reused.
2024-01-12 04:49:48 -08:00
Alexey Bataev
18473eb108 [SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
  no non-vectorized insertelement instruction, which may require
  widening.

Review: https://github.com/llvm/llvm-project/pull/72679
2024-01-11 06:59:57 -08:00
Martin Storsjö
1de3f46938 Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)"
This reverts commit 408dce82016463dcb5026b2ddfc62174970a88e9.

This triggered failed asserts with code like this:

    char a[];
    short *b;
    int c, d, e, f;
    void g() {
      char *h;
      for (;;) {
        for (; f; ++f) {
          h[f] = b[0] * a[e] + b[c] * a[1] >> 7;
          ++b;
        }
        h += d;
      }
    }

Compiled like this:

    $ clang -target x86_64-linux-gnu -c repro.c -O2
    clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
2024-01-11 12:15:35 +02:00
Craig Topper
1c342571b8
[LV] Use value_or to simplify code. NFC (#77030) 2024-01-10 12:40:26 -08:00
Alexey Bataev
408dce8201
[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
  no non-vectorized insertelement instruction, which may require
  widening.
2024-01-10 14:06:29 -05:00
Alexey Bataev
73ce13d79b
[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749)
SLP vectorizer passes the type of the subvector and the mask, which size
determines the size of the resulting vector. TTI should support this
pattern to improve cost estimation of the insert_subvector shuffle
pattern.
2024-01-10 10:39:34 -05:00
Florian Hahn
8b7bbedec7
[LV] Re-add early exit in VPRecipeBuilder::createBlockInMask.
Re-add early exit that was accidentally dropped in  51afb10.
2024-01-10 15:02:14 +00:00
Florian Hahn
51afb10174
[LV] Create block in mask up-front if needed. (#76635)
At the moment, block and edge masks are created on demand, which means
that they are inserted at the point where they are demanded and then
cached. It is possible that the mask for a block is looked up later at a
point that's not dominated by the point where the mask has been
inserted.

To avoid this, create masks up front on entry to the corresponding basic
block and leave it to VPlan simplification to remove unneeded masks.

Note that we need to create masks for all blocks, if any of the blocks
in the loop needs predication, as computing the mask of a block depends
on the masks of its predecessor.

Needed for #76090.

https://github.com/llvm/llvm-project/pull/76635
2024-01-09 10:50:08 +00:00
Alexey Bataev
036e48e2f5 [SLP]Fix PR76850: do the analysis of the submask.
Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.
2024-01-08 07:51:02 -08:00
Florian Hahn
18ec3304a9
[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe.
As suggested as follow-up in
https://github.com/llvm/llvm-project/pull/72164, manage inbounds via
VPRecipeWithIRFlags.

Note that in some cases we can now preserve inbounds in a few more
cases.
2024-01-07 13:58:05 +00:00
Florian Hahn
3fb0d8dc80
Recommit "[VPlan] Mark Select VPInstructions as not having sideeffects."
With #70253 landed, selects for reduction results are explicitly used by
ComputeReductionResult and Selects can be marked as not having
side-effects again.

This reverts the revert commit 173032902c960d4d0d67b521d8c149553d8e8ba3.
2024-01-06 12:08:06 +00:00
Florian Hahn
241fe83704
[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253)
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.

The recipe may be broken down further in the future.

Note that  the phi nodes to merge the reduction result from the trip 
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on 
ComputeReductionResult.

Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
2024-01-04 22:53:18 +00:00
Florian Hahn
2ab5c47c87
[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe.
Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would
introduce wide (vectorizing) recipes when interleaving only.

Fixes https://github.com/llvm/llvm-project/issues/76986
2024-01-04 20:39:44 +00:00
Alexey Bataev
79e62315be [SLP]Use revectorized value for extracts from buildvector, beeing
vectorized.

When trying to reuse the extractelement instruction, emitted for the
insertelement instruction, need to check, if the this insertelement
instruction was vectorized. In this case, need to use vectorized value,
not the original insertelement.
2024-01-04 06:45:26 -08:00