1588 Commits

Author SHA1 Message Date
Alexey Bataev
15295d0135 [SLP][NFC]Introduce and use computeCommonAlignment function, NFC. 2024-02-01 06:13:39 -08:00
Alexey Bataev
285bc69846 [SLP]Fix PR80027: Fix costs processing for minbitwidth types.
Need to switch the types, the destination is first in getCastInstrCost
function.
2024-01-30 10:32:55 -08:00
Alexey Bataev
976374d982 [SLP][NFC]Use MutableArrayRef instead of SmallVectorImpl&, NFC. 2024-01-30 06:21:47 -08:00
Alexey Bataev
8d89dd4a58 [SLP]Fix PR79743: Check that all users are demoted before trying to
demote the tree entry.

Need to check if all user nodes are marked for demotion before demoting
the node. Otherwise, some data info might be lost after vectorization.
2024-01-29 10:51:20 -08:00
Alexey Bataev
92ae2ca12b [SLP][NFC]Improve BottomTopTop reordering of orders for multi-iterations
attempts, NFC.

If several iterations of reodering of orders is required, need to use
different algorithm.
2024-01-25 13:04:01 -08:00
Alexey Bataev
6fe21bc1da [SLP]Fix PR79229: Do not erase extractelement, if it used in
multiregister node.

If the node can be span between several registers and same
extractelement instruction is used in several parts, it may be required
to keep such extractelement instruction to avoid compiler crash.
2024-01-25 06:20:53 -08:00
Alexey Bataev
36e4a7ecca [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.
Try to make PHICompare to meat strict weak ordering criteria.
2024-01-24 13:46:05 -08:00
Alexey Bataev
48bbd76587 [SLP]Fix PR79229: Check that extractelement is used only in a single node
before erasing.

Before trying to erase the extractelement instruction, not enough to
check for single use, need to check that it is not used in several nodes
because of the preliminary nodes reordering.
2024-01-24 11:22:22 -08:00
Alexey Bataev
ca654acc16 [SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict
weak ordering.

Compared NumUses to meet the reaquirements of the strict weak ordering.
2024-01-24 09:36:25 -08:00
Alexey Bataev
bb3e0d7fc3 [SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth.
No need in trying to analyze small graphs with gather node only to avoid
crash.
2024-01-23 12:44:49 -08:00
Stephen Tozer
632f44e5ed
[RemoveDIs][DebugInfo] Handle DPVAssign in most transforms (#78986)
This patch trivially updates various opt passes to handle DPVAssigns. In
all cases, this means some combination of generifying existing code to
handle DPValues and DbgAssignIntrinsics, iterating over DPValues where
previously we did not, or duplicating code for DbgAssignIntrinsics to
the equivalent DPValue function (in inlining and salvageDebugInfo).
2024-01-23 16:16:59 +00:00
Alexey Bataev
093206bb7e [SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 &&
!isa<Constant>(GEPIdx)' failed.

The non-constant index might be folded to constant during earlier stages
of vectorization. Need to consider this option and filter out out GEP
with the constant indices from the candidates list.
2024-01-16 09:17:35 -08:00
Alexey Bataev
d79fdb2749 [SLP]Fix PR78236: correctly track external values, replaced several
times during reduction vectorization.

If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.
2024-01-16 06:52:43 -08:00
Alexey Bataev
6fdc2ce8c5 [SLP]Fix PR77916: transform the whole mask, not only the elements for
the second vector.

Need to transform all elements in the long mask, if we decided to
produce shorter version, some elements may still have incorrect inifices
after transformation for the first vector in the permutation.
2024-01-12 07:07:43 -08:00
Alexey Bataev
39b2104b4a [SLP]Fix a crash for reduced values with minbitwidth, which are reused.
If the reduced values are additionally affected by minbitwidth analysis,
need to cast them to a proper type before doing any math, if they are
reused.
2024-01-12 04:49:48 -08:00
Alexey Bataev
18473eb108 [SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
  no non-vectorized insertelement instruction, which may require
  widening.

Review: https://github.com/llvm/llvm-project/pull/72679
2024-01-11 06:59:57 -08:00
Martin Storsjö
1de3f46938 Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)"
This reverts commit 408dce82016463dcb5026b2ddfc62174970a88e9.

This triggered failed asserts with code like this:

    char a[];
    short *b;
    int c, d, e, f;
    void g() {
      char *h;
      for (;;) {
        for (; f; ++f) {
          h[f] = b[0] * a[e] + b[c] * a[1] >> 7;
          ++b;
        }
        h += d;
      }
    }

Compiled like this:

    $ clang -target x86_64-linux-gnu -c repro.c -O2
    clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
2024-01-11 12:15:35 +02:00
Alexey Bataev
408dce8201
[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679)
After changes, that does not require support from InstCombine, we can
drop some extra requirements for values-to-be-demoted. No need to check
for external uses for roots/other instructions, just check that the
  no non-vectorized insertelement instruction, which may require
  widening.
2024-01-10 14:06:29 -05:00
Alexey Bataev
73ce13d79b
[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749)
SLP vectorizer passes the type of the subvector and the mask, which size
determines the size of the resulting vector. TTI should support this
pattern to improve cost estimation of the insert_subvector shuffle
pattern.
2024-01-10 10:39:34 -05:00
Alexey Bataev
036e48e2f5 [SLP]Fix PR76850: do the analysis of the submask.
Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.
2024-01-08 07:51:02 -08:00
Alexey Bataev
79e62315be [SLP]Use revectorized value for extracts from buildvector, beeing
vectorized.

When trying to reuse the extractelement instruction, emitted for the
insertelement instruction, need to check, if the this insertelement
instruction was vectorized. In this case, need to use vectorized value,
not the original insertelement.
2024-01-04 06:45:26 -08:00
Alexey Bataev
7c963fde16 [SLP]Use revectorized value for extracts from buildvector, beeing
vectorized.

If the insertelement instruction is vectorized, and the extractelement
instruction from such insertelement also vectorized as part of the same
tree, need to extract from the corresponding for insertelement vectorized value rather than original insertelement instruction.
2024-01-03 10:38:09 -08:00
Alexandros Lamprineas
e512df3ecc
[LV] Fix crash when vectorizing function calls with linear args. (#76274)
llvm/lib/IR/Type.cpp:694:
    Assertion `isValidElementType(ElementType) && "Element type of a
    VectorType must be an integer, floating point, or pointer type."'
    failed.
Stack dump:
    llvm::FixedVectorType::get(llvm::Type*, unsigned int)
    llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&)
    llvm::VPBasicBlock::execute(llvm::VPTransformState*)
    llvm::VPRegionBlock::execute(llvm::VPTransformState*)
    llvm::VPlan::execute(llvm::VPTransformState*)
    ...

Happens with function calls of void return type.
2024-01-02 18:14:16 +00:00
Enna1
9943d33997
[SLP][NFC] Fix assertion in vectorizeGEPIndices() (#76660)
The index constraints for the collected getelementptr instructions
should be single **and** non-constant.
2024-01-02 21:32:18 +08:00
Enna1
a51c2f39f5
[SLP] no need to generate extract for in-tree uses for original scala… (#76077)
…r instruction.

Before
77a609b556,
we always skip in-tree uses of the vectorized scalars in
`buildExternalUses()`,
that commit handles the case that if the in-tree use is scalar operand
in vectorized instruction,
we need to generate extract for these in-tree uses.

in-tree uses remain as scalar in vectorized instructions can be 3 cases:
- The pointer operand of vectorized LoadInst uses an in-tree scalar
- The pointer operand of vectorized StoreInst uses an in-tree scalar
- The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extract for in-tree uses for vectorized instructions are
implemented in `BoUpSLP::vectorizeTree()`:
-
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
-
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
-
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667

However,
77a609b556
not only generates extract for vectorized instructions,
but also generates extract for original scalar instructions.
There is no need to generate extract for origin scalar instrutions,
as these scalar instructions will be replaced by vector instructions and
get erased later.

This patch marks there is no exact user for in-tree scalars that 
remain as scalar in vectorized instructions when building external uses,
In this case all uses of this scalar will be automatically replaced by extractelement.
and remove
-
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
-
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
-
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667
extracts.
2023-12-30 10:45:26 +08:00
Alexey Bataev
5096501082 [SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)
SLP/TTI do not know about the cost estimation for addsub pattern,
supported by X86. Previously the support for pattern detection was added
(seeTTI::isLegalAltInstr), but the cost still did not estimated
properly.
2023-12-28 05:04:04 -08:00
Douglas Yung
fb981e6b4b Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)"
This reverts commit bc8c4bbd7973ab9527a78a20000aecde9bed652d.

Change is failing to build on several bots:
- https://lab.llvm.org/buildbot/#/builders/127/builds/60184
- https://lab.llvm.org/buildbot/#/builders/123/builds/23709
- https://lab.llvm.org/buildbot/#/builders/216/builds/32302
2023-12-27 23:52:04 -08:00
Alexey Bataev
bc8c4bbd79
[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)
SLP/TTI do not know about the cost estimation for addsub pattern,
supported by X86. Previously the support for pattern detection was added
(seeTTI::isLegalAltInstr), but the cost still did not estimated
properly.
2023-12-27 15:57:21 -05:00
Kazu Hirata
03dc806b12 [Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC) 2023-12-22 14:51:22 -08:00
Alexey Bataev
a13148a880 [SLP]Fix PR75995: drop wrapping flags for resized wrapped binops.
If decided to resize the instruction, need to drop wrapping flags from
the resulting vector instructions to avoid incorrect
optimizations/assumptions later.
Fixes PR75995.
2023-12-20 06:51:39 -08:00
Arthur Eubanks
71a9292298 Revert "[SLP]Improve findReusedOrderedScalars processing, NFCI."
This reverts commit 44dc1e0baae7c4b8a02ba06dcf396d3d452aa873.

Causes non-determinism, see #75987.
2023-12-19 16:14:04 -08:00
Alexey Bataev
00edad17c2 [SLP][NFC]Check for equal opcode preliminary to meet weak strict order
requirement, NFC.

This change does not affect functionality, just fixes the assertions in
some standard c++ library implementations.
2023-12-18 14:12:33 -08:00
Alexey Bataev
a7e10e6603 Revert "[SLP][NFC]Check for equal opcode preliminary to meet weak strict order"
This reverts commit 58a2c4e2f24ffce3966c3988d1a4ca7b04c52244 to fix the
issue detected by https://lab.llvm.org/buildbot/#/builders/233/builds/5424.
2023-12-18 12:35:52 -08:00
Alexey Bataev
58a2c4e2f2 [SLP][NFC]Check for equal opcode preliminary to meet weak strict order
requirement, NFC.

This change does not affect functionality, just fixes the assertions in
some standard c++ library implementations.
2023-12-18 06:42:03 -08:00
Reid Kleckner
3e16152ebc [SLP] Fix OOB GEP index access for a no-op GEP
Issue is covered by existing test
llvm/test/Transforms/SLPVectorizer/RISCV/phi-const.ll

See issue #75632 for ideas for how we could catch these more easily in
the future.
2023-12-15 17:33:06 +00:00
Maurice Heumann
f42b930af9
[SLP] Pessimistically handle unknown vector entries in SLP vectorizer (#75438)
SLP Vectorizer can discard vector entries at unknown positions. This
example shows the behaviour:

https://godbolt.org/z/or43EM594

The following instruction inserts an element at an unknown position:

```
%2 = insertelement <3 x i64> poison, i64 %value, i64 %position
```

The position depends on an argument that is unknown at compile time.

After running SLP, one can see there is no more instruction present
referencing `%position`.

This happens as SLP parallelizes the two adds in the example. It then
needs to merge the original vector with the new vector.

Within `isUndefVector`, the SLP vectorizer constructs a bitmap
indicating which elements of the original vector are poison values. It
does this by walking the insertElement instructions.

If it encounters an insert with a non-constant position, it is ignored.
This will result in poison values to be used for all entries, where
there are no inserts with constant positions.

However, as the position is unknown, the element could be anywhere.
Therefore, I think it is only safe to assume none of the entries are
poison values and to simply take them all over when constructing the
shuffleVector instruction.

This fixes #75437
2023-12-14 09:48:23 -05:00
Alexey Bataev
44dc1e0baa [SLP]Improve findReusedOrderedScalars processing, NFCI.
Tries to simplify structural complexity of the findReusedOrderedScalars function.
2023-12-08 14:27:55 -08:00
Alexey Bataev
fb35bb48c6 [SLP][NFC]Build value-to-gather-nodes map during nodes building, NFC. 2023-12-07 13:41:19 -08:00
Alexey Bataev
58785ebd24 [SLP][NFC]Check for ephemeral values beforehand, NFC. 2023-12-07 13:25:15 -08:00
Alexey Bataev
0e1a9e3084 [SLP]Fix PR74607: Fix dependency between buildvector nodes with user
nodes, having same last instruction.

If the user nodes has the same last-instruction, used as insert points
for the buildvector nodes, finding the proper dependency is crucial.
  Before, it depended on the indices of the buildvectors themselves but
  looks like it should depend on indices of the user nodes, because it
  identifies the vectorization order and, thus, properly aligns
  buildvector nodes in terms of def-use chain.
2023-12-06 10:15:01 -08:00
Paschalis Mpeis
7b83f69db4
[NFC] Replace CallInst with FunctionType in VFABI, VFShape API (#74569)
Minor simplification applied to VFShape::getScalarShape,
VFShape::get, and VFABI::tryDemangleForVFABI methods.

Also, remove unnecessary `static_cast` in `SLPVectorizer.cpp`
2023-12-06 17:14:58 +00:00
Alexey Bataev
279b1ea65f [SLP]Improve gathering of the scalars used in the graph.
Currently we emit gathers for scalars being vectorized in the tree as
a pair of extractelement/insertelement instructions. Instead we can try
to find all required vectors and emit shuffle vector instructions
directly, improving the code and reducing compile time.

Part of non-power-of-2 vectorization.

Differential Revision: https://reviews.llvm.org/D110978
2023-12-01 11:23:57 -08:00
Alexey Bataev
ba52310657
[SLP][NFC] Unify code for cost estimation/codegen for buildvector, NFC. (#73182)
This just moves towards reusing same function for both cost
estimation/codegen for buildvector.
2023-11-30 10:04:57 -05:00
Alexey Bataev
1f88e62db4 [SLP]Fix/improve minbitwidth mapping to use TreeEntry as a key.
Currently, MinBWs map uses Value* as a key and stores mapping for each
value to be demoted. It make is it hard to get the actual MinBWs value
for the buildvector scalars(constants), since same constant might be
  used in different nodes with the different MinBWs values/decisions.
Also, it consumes extra memory for the vectorized values/instructions
 from the same nodes.
Better to map actual nodes. It fixes the bitwidth data fetching for
buildvector scalars and improves memory consumption/analysis time for
other instructions.
2023-11-30 06:33:31 -08:00
Alexey Bataev
447da954c7 [SLP][NFC]Use DenseSet instead of SetVector, NFC.
For CSEBlocks we can safely use DenseSet, the order should not be
preserved for this container.
2023-11-28 11:27:49 -08:00
Alexey Bataev
badec9b7bf [SLP][NFC]Fix loops variables names, NFC. 2023-11-28 10:30:19 -08:00
Alexey Bataev
c72884225b [SLP][NFC]Fix naming of variables/functions, NFC. 2023-11-28 09:15:38 -08:00
Alexey Bataev
b6eb740cae [SLP][NFC]Improve/fix auto declarations, NFC. 2023-11-28 07:39:21 -08:00
Alexey Bataev
45139ab6ca [SLP][NFC]Improve aliasing support in SLP, NFC.
No need to store optional boolean in the map, enough to store boolean
directly. Also, we can do preliminary check for instruction and if they
are not simple, mark as aliased without storing this result in the map.
2023-11-28 07:24:44 -08:00
Alexey Bataev
e9fdb965f9 [SLP][NFC]Make collectValuesToDemote member of BoUpSLP to avoid using
Expr container, NFC.

Saves the memory and may improve compile time.
2023-11-24 08:05:19 -08:00