56 Commits

Author SHA1 Message Date
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Nikita Popov
6a06155c53 [VectorCombine] Discard ScalarizationResults if transform aborted
Fixes https://github.com/llvm/llvm-project/issues/69820.
2023-10-31 11:24:30 +01:00
Nikita Popov
3b82397965 [VectorCombine] Check for non-byte-sized element type
We should check whether the element type is non-byte-sized, not
the vector type. For types like <32 x i1> the whole type is
byte-sized, but the individual elements (that we scalarize to)
are not.

Fixes https://github.com/llvm/llvm-project/issues/67060.
2023-09-28 14:18:30 +02:00
Nikita Popov
95606a58c1 [VectorCombine] Add tests for #67060 (NFC) 2023-09-28 14:18:29 +02:00
Ben Shi
ea0ee55c02
[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445)
Enable the transform if a non constant index is guaranteed to be safe
via a UREM/AND.
2023-09-26 09:41:53 +08:00
Ben Shi
068357d9b0
[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443)
The transform 'scalarizeLoadExtract' can be applied to scalable
vector types if the index is less than the minimum number of elements.

The check whether the index is less than the minimum number of elements
locates at line 1175~1180. 'scalarizeLoadExtract' will call
'canScalarizeAccess' and check the returned result if this transform is safe.

At the beginning of the function 'canScalarizeAccess', the index will be
checked
1. If it is less than the number of elements of a fixed vector type.
2. If it is less than the minimum number of elements of a scalable vector type.

Otherwise 'canScalarizeAccess' will return unsafe and this transform
will be prevented.
2023-09-18 10:49:18 +08:00
Ben Shi
f3fbea2cac
[VectorCombine][test] Supplement tests of the load-extractelement sequence (#65442)
The newly added tests are all about scalable vector types.
2023-09-13 18:56:44 +08:00
Alexey Bataev
9a207578ac [TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().
It improves shuffle instructions estimation and improves vectorization
outcome.

Differential Revision: https://reviews.llvm.org/D157425
2023-08-18 13:47:01 -07:00
David Green
2a859b2014 [AArch64] Change the cost of vector insert/extract to 2
The cost of vector instructions has always been high under AArch64, in order to
add a high cost for inserts/extracts, shuffles and scalarization. This is a
conservative approach to limit the scope of unusual SLP vectorization where the
codegen ends up being quite poor, but has always been higher than the correct
costs would be for any specific core.

This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a
generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is
also overridden for integer vector at the same time, to remove the effect of
lane 0 being considered free for integer vectors (something that should only be
true for float when scalarizing).

The lower insert/extract cost will reduce the cost of insert, extracts,
shuffling and scalarization. The adjustments of ScalaizationOverhead will
increase the cost on integer, especially for small vectors. The end result will
be lower cost for float and long-integer types, some higher cost for some
smaller vectors. This, along with the raw insert/extract cost being lower, will
generally mean more vectorization from the Loop and SLP vectorizer.

We may end up regretting this, as that vectorization is not always profitable.
In all the benchmarking I have done this is generally an improvement in the
overall performance, and I've attempted to address the places where it wasn't
with other costmodel adjustments.

Differential Revision: https://reviews.llvm.org/D155459
2023-07-28 21:26:50 +01:00
Fangrui Song
d39b4ce3ce [test] Replace aarch64-*-eabi with aarch64
Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
2023-06-27 20:02:52 -07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Nikita Popov
c00ffbe02b [VectorCombine] Convert tests to opaque pointers (NFC) 2022-12-23 10:04:26 +01:00
Matt Devereau
ee4d6c8bf0 [VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors
This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for
scalable vectors which caused poor scalable Binop codegen.

Differential Revision: https://reviews.llvm.org/D138545
2022-11-23 13:17:21 +00:00
Matt Devereau
a8c24d57b8 [InstCombine] Remove redundant splats in InstCombineVectorOps
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Differential Revision: https://reviews.llvm.org/D135876
2022-11-07 15:39:05 +00:00
Peter Waller
e1790c8c29 Revert "[InstCombine] Remove redundant splats in InstCombineVectorOps"
This reverts commit 957eed0b1af2cb88edafe1ff2643a38165c67a40.
2022-11-03 07:56:03 +00:00
Matt Devereau
957eed0b1a [InstCombine] Remove redundant splats in InstCombineVectorOps
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Differential Revision: https://reviews.llvm.org/D135876
2022-11-02 11:57:05 +00:00
Matt Devereau
be0d427a14 [VectorCombine] Add insertelement-shufflevector VectorCombine tests
This is a precommit which adds some tests to show the functionality of an
upcoming VectorCombine optimization
2022-10-13 14:10:06 +00:00
David Green
4b7913c357 [VectorCombine] Only consider shuffle uses with the same type.
The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type to make
sure they do not cause issues as reported in D128732.
2022-07-16 13:23:39 +01:00
Sander de Smalen
519d7876cb [VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.
This addresses https://github.com/llvm/llvm-project/issues/56377

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D129136
2022-07-07 08:37:04 +00:00
Sander de Smalen
f45a3f7f9b [VectorCombine] NFC: rename test extract-cmp-binop.ll to extract-scalable.ll 2022-07-07 08:37:04 +00:00
David Green
5493f8fc59 [VectorCombine] Improve shuffle select shuffle-of-shuffles
This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

This is a recommit with some additional checks for supported forms and
out-of-bounds mask elements, with some extra tests.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-05 17:16:18 +01:00
Nikita Popov
b69c75d53f Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"
This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d.

Clang crashes while linking bullet from llvm-test-suite in
ReleaseLTO-g cmake configuration.
2022-07-05 09:31:20 +02:00
David Green
19a1e20b8a [VectorCombine] Improve shuffle select shuffle-of-shuffles
This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-04 13:38:43 +01:00
Nikita Popov
03aceab08b [ValueTracking] Enable -branch-on-poison-as-ub by default
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.

Differential Revision: https://reviews.llvm.org/D125299
2022-06-01 10:46:06 +02:00
David Green
4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle
tables, slightly lowering the costs a little more. This is a rough
improvement in general, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane mov's already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
David Green
6f9e1ea0ef [VectorCombine] Attempt to fold select shuffles from reductions
Given a commutative reduction leading from a shuffle, the order of the
lanes on the shuffle are not important for the result. This means we can
reorder the shuffle to something simpler, which we try shuffling the
first vector lanes first. This was D123494.

The new shuffle may not be profitable though, and if it is not we can
try the folding of select shuffles from D123911. This, with some
adjustment as the output lane ordering is now unimportant, can allow the
final shuffle to simplify given the inputs to the patterns from D123911.
Where as each transformation on their own are not profitable, the
combination is.

We can only support a single shuffle when called from reductions, but we
are able to sort the ReconstructMask, potentially allowing it to
simplify to an identity or concat mask.

Differential Revision: https://reviews.llvm.org/D125086
2022-05-08 10:32:41 +01:00
David Green
100cb9a2ba [VectorCombine] Fold shuffle select pattern
This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:
  %x = shuffle ...
  %y = shuffle ...
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

A classic select-mask will pick items from each lane of a or b. These
do not always have a great lowering on many architectures. This patch
attempts to pack a and b into the lower elements, creating a differently
ordered shuffle for reconstructing the orignal which may be better than
the select mask. This can be better for performance, especially if less
elements of a and b need to be computed and the input shuffles are
cheaper.

Because select-masks are just one form of shuffle, we generalize to any
mask. So long as the backend has decent costmodel for the shuffles, this
can generally improve things when they come up. For more basic cost
models the folds do not appear to be profitable, not getting past the
cost checks.

Differential Revision: https://reviews.llvm.org/D123911
2022-05-06 08:13:18 +01:00
David Green
7aadfc5099 [VectorCombine] Add tests for shuffle binops patterns. NFC 2022-05-04 15:07:47 +01:00
David Green
ded8187e35 [VectorCombine] Try to reduce shuffle cost for commutative reduction operands
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.

This is a more powerful version of a combine that already happens in
instrcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.

Differential Revision: https://reviews.llvm.org/D123494
2022-04-28 19:46:12 +01:00
David Green
f9f2276399 [VecCombine] Add tests for removing shuffles from reductions. NFC 2022-04-28 15:06:24 +01:00
Bjorn Pettersson
5e4dbd7a2f [NewPM][test] Use -passes syntax in VectorCombine lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
running the vector-combine pass.
2021-10-20 15:16:17 +02:00
Florian Hahn
e2f6290e06
[VectorCombine] Discard ScalarizationResult state in early exit.
ScalarizationResult's destructor makes sure ToFreeze is not ignored if
set. Currently, scalarizeLoadExtract has an early exit if the index is
not safe directly. But when it is SafeWithFreeze, we need to discard the
state first, otherwise we hit the assert in the destructor.

Fixes PR51992.
2021-09-28 12:52:16 +01:00
Florian Hahn
300870a95c
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Florian Hahn
5131037ea9
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Florian Hahn
ea27dd7497
[VectorCombine] Add tests which require DT to use info from assumes. 2021-09-21 13:07:06 +01:00
Florian Hahn
96ca03493a
[VectorCombine] Limit scalarization to non-poison indices for now.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
2021-06-14 16:40:14 +01:00
Roman Lebedev
20542b47d6
[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper
This results in slightly more optimistic alignments in some cases
2021-06-11 12:47:10 +03:00
Kerry McLaughlin
5db52751a5 [CostModel] Return an invalid cost for memory ops with unsupported types
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.

The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102515
2021-06-08 12:07:36 +01:00
Florian Hahn
d4c070d801
[VectorCombine] Freeze index unless it is known to be non-poison.
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff5574.

This patch updates the code to freeze the index, unless it is proven to
not be poison.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D103378
2021-06-01 10:40:57 +01:00
Florian Hahn
f000c4cfb6
[VectorCombine] Add tests with multiple noundef indices for scalarization. 2021-06-01 10:17:50 +01:00
Florian Hahn
829978744d
[VectorCombine] Add tests with noundef index for load scalarization. 2021-05-30 12:15:41 +01:00
Florian Hahn
007f268c35
[VectorCombine] Check indices for all extracts we scalarize.
We need to make sure that the indices of all extracts we scalarize are
valid.
2021-05-28 18:35:29 +01:00
Florian Hahn
f01df9805c
[VectorCombine] Add variants of multi-extract tests with assumes. 2021-05-28 18:35:24 +01:00
Florian Hahn
a92376d297
[VectorCombine] Add test that combines load & store scalarization. 2021-05-25 14:28:37 +01:00
Florian Hahn
575e2aff55
[VectorCombine] Use constant range info for index scalarization legality.
We can only scalarize memory accesses if we know the index is valid.

This patch adjusts canScalarizeAcceess to fall back to
computeConstantRange to check if the index is known to be valid.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D102476
2021-05-25 13:58:42 +01:00
Florian Hahn
d251d6f812
[VectorCombine] Fix load extract scalarization tests with assumes.
The input IR for @load_extract_idx_var_i64_known_valid_by_assume
and @load_extract_idx_var_i64_not_known_valid_by_assume_after_load
has been swapped.

This patch fixes the test so that @load_extract_idx_var_i64_known_valid_by_assume
has the assume before the load and the other test has it after.
2021-05-24 13:14:13 +01:00
Florian Hahn
4e8c28b6fb
Recommit "[VectorCombine] Scalarize vector load/extract."
This reverts commit 94d54155e2f38b56171811757044a3e6f643c14b.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
2021-05-24 11:35:07 +01:00
Florian Hahn
94d54155e2
Revert "[VectorCombine] Scalarize vector load/extract."
This reverts commit 86497785d540e59eaca24bed4219ddec183cbc9b.

One of the tests causes an ASAN failure.
https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio
2021-05-24 10:11:00 +01:00
Florian Hahn
86497785d5
[VectorCombine] Scalarize vector load/extract.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273
2021-05-24 09:29:08 +01:00