152 Commits

Author SHA1 Message Date
Matt Devereau
ee4d6c8bf0 [VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors
This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for
scalable vectors which caused poor scalable Binop codegen.

Differential Revision: https://reviews.llvm.org/D138545
2022-11-23 13:17:21 +00:00
Sanjay Patel
b57819e130 [VectorCombine] widen a load with subvector insert
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.

We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining loads on those patterns is not
solved though.

Differential Revision: https://reviews.llvm.org/D137341
2022-11-10 14:11:32 -05:00
Sanjay Patel
6703d2ecf9 [VectorCombine] add test with addrspacecast; NFC
D137341
2022-11-08 08:28:56 -05:00
Sanjay Patel
b62c81b836 [VectorCombine] add test with non-canonical shuffle mask; NFC
D137341
2022-11-07 12:07:37 -05:00
Matt Devereau
a8c24d57b8 [InstCombine] Remove redundant splats in InstCombineVectorOps
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Differential Revision: https://reviews.llvm.org/D135876
2022-11-07 15:39:05 +00:00
Peter Waller
e1790c8c29 Revert "[InstCombine] Remove redundant splats in InstCombineVectorOps"
This reverts commit 957eed0b1af2cb88edafe1ff2643a38165c67a40.
2022-11-03 07:56:03 +00:00
Sanjay Patel
b7d7e96006 [VectorCombine] add tests for load+shuffle and update to typeless ptr; NFC 2022-11-02 16:45:46 -04:00
Matt Devereau
957eed0b1a [InstCombine] Remove redundant splats in InstCombineVectorOps
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Differential Revision: https://reviews.llvm.org/D135876
2022-11-02 11:57:05 +00:00
Sanjay Patel
8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
Matt Devereau
be0d427a14 [VectorCombine] Add insertelement-shufflevector VectorCombine tests
This is a precommit which adds some tests to show the functionality of an
upcoming VectorCombine optimization
2022-10-13 14:10:06 +00:00
Sanjay Patel
baab4aa1ba [VectorCombine] convert scalar fneg with insert/extract to vector fneg
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask

This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.

This shows up in the motivating example from #issue 58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.

Differential Revision: https://reviews.llvm.org/D135278
2022-10-10 14:59:56 -04:00
Sanjay Patel
6ace81db3a [VectorCombine] add test with out-of-bounds insert/extract index; NFC
D135278
2022-10-10 14:59:56 -04:00
Sanjay Patel
563545685f [VectorCombine] remove unused test prefixes; NFC
These would change with D135278, but for now everything is the same.
2022-10-06 09:41:46 -04:00
Sanjay Patel
b598c2c24a [VectorCombine] add tests for scalar fneg with insert/extract; NFC 2022-10-06 08:57:55 -04:00
David Green
4b7913c357 [VectorCombine] Only consider shuffle uses with the same type.
The backend getShuffleCosts do not currently handle shuffles that change
size very well. Limit the shuffles we collect to the same type to make
sure they do not cause issues as reported in D128732.
2022-07-16 13:23:39 +01:00
Sander de Smalen
519d7876cb [VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector.
This addresses https://github.com/llvm/llvm-project/issues/56377

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D129136
2022-07-07 08:37:04 +00:00
Sander de Smalen
f45a3f7f9b [VectorCombine] NFC: rename test extract-cmp-binop.ll to extract-scalable.ll 2022-07-07 08:37:04 +00:00
David Green
5493f8fc59 [VectorCombine] Improve shuffle select shuffle-of-shuffles
This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

This is a recommit with some additional checks for supported forms and
out-of-bounds mask elements, with some extra tests.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-05 17:16:18 +01:00
Nikita Popov
b69c75d53f Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles"
This reverts commit 19a1e20b8a0f69da2a871eae6cbd03d1314ee02d.

Clang crashes while linking bullet from llvm-test-suite in
ReleaseLTO-g cmake configuration.
2022-07-05 09:31:20 +02:00
David Green
19a1e20b8a [VectorCombine] Improve shuffle select shuffle-of-shuffles
This in an extension to the code added in D123911 which added vector
combine folding of shuffle-select patterns, attempting to reduce the
total amount of shuffling required in patterns like:
  %x = shuffle %i1, %i2
  %y = shuffle %i1, %i2
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

This patch extends the handing of shuffles that are dependent on one
another, which can arise from the SLP vectorizer, as-in:
  %x = shuffle %i1, %i2
  %y = shuffle %x

The input shuffles can also be emitted, in which case they are treated
like identity shuffles. This patch also attempts to calculate a better
ordering of input shuffles, which can help getting lower cost input
shuffles, pushing complex shuffles further down the tree.

Differential Revision: https://reviews.llvm.org/D128732
2022-07-04 13:38:43 +01:00
Nikita Popov
03aceab08b [ValueTracking] Enable -branch-on-poison-as-ub by default
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.

Differential Revision: https://reviews.llvm.org/D125299
2022-06-01 10:46:06 +02:00
David Green
4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle
tables, slightly lowering the costs a little more. This is a rough
improvement in general, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane mov's already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
David Green
6f9e1ea0ef [VectorCombine] Attempt to fold select shuffles from reductions
Given a commutative reduction leading from a shuffle, the order of the
lanes on the shuffle are not important for the result. This means we can
reorder the shuffle to something simpler, which we try shuffling the
first vector lanes first. This was D123494.

The new shuffle may not be profitable though, and if it is not we can
try the folding of select shuffles from D123911. This, with some
adjustment as the output lane ordering is now unimportant, can allow the
final shuffle to simplify given the inputs to the patterns from D123911.
Where as each transformation on their own are not profitable, the
combination is.

We can only support a single shuffle when called from reductions, but we
are able to sort the ReconstructMask, potentially allowing it to
simplify to an identity or concat mask.

Differential Revision: https://reviews.llvm.org/D125086
2022-05-08 10:32:41 +01:00
David Green
100cb9a2ba [VectorCombine] Fold shuffle select pattern
This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:
  %x = shuffle ...
  %y = shuffle ...
  %a = binop %x, %y
  %b = binop %x, %y
  shuffle %a, %b, selectmask

A classic select-mask will pick items from each lane of a or b. These
do not always have a great lowering on many architectures. This patch
attempts to pack a and b into the lower elements, creating a differently
ordered shuffle for reconstructing the orignal which may be better than
the select mask. This can be better for performance, especially if less
elements of a and b need to be computed and the input shuffles are
cheaper.

Because select-masks are just one form of shuffle, we generalize to any
mask. So long as the backend has decent costmodel for the shuffles, this
can generally improve things when they come up. For more basic cost
models the folds do not appear to be profitable, not getting past the
cost checks.

Differential Revision: https://reviews.llvm.org/D123911
2022-05-06 08:13:18 +01:00
David Green
7aadfc5099 [VectorCombine] Add tests for shuffle binops patterns. NFC 2022-05-04 15:07:47 +01:00
David Green
ded8187e35 [VectorCombine] Try to reduce shuffle cost for commutative reduction operands
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.

This is a more powerful version of a combine that already happens in
instrcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.

Differential Revision: https://reviews.llvm.org/D123494
2022-04-28 19:46:12 +01:00
David Green
f9f2276399 [VecCombine] Add tests for removing shuffles from reductions. NFC 2022-04-28 15:06:24 +01:00
Fraser Cormack
2e44b7872b [VectorCombine] Insert addrspacecast when crossing address space boundaries
We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enhancement was added
which peeks further through the ponter operand, opening up the
possibility that address-space violations could be introduced.

Instead of bailing as the previous fix did, simply insert an
addrspacecast cast instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D121787
2022-03-24 19:08:08 +00:00
Sanjay Patel
66d22b4da4 [VectorCombine] fold shuffle-of-binops with common operand
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)

This is motivated by an example in D111800
(although that patch avoids the problem for that particular example).

The pattern is shown in reduced form with:
https://llvm.org/PR52178
https://alive2.llvm.org/ce/z/d8zB4D

There is no difference on the PhaseOrdering test from D111800
because the aarch64 cost model says that the shuffle cost is 3 while
the fadd cost is 2.

Differential Revision: https://reviews.llvm.org/D111901
2021-10-21 12:37:54 -04:00
Bjorn Pettersson
5e4dbd7a2f [NewPM][test] Use -passes syntax in VectorCombine lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
running the vector-combine pass.
2021-10-20 15:16:17 +02:00
Sanjay Patel
8cd9c351a1 [VectorCombine] add tests for shuffle of binops; NFC 2021-10-15 11:22:06 -04:00
Roman Lebedev
690da88a95
Workaround broken FileCheck default yet another time 2021-10-08 01:24:56 +03:00
Roman Lebedev
6526fa3589
[NFC][VectorCombine] Add baseline test coverage for GEP scalarization 2021-10-08 01:16:08 +03:00
Simon Pilgrim
0dcd2b40e6 [TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost
We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.

This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.

Differential Revision: https://reviews.llvm.org/D111024
2021-10-06 15:40:35 +01:00
Simon Pilgrim
0776924a17 [CostModel][X86] getCmpSelInstrCost - treat BAD_PREDICATEs the same as the worst case cost predicates for ICMP/FCMP instructions
As suggested on D111024, we should treat getCmpSelInstrCost calls without a specific predicate as matching the worst case predicate cost.

These regressions will be addressed with a mixture of D111024 and fixing other specific getCmpSelInstrCost calls to have realistic predicates.
2021-10-06 10:14:56 +01:00
Florian Hahn
e2f6290e06
[VectorCombine] Discard ScalarizationResult state in early exit.
ScalarizationResult's destructor makes sure ToFreeze is not ignored if
set. Currently, scalarizeLoadExtract has an early exit if the index is
not safe directly. But when it is SafeWithFreeze, we need to discard the
state first, otherwise we hit the assert in the destructor.

Fixes PR51992.
2021-09-28 12:52:16 +01:00
Florian Hahn
300870a95c
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Florian Hahn
5131037ea9
[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange.
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.

The use VectorCombine is updated to pass through the DT to enable
additional scalarization.

Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110175
2021-09-21 16:54:47 +01:00
Florian Hahn
ea27dd7497
[VectorCombine] Add tests which require DT to use info from assumes. 2021-09-21 13:07:06 +01:00
Florian Hahn
c24fc37e47
[VectorCombine] Support AND/UREM indices that require freezing.
38b098be6605 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580
2021-09-13 11:21:45 +01:00
Florian Hahn
38b098be66
[VectorCombine] Limit scalarization known non-poison indices.
We can only trust the range of the index if it is guaranteed
non-poison.

Fixes PR50949.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107364
2021-08-05 15:36:31 +01:00
Florian Hahn
8d2a8ced00
[VectorCombine] Add additional tests with freeze combinations.
Suggested in D107364.
2021-08-05 14:08:07 +01:00
Florian Hahn
ccf1038a92
[VectorCombine] Add tests where the index is guaranteed non-poison.
Tests for PR50949.
2021-08-03 16:42:30 +01:00
Simon Pilgrim
f10d4cfc23 [VectorCombine] Fix PR30986 poison test case
Thanks @xbolva00!
2021-08-02 14:16:32 +01:00
Simon Pilgrim
7942e20fc8 [VectorCombine] Add PR30986 test case 2021-08-02 13:05:47 +01:00
Eli Friedman
bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Roman Lebedev
48e9602c40
[NFC][VectorCombine] Load widening: add a few more negative tests 2021-07-21 15:21:37 +03:00
Roman Lebedev
a0217bda38
[NFC][VectorCombine] Add tests for widening of partial vector load 2021-07-21 00:24:47 +03:00
Philip Reames
e75a2dfe20 [tests] Stablize tests for possible change in deref semantics
There's a potential change in dereferenceability attribute semantics in the nearish future.  See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context.

This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics.  Note that for many of these cases, O3 would infer exactly these attributes on the test IR.

This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory.  There's a couple other tests which need more one-off attention, they'll be handled in another change.
2021-07-14 13:05:43 -07:00
Florian Hahn
96ca03493a
[VectorCombine] Limit scalarization to non-poison indices for now.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
2021-06-14 16:40:14 +01:00