125 Commits

Author SHA1 Message Date
Simon Pilgrim
212b2bbcd1
[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510)
Based off the existing foldShuffleOfBinops fold

Fixes #67803
2024-04-04 11:22:37 +01:00
Simon Pilgrim
d53b8291bf [VectorCombine][X86] shuffle-of-casts.ll - adjust zext nneg tests to improve costs for testing
Improves SSE vs AVX test results for #87510
2024-04-03 22:27:14 +01:00
Simon Pilgrim
b15d27e249 [VectorCombine][X86] Add additional tests for #87510
Add zext nneg tests and check we don't fold casts with different src types
2024-04-03 19:29:15 +01:00
Simon Pilgrim
4d8a3f5b35 [VectorCombine][X86] Add some tests showing failure to fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y))
Part of #67803
2024-04-03 16:34:52 +01:00
Simon Pilgrim
1d06f41b72
[VectorCombine] foldBitcastShuffle - peek through any residual bitcasts before creating a new bitcast on top (#86119)
Encountered while working on #67803, wading through the chains of bitcasts that SSE intrinsics introduces - this patch helps prevents cases where the bitcast chains aren't cleared out and we can't perform further combines until after InstCombine/InstSimplify has run.
2024-04-02 10:58:45 +01:00
Simon Pilgrim
769c22f25b
[VectorCombine] Fold reduce(trunc(x)) -> trunc(reduce(x)) iff cost effective (#81852)
Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free.

If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely.

Fixes https://github.com/llvm/llvm-project/issues/81469
2024-02-19 11:32:23 +00:00
Nikita Popov
2d69827c5c [Transforms] Convert tests to opaque pointers (NFC) 2024-02-05 11:57:34 +01:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Nabeel Omer
8e31acf8ca
[VectorCombine] Add special handling for truncating shuffles (#70013)
When dealing with a truncating shuffle, we can end up in a situation
where the type passed to getShuffleCost is the type of the result of the
shuffle, and the mask references an element which is out of bounds of
the result vector.

If dealing with truncating shuffles, pass the type of the input vectors
to `getShuffleCost()` in order to avoid an out-of-bounds assertion.
2023-10-24 15:03:43 +01:00
Alexey Bataev
c2ae16f6a7 [VectorCombine]Fix a crash during long vector analysis.
If the analysis of the single vector requested, need to use original
type to avoid crash
2023-10-09 14:22:37 -07:00
Simon Pilgrim
a16f6462d7 [TTI] improveShuffleKindFromMask - detect SK_ExtractSubvector patterns from SK_PermuteSingleSrc 2023-10-06 11:59:51 +01:00
Simon Pilgrim
94795a37e8 [VectorCombine] foldBitcastShuf - add support for length changing shuffles
Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold.

It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask

First stage towards addressing Issue #67803
2023-10-06 11:59:51 +01:00
Simon Pilgrim
3bae69ec8c [VectorCombine][X86] Add additional length changing foldBitcastShuf tests
Made these TODO instead of negative
2023-10-06 11:59:50 +01:00
Nuno Lopes
d75fb17963 [VectorCombine] Use poison insteaf of undef as placeholder [NFC]
These vector lanes are never accessed. They are used for shifting a value into the right lane
and therefore only 1 value of the whole vector is actually used
2023-07-19 10:29:08 +01:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Simon Pilgrim
4060042384 [CostModel][X86] Improve i8 and vXi8 MUL costs
We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) which diverged from the costs from llvm-mcaonce we extended beyond legal types

Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates

Helps address some of the regressions identified in D148806
2023-04-20 19:38:51 +01:00
Alexey Bataev
0e1312fbe0 [SLP][X86]Fix the cost of reused gathers/buildvectors and floats insert.
There are 2 problems in the cost estimation for buildvector/gather.
1. If the buildvector/gather node is the same as another one node, need
   to estimate the cost of this node as 0.
2. The cost of inserting float point register to non-poison vector is
   not 0, it should not be considered free.

Differential Revision: https://reviews.llvm.org/D148801
2023-04-20 09:34:46 -07:00
Sanjay Patel
af39acda88 [VectorCombine] fix insertion point of shuffles
As shown in issue #60649, the new shuffles were
being inserted before a phi, and that is invalid.

It seems like most test coverage for this fold
(foldSelectShuffle) lives in the AArch64 dir,
but this doesn't repro there for a base target.
2023-02-10 10:57:11 -05:00
Nikita Popov
c00ffbe02b [VectorCombine] Convert tests to opaque pointers (NFC) 2022-12-23 10:04:26 +01:00
Sanjay Patel
b57819e130 [VectorCombine] widen a load with subvector insert
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.

We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining loads on those patterns is not
solved though.

Differential Revision: https://reviews.llvm.org/D137341
2022-11-10 14:11:32 -05:00
Sanjay Patel
6703d2ecf9 [VectorCombine] add test with addrspacecast; NFC
D137341
2022-11-08 08:28:56 -05:00
Sanjay Patel
b62c81b836 [VectorCombine] add test with non-canonical shuffle mask; NFC
D137341
2022-11-07 12:07:37 -05:00
Sanjay Patel
b7d7e96006 [VectorCombine] add tests for load+shuffle and update to typeless ptr; NFC 2022-11-02 16:45:46 -04:00
Sanjay Patel
8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
Sanjay Patel
baab4aa1ba [VectorCombine] convert scalar fneg with insert/extract to vector fneg
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask

This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.

This shows up in the motivating example from #issue 58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.

Differential Revision: https://reviews.llvm.org/D135278
2022-10-10 14:59:56 -04:00
Sanjay Patel
6ace81db3a [VectorCombine] add test with out-of-bounds insert/extract index; NFC
D135278
2022-10-10 14:59:56 -04:00
Sanjay Patel
563545685f [VectorCombine] remove unused test prefixes; NFC
These would change with D135278, but for now everything is the same.
2022-10-06 09:41:46 -04:00
Sanjay Patel
b598c2c24a [VectorCombine] add tests for scalar fneg with insert/extract; NFC 2022-10-06 08:57:55 -04:00
Fraser Cormack
2e44b7872b [VectorCombine] Insert addrspacecast when crossing address space boundaries
We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enhancement was added
which peeks further through the ponter operand, opening up the
possibility that address-space violations could be introduced.

Instead of bailing as the previous fix did, simply insert an
addrspacecast cast instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D121787
2022-03-24 19:08:08 +00:00
Sanjay Patel
66d22b4da4 [VectorCombine] fold shuffle-of-binops with common operand
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)

This is motivated by an example in D111800
(although that patch avoids the problem for that particular example).

The pattern is shown in reduced form with:
https://llvm.org/PR52178
https://alive2.llvm.org/ce/z/d8zB4D

There is no difference on the PhaseOrdering test from D111800
because the aarch64 cost model says that the shuffle cost is 3 while
the fadd cost is 2.

Differential Revision: https://reviews.llvm.org/D111901
2021-10-21 12:37:54 -04:00
Bjorn Pettersson
5e4dbd7a2f [NewPM][test] Use -passes syntax in VectorCombine lit tests
The legacy PM is deprecated, so use the new PM syntax in lit tests
running the vector-combine pass.
2021-10-20 15:16:17 +02:00
Sanjay Patel
8cd9c351a1 [VectorCombine] add tests for shuffle of binops; NFC 2021-10-15 11:22:06 -04:00
Roman Lebedev
690da88a95
Workaround broken FileCheck default yet another time 2021-10-08 01:24:56 +03:00
Roman Lebedev
6526fa3589
[NFC][VectorCombine] Add baseline test coverage for GEP scalarization 2021-10-08 01:16:08 +03:00
Simon Pilgrim
0dcd2b40e6 [TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost
We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.

This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.

Differential Revision: https://reviews.llvm.org/D111024
2021-10-06 15:40:35 +01:00
Simon Pilgrim
0776924a17 [CostModel][X86] getCmpSelInstrCost - treat BAD_PREDICATEs the same as the worst case cost predicates for ICMP/FCMP instructions
As suggested on D111024, we should treat getCmpSelInstrCost calls without a specific predicate as matching the worst case predicate cost.

These regressions will be addressed with a mixture of D111024 and fixing other specific getCmpSelInstrCost calls to have realistic predicates.
2021-10-06 10:14:56 +01:00
Florian Hahn
300870a95c
[VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Simon Pilgrim
f10d4cfc23 [VectorCombine] Fix PR30986 poison test case
Thanks @xbolva00!
2021-08-02 14:16:32 +01:00
Simon Pilgrim
7942e20fc8 [VectorCombine] Add PR30986 test case 2021-08-02 13:05:47 +01:00
Roman Lebedev
48e9602c40
[NFC][VectorCombine] Load widening: add a few more negative tests 2021-07-21 15:21:37 +03:00
Roman Lebedev
a0217bda38
[NFC][VectorCombine] Add tests for widening of partial vector load 2021-07-21 00:24:47 +03:00
Philip Reames
e75a2dfe20 [tests] Stablize tests for possible change in deref semantics
There's a potential change in dereferenceability attribute semantics in the nearish future.  See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context.

This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics.  Note that for many of these cases, O3 would infer exactly these attributes on the test IR.

This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory.  There's a couple other tests which need more one-off attention, they'll be handled in another change.
2021-07-14 13:05:43 -07:00
Kerry McLaughlin
5db52751a5 [CostModel] Return an invalid cost for memory ops with unsupported types
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.

The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102515
2021-06-08 12:07:36 +01:00
Simon Pilgrim
1ad4f887bd [CostModel][X86] Improve accuracy of vector non-uniform shift costs on XOP/AVX2 targets
By llvm-mca analysis, Haswell/Broadwell has a non-uniform vector shift recip-throughput cost of the AVX2 targets at 2 for both 128 and 256-bit vectors - XOP capable targets have better 128-bit vector shifts so improve the fallback in those cases.
2021-05-24 14:18:21 +01:00
Florian Hahn
4e8c28b6fb
Recommit "[VectorCombine] Scalarize vector load/extract."
This reverts commit 94d54155e2f38b56171811757044a3e6f643c14b.

This fixes a sanitizer failure by moving scalarizeLoadExtract(I)
before foldSingleElementStore(I), which may remove instructions.
2021-05-24 11:35:07 +01:00
Florian Hahn
94d54155e2
Revert "[VectorCombine] Scalarize vector load/extract."
This reverts commit 86497785d540e59eaca24bed4219ddec183cbc9b.

One of the tests causes an ASAN failure.
https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio
2021-05-24 10:11:00 +01:00
Florian Hahn
86497785d5
[VectorCombine] Scalarize vector load/extract.
This patch adds a new combine that tries to scalarize chains of
`extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is
profitable when extracting only a few elements out of a large vector.

At the moment, `store (extractelement (load %ptr), %idx), %ptr`
operations on large vectors result in huge code in the backend.

This can easily be triggered by using the matrix extension, e.g.
https://clang.godbolt.org/z/qsccPdPf4

This should complement D98240.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D100273
2021-05-24 09:29:08 +01:00
Juneyoung Lee
395607af3c Reapply [ConstantFold] Fold more operations to poison
This was reverted to mitigate mitigate miscompiles caused by
the logical and/or to bitwise and/or fold. Reapply it now that
the underlying issue has been fixed by D101191.

-----

This patch folds more operations to poison.

Alive2 proof: https://alive2.llvm.org/ce/z/mxcb9G (it does not contain tests about div/rem because they fold to poison when raising UB)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D92270
2021-05-13 16:04:12 +02:00
Juneyoung Lee
06829034ca Revert "[ConstantFold] Fold more operations to poison"
This reverts commit 53040a968dc2ff20931661e55f05da2ef8b964a0 due to its
bad interaction with select i1 -> and/or i1 transformation.

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=49005
https://bugs.llvm.org/show_bug.cgi?id=48435
2021-02-04 00:24:02 +09:00