1449 Commits

Author SHA1 Message Date
Alexey Bataev
8d933ea5ac [SLP][NFC]Use SmallDenseSet for lookup instead of ArrayRef, NFC. 2023-09-06 13:17:30 -07:00
Alexey Bataev
09b8bbd6e0 [SLP][NFC]Reorder indices instead of real values, NFC.
May save some memory/compile time.
2023-09-05 08:48:52 -07:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Kazu Hirata
6da470d7f8 [llvm] Use range-based for loops (NFC) 2023-09-02 09:32:45 -07:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Philip Reames
aada8f2e54 [slp] Tweak debug costing output to include VL
This makes it much easier to understand which vector length is being considered when the same set of nodes are evaluated at multiple vector lengths.
2023-08-30 09:13:19 -07:00
Alexey Bataev
66c623bfc6 [SLP][NFC]Use TreeEntry::getOperand instead of trying to rebuild it in getOperandInfo(), NFC. 2023-08-23 13:37:36 -07:00
Alexey Bataev
a9e6295548 [SLP][NFC]Use all_of/any_of instead of loops, NFC. 2023-08-22 08:21:36 -07:00
Alexey Bataev
b51195dece [SLP]Fix PR63854: Add proper sorting of pointers for masked stores.
If the masked gathers can be reordered, the result may be a strided access
pattern. When this reordering does not affect the common reordering, it is
better to reorder the masked gathers for better performance.

Differential Revision: https://reviews.llvm.org/D157009
2023-08-22 06:14:01 -07:00
Alexey Bataev
ca2eabdb52 [SLP][NFC]Improve code to meet coding standards, NFC. 2023-08-15 11:08:25 -07:00
Alexey Bataev
63c7815faf [SLP]Fix comparator for PHI nodes comparison.
Fixed comparator for PHI nodes sorting to meet the criteria for strict
weak ordering.
2023-08-14 14:05:39 -07:00
Alexey Bataev
4f0bd8f7ac [SLP]Fix strict weak ordering for Cmp instruction comparator.
Sorting algorithms require strict weak ordering for comparators, final
fix for cmp instructions comparator.
2023-08-14 09:37:46 -07:00
Alexey Bataev
2216507171 [SLP]Fix PR64568: Crash during horizontal reduction.
If a reduced value is constant-foldable and was folded to a constant
during previous transformations, it needs to be excluded from the list of
the reduced value instructions as non-matchable.
2023-08-10 07:33:16 -07:00
Alexey Bataev
42b3925d42 [SLP][NFC]Fix formatting/warnings in tryToReduce(), NFC. 2023-08-10 06:42:50 -07:00
Valery N Dmitriev
f522be63bc [SLP][NFC] Make buildShuffleEntryMask routine a TreeEntry method.
The routine uses data stored in the TreeEntry node to build a mask,
so it is natural to make it a method of that type. This simplifies
its interface and reduces data transfer.
The method is added as buildAltOpShuffleMask.

Differential Revision: https://reviews.llvm.org/D157545
2023-08-09 13:43:03 -07:00
Alexey Bataev
c619222ea4 [SLP]Use common logic for cost estimation of the alternate vector nodes.
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. This allows a better estimate of the cost of the
node and better emitted code.

Differential Revision: https://reviews.llvm.org/D157413
2023-08-09 11:50:39 -07:00
Alexey Bataev
d0e3a571e7 [SLP]Fix PR64519: Unexpected reordering of gathers.
The issue is actually related to ScatterVectorize nodes. If such a node
gets reordered during bottom-to-top reordering, it may have associated
non-empty ReorderIndices. Such nodes then need to be handled
the same way as regular Vectorize nodes, not NeedToGather nodes: the
ReorderIndices array needs to be reordered rather than the scalars.
2023-08-08 08:07:25 -07:00
Alexey Bataev
e894c3d1a9 [SLP]Improve stores vectorization.
Use O(n log n) instead of O(N^2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text
                                                                                                                  results     results0    diff
                                               test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
                                                     test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
                                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
                                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
                                               test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
                                                   test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                                                        test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
                                         test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
                                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                                                          test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
                                            test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
                                                  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%

                                          test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
                                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
                                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
                                                      test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.

Differential Revision: https://reviews.llvm.org/D155246
2023-08-07 09:17:56 -07:00
Alexey Bataev
48bcaeb997 Revert "[SLP]Improve stores vectorization."
This reverts commit 58066edbb05d66e6a7512a675da778475da3bdfb reported in https://lab.llvm.org/buildbot/#/builders/252/builds/3389
2023-08-04 07:37:26 -07:00
Alexey Bataev
58066edbb0 [SLP]Improve stores vectorization.
Use O(n log n) instead of O(N^2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text
                                                                                                                  results     results0    diff
                                               test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
                                                     test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
                                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
                                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
                                               test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
                                                   test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                                                        test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
                                         test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
                                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                                                          test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
                                            test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
                                                  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%

                                          test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
                                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
                                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
                                                      test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.

Differential Revision: https://reviews.llvm.org/D155246
2023-08-04 06:47:16 -07:00
Alexey Bataev
2f6ca38d51 Revert "[SLP]Improve stores vectorization."
This reverts commit 58b0d7c34ddd9d2117009a8cd7bd5e34a8276082 to fix
crashes reported in https://lab.llvm.org/buildbot/#/builders/85/builds/18117.
2023-08-04 05:25:31 -07:00
Alexey Bataev
58b0d7c34d [SLP]Improve stores vectorization.
Use O(n log n) instead of O(N^2) (N <= 32) sorting approach and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15       120.11     2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67       207.42     1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43       235.01     1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49       207.25     0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46       306.23    -1.4%

Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69   1475.94    6.7%

Other changes are too small, cannot rely on them.

size..text
Program                                                                                                           size..text
                                                                                                                  results     results0    diff
                                               test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test      392.00     1439.00 267.1%
                                                     test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   394258.00   394818.00   0.1%
                                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   846355.00   847075.00   0.1%
                                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782816.00   783360.00   0.1%
                                               test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779667.00   779923.00   0.0%
                                                   test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   224398.00   224446.00   0.0%
                                                        test-suite :: MultiSource/Applications/oggenc/oggenc.test   185019.00   185035.00   0.0%
                                         test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00   0.0%
                                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1051772.00  1051804.00   0.0%
                                                          test-suite :: MultiSource/Applications/SPASS/SPASS.test   529586.00   529602.00   0.0%
                                            test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1084684.00  1084716.00   0.0%
                                                  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1014245.00  1014261.00   0.0%

                                          test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   223494.00   223478.00  -0.0%
                                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   660843.00   660795.00  -0.0%
                                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   660843.00   660795.00  -0.0%
                                                      test-suite :: MultiSource/Applications/ClamAV/clamscan.test   568824.00   568760.00  -0.0%

espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.

Differential Revision: https://reviews.llvm.org/D155246
2023-08-03 16:14:21 -07:00
Mel Chen
425e9e81a0 [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC)
Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D155786
2023-08-03 00:37:19 -07:00
Bjorn Pettersson
408cc94445 [LV][LSV][SLP] Drop some typed pointer bitcasts
Differential Revision: https://reviews.llvm.org/D156736
2023-08-02 12:08:37 +02:00
Alexey Bataev
0a68cd2304 [SLP]Fix PR64252: Requesting cost of invalid extending instruction.
If the actual instruction bitwidth does not match its original size,
the casting opcode needs to be reestimated; the compiler cannot rely on
the one provided in the instruction.
2023-07-31 13:37:52 -07:00
Alexey Bataev
662efdee9b [SLP][NFC]Improve handling of MinBWs container, NFC.
Replaced MapVector with DenseMap (the order is not important,
only lookup is used) and reduced the number of lookups.
2023-07-31 07:26:55 -07:00
Alexey Bataev
85635c7f60 [SLP][NFC]Use ScalarTy consistently in getEntryCost, NFC. 2023-07-31 06:52:56 -07:00
Alexey Bataev
48bc5b0a29 [SLP][PR64099]Fix unsound undef to poison transformation when handling
insertelement instructions.

If the original vector has undef, not poison, values which are not
rewritten by later insertelement instructions, the shuffle needs to be
transformed with the undef vector, not a poison vector, and with actual
indices, not PoisonMaskElem. Otherwise the transformation may produce more
poison in the output than in the input.
2023-07-27 16:09:49 -07:00
Fangrui Song
e8e7a959c7 [SLP] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D154891 2023-07-24 09:47:50 -07:00
Alexey Bataev
44eca64224 [SLP]Check scalars before trying scheduling.
Need to check the scalars if they can be vectorized before trying to
schedule them. It may save compile time and improve vectorization on
large functions/basic blocks.

Differential Revision: https://reviews.llvm.org/D154891
2023-07-24 09:25:19 -07:00
David Berard
8fa02db8cf [llvm][SLP] Exit early if inputs to comparator are equal
**TL;DR:** This PR modifies a comparator. The comparator is used in a subsequent call to llvm::stable_sort. Sorting comparators should follow strict weak ordering - in particular, (x < x) should return false. This PR adds a fix to avoid an infinite loop when the inputs to the comparator are equal.

**Details**:

Sometimes when two equivalent tensors are passed into the comparator, we encounter infinite looping (at aae2eaae2c/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (L4049))

Although it seems like this comparator will never be called with two equivalent pointers, some sanitizers, e.g. https://chromium.googlesource.com/chromiumos/third_party/gcc/+/refs/heads/stabilize-zako-5712.88.B/libstdc++-v3/include/bits/stl_algo.h#360, will add checks for (x < x). When this sanitizer is used with the current implementation, it triggers a comparator check for (x < x), which runs into the infinite loop.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D155874
2023-07-21 05:40:55 -07:00
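The strict weak ordering requirement described in this commit can be illustrated with a small standalone sketch. It is not the SLPVectorizer comparator: `Item`, `brokenLess`, `strictLess`, `isIrreflexive` and `sortedByKey` are hypothetical names invented for illustration, and the early exit only mirrors the idea of returning false when both inputs are the same object.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element type; the real comparator operates on LLVM values.
struct Item {
  int key;
  int id;
};

// Broken: '<=' makes brokenLess(x, x) true, which violates irreflexivity
// and therefore strict weak ordering.
inline bool brokenLess(const Item &a, const Item &b) { return a.key <= b.key; }

// Fixed: exit early when both inputs are the same object, then compare
// strictly, so strictLess(x, x) is always false.
inline bool strictLess(const Item &a, const Item &b) {
  if (&a == &b)
    return false;
  return a.key < b.key;
}

// Returns true if cmp(x, x) is false for every element, which is the
// property the sanitizer check (x < x) exercises.
template <class Cmp>
bool isIrreflexive(const std::vector<Item> &items, Cmp cmp) {
  return std::none_of(items.begin(), items.end(),
                      [&](const Item &x) { return cmp(x, x); });
}

// Sorts a copy with the fixed comparator; equal keys keep their input order.
inline std::vector<Item> sortedByKey(std::vector<Item> items) {
  assert(isIrreflexive(items, strictLess) && "comparator must be irreflexive");
  std::stable_sort(items.begin(), items.end(), strictLess);
  return items;
}
```

Checking `isIrreflexive` in a unit test is a cheap way to catch a comparator like `brokenLess` before it reaches a sorting routine.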
Alexey Bataev
aae2eaae2c [SLP]Fix a crash when trying to cast scalable vector type to fixed.
Need to check for FixedVectorType, not just a vector type, since the
compiler later performs an unconditional cast to FixedVectorType and gets
the number of elements in this type.
2023-07-19 11:53:49 -07:00
Alexey Bataev
4bbf37199c [SLP][NFC]Improve compile-time by using map {TreeEntry *, Instruction *}
in getLastInstructionInBundle(), NFC.

Instead of building EntryToLastInstruction before the vectorization,
build it automatically during the calls to getLastInstructionInBundle()
function.
2023-07-18 13:24:55 -07:00
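The idea of filling the map on demand rather than up front can be sketched as a lazily populated cache. This is only an illustration under assumptions: `Entry`, `LastInstructionCache`, `getLast` and `computations` are hypothetical stand-ins, and instructions are modelled as integer positions rather than LLVM objects.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tree entry: the positions of its scalars
// within a basic block.
struct Entry {
  std::vector<int> positions;
};

// Caches the "last position" per entry the first time it is requested,
// instead of precomputing it for every entry before any query.
class LastInstructionCache {
  std::unordered_map<const Entry *, int> cache;
  int computeCount = 0;

public:
  int getLast(const Entry &entry) {
    auto it = cache.find(&entry);
    if (it != cache.end())
      return it->second; // already computed on an earlier call
    assert(!entry.positions.empty() && "entry must have scalars");
    ++computeCount;
    int last = *std::max_element(entry.positions.begin(), entry.positions.end());
    cache.emplace(&entry, last);
    return last;
  }

  // Number of times the value was actually computed (not served from cache).
  int computations() const { return computeCount; }
};
```

Entries that are never queried cost nothing in this scheme, which is where the compile-time saving would come from.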
Alexey Bataev
83ba148a8a [SLP]Include cost of the reshuffling for same nodes with resizing.
Need to account for the reshuffling required for the reused elements in the
buildvector nodes, which are copies (perfect match) of other nodes, but
include reused elements.

Differential Revision: https://reviews.llvm.org/D149966
2023-07-18 06:05:15 -07:00
Arthur Eubanks
1cb3fbc713 Revert "[SLP][NFC]Improve compile-time by using map {TreeEntry *, Instruction *}"
This reverts commit 0d21b7cbdeb2f2eb5ef123a15099da0b651b24c0.

Causes broken IR, test case provided at
https://reviews.llvm.org/rG0d21b7cbdeb2f2eb5ef123a15099da0b651b24c0
2023-07-17 14:54:47 -07:00
Alexey Bataev
d8d4c99685 [SLP][NFC]Improve performance of isGatherShuffledEntry() function, NFC.
Transformed if checks to asserts and simplified some more code to
improve compile time.
2023-07-17 14:08:56 -07:00
Alexey Bataev
0d21b7cbde [SLP][NFC]Improve compile-time by using map {TreeEntry *, Instruction *}
in getLastInstructionInBundle(), NFC.

Instead of building EntryToLastInstruction before the vectorization,
build it automatically during the calls to getLastInstructionInBundle()
function.
2023-07-17 11:36:21 -07:00
Alexey Bataev
8ab962e411 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 07:19:49 -07:00
Alexey Bataev
bc8abb42bb Revert "[SLP]Relax assertion to check if the input scalars were extended to"
This reverts commit 6fdfc81287ecdc2a7f409d08538ec6ce2bd698da to fix the
check in the assert (need to use end, not begin function).
2023-07-14 07:04:06 -07:00
Alexey Bataev
6fdfc81287 [SLP]Relax assertion to check if the input scalars were extended to
match the size of base node (PR63668).

Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
2023-07-14 06:48:25 -07:00
Anna Thomas
1159266734 [SLP] Add support for fmaximum/fminimum reduction
This patch adds support for vectorized reduction of maximum/minimum
intrinsics which are under the appropriate reduction kind.

Differential Revision: https://reviews.llvm.org/D154463
2023-07-12 15:22:38 -04:00
David Green
12025cef3e [CostModel] Use min/max intrinsics for vecreduce.min/max costs
This changes the cost modelling of the vecreduce.min/max nodes to use the costs
of the relevant min/max intrinsics instead of expanding them to compares and
selects. getMinMaxReductionCost has changed to take an Opcode for the
relevant intrinsic, dropping the IsUnsigned and CondTy parameters as they are
no longer needed.

A follow up patch will add some basic fminimum/fmaximum costmodelling.

Differential Revision: https://reviews.llvm.org/D153547
2023-07-04 15:02:30 +01:00
Valery N Dmitriev
03b118c7e4 [SLP] Fix crash on attempt to access on invalid iterator state.
The patch fixes a corner case when none of the scalar instructions
required scheduling for a vectorized node.

Differential Revision: https://reviews.llvm.org/D154175
2023-06-30 11:40:25 -07:00
Nikita Popov
cc31d787c3 Revert "Reland [SLP] Provide an universal interface for FixedVectorType::get. NFC."
This reverts commit 19b1d3bd7eeecbeb1e45045960faf325c7bc5c64.

Both the commit and the review are missing a patch description.
2023-06-30 11:31:16 +02:00
Han-Kuan Chen
19b1d3bd7e Reland [SLP] Provide an universal interface for FixedVectorType::get. NFC.
Differential Revision: https://reviews.llvm.org/D154114
2023-06-29 23:15:52 -07:00
Arthur Eubanks
a374fb2b5e Revert "[SLP] Provide an universal interface for FixedVectorType::get. NFC."
This reverts commit fcd58ea50c218b61a58d6815b9d15bad7dbc75a3.

Causes crashes, see comments on D154114.
2023-06-29 21:49:05 -07:00
Han-Kuan Chen
fcd58ea50c [SLP] Provide an universal interface for FixedVectorType::get. NFC.
Differential Revision: https://reviews.llvm.org/D154114
2023-06-29 17:06:08 -07:00
Luke Lau
d0d864f6f4 [SLP] Explicitly pass AccessTy to getGEPCost
Building on D149889, this patch updates SLP to pass the vector type as
the AccessTy to getGEPCost.
This should have the effect of GEPs being costed for more often instead
of being treated as foldable into the address mode and thus free, as
some architectures, notably RISC-V, do not have offset+reg addressing
modes for vector memory accesses.

Note that in SLP, GEPs are costed in two places: getPointersChainCost
and GetGEPCostDiff.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D153570
2023-06-29 18:42:24 +01:00
Luke Lau
a68dcd09e8 [TTI] Use users of GEP to guess access type in getGEPCost
Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load i32, ptr %p                           ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, 42(a0)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42       ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p                     ; getInstructionCost = 1
------------------------------------------------------------------------
addi  a0, a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, notably during SLP, where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Differential Revision: https://reviews.llvm.org/D149889
2023-06-29 13:44:37 +01:00
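The three cases in this commit message can be summarised with a toy cost model. It is a sketch under stated assumptions, not the TTI interface: `AccessKind`, `canFoldOffset` and `toyGEPCost` are hypothetical names, and the folding rule simply encodes the RISC-V behaviour described above (scalar accesses fold a base plus offset, vector accesses do not).

```cpp
#include <cassert>
#include <optional>

// Hypothetical classification of the memory access that uses the GEP.
enum class AccessKind { Scalar, Vector };

// Toy target rule taken from the description: only scalar loads/stores can
// fold base + offset into the memory instruction.
inline bool canFoldOffset(AccessKind kind) { return kind == AccessKind::Scalar; }

// Returns 0 when the GEP is assumed to fold into its user, 1 otherwise.
// Without a hint, the GEP's own target type is used as the guess (the old
// behaviour); with a hint, the type actually accessed by the user wins.
inline int toyGEPCost(AccessKind gepTargetType,
                      std::optional<AccessKind> accessHint = std::nullopt) {
  AccessKind accessed = accessHint.value_or(gepTargetType);
  return canFoldOffset(accessed) ? 0 : 1;
}
```

With this model, `toyGEPCost(AccessKind::Scalar)` is free, while `toyGEPCost(AccessKind::Scalar, AccessKind::Vector)` is charged, which corresponds to the mismatched case where a scalar GEP feeds a vector load.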
Alexey Bataev
5d2cc8e242 [SLP]Fix emission of buildvectors with full match.
If the buildvector node is a full match of another node, the mask for the
original vector value needs to be built correctly, along with a common
mask for the emitted node.
2023-06-28 13:47:08 -07:00