Florian Hahn
c34d049706
[LV] Re-use existing NewInsertionPoint variable for insertion (NFCI).
...
Split off independent suggestion from D157037.
2023-08-21 15:21:29 +01:00
Florian Hahn
56f5738d85
[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC).
...
All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations next to the other
recipe definitions in VPlanRecipes.cpp.
2023-08-20 21:00:05 +01:00
Craig Topper
46eded75cd
[LoopVectorize] Replace dyn_cast with isa to suppress an unused variable warning. NFC
2023-08-19 14:41:00 -07:00
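The dyn_cast-to-isa change above follows a common pattern: when only the type check matters, binding the cast result to a name leaves an unused variable. A minimal sketch of the idea in standard C++, using dynamic_cast as a stand-in for LLVM's casting utilities. Node, LoadNode, StoreNode, isaNode and isLoad are hypothetical names for illustration, not LLVM APIs.

```cpp
// Hypothetical class hierarchy standing in for IR values; not LLVM types.
struct Node {
  virtual ~Node() = default;
};
struct LoadNode : Node {};
struct StoreNode : Node {};

// Stand-in for an isa<T>-style query: it answers the type question
// without binding a casted pointer that would then go unused.
template <typename T> bool isaNode(const Node *N) {
  return dynamic_cast<const T *>(N) != nullptr;
}

// Only the check is needed here, so no casted variable is declared.
inline bool isLoad(const Node *N) { return isaNode<LoadNode>(N); }
```

Writing the check as a pure predicate avoids the warning without needing an attribute or a cast to void.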
Florian Hahn
622b611f23
[VPlan] Inline buildScalarSteps in single user (NFC).
...
Other users have been refactored; remove the unneeded function.
2023-08-19 17:02:31 +01:00
Florian Hahn
ada2a455fc
[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC)
...
Retrieve block via VPlan infrastructure as suggested as independent
cleanup in D150398.
2023-08-17 18:17:45 +01:00
Florian Hahn
9ee4a740e3
[LV] Remove unused MiddleVPBB argument from addUsersInExitBlock (NFC).
...
The argument is no longer used, remove it.
2023-08-17 10:36:12 +01:00
Mel Chen
463e7cb892
[LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc
...
This commit refactors the implementation of VPReductionRecipe to use a
reference instead of a pointer for the member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should never be a nullptr, using a reference
provides clearer semantics.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158058
2023-08-16 19:37:49 -07:00
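The pointer-to-reference refactoring described above can be sketched in isolation. This is an illustration only, assuming a made-up Descriptor type and two made-up holder classes; it is not the VPReductionRecipe code.

```cpp
#include <string>

// Hypothetical descriptor type; stands in for any object a recipe refers to.
struct Descriptor {
  std::string Name;
};

// Pointer member: the type permits a null state, so users may need checks.
class PtrHolder {
  const Descriptor *Desc;

public:
  explicit PtrHolder(const Descriptor *D) : Desc(D) {}
  bool hasDesc() const { return Desc != nullptr; }
};

// Reference member: must be bound at construction and cannot be null,
// which documents the invariant in the type itself.
class RefHolder {
  const Descriptor &Desc;

public:
  explicit RefHolder(const Descriptor &D) : Desc(D) {}
  const std::string &name() const { return Desc.Name; }
};
```

One trade-off worth noting: a class with a reference member is no longer copy-assignable by default, which is acceptable when objects are not reassigned.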
Alexey Bataev
ca2eabdb52
[SLP][NFC]Improve code to meet coding standards, NFC.
2023-08-15 11:08:25 -07:00
Alexey Bataev
63c7815faf
[SLP]Fix comparator for PHI nodes comparison.
...
Fixed comparator for PHI nodes sorting to meet the criteria for strict
weak ordering.
2023-08-14 14:05:39 -07:00
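The strict weak ordering requirement behind the comparator fix above can be illustrated with a small standalone example. This is a sketch under assumptions: Item, brokenLess, strictLess and sortItems are hypothetical names, and the code is unrelated to the actual SLP comparators.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical element type; not an LLVM class.
struct Item {
  int Kind;
  int Id;
};

// Broken: uses <=, so brokenLess(A, A) is true. That violates
// irreflexivity, one of the strict weak ordering requirements.
inline bool brokenLess(const Item &A, const Item &B) {
  return A.Kind <= B.Kind;
}

// Valid: lexicographic comparison via std::tie is irreflexive,
// asymmetric and transitive.
inline bool strictLess(const Item &A, const Item &B) {
  return std::tie(A.Kind, A.Id) < std::tie(B.Kind, B.Id);
}

// Sorts a copy of the input with the valid comparator.
inline std::vector<Item> sortItems(std::vector<Item> Items) {
  std::sort(Items.begin(), Items.end(), strictLess);
  return Items;
}
```

Passing a comparator like brokenLess to std::sort is undefined behavior, which is why such bugs can surface as crashes rather than as a merely wrong order.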
Florian Hahn
00bc500830
[VPlan] Store FPBinOp directly in VPDerivedIVRecipe (NFCI).
...
Address post-commit simplification suggestion for 8a56179bcd8c:
Store operator only for floating point inductions (i.e. the binary op is
a FPMathOperator).
2023-08-14 21:45:19 +01:00
Alexey Bataev
4f0bd8f7ac
[SLP]Fix strict weak ordering for Cmp instruction comparator.
...
Sorting algorithms require strict weak ordering for comparators, final
fix for cmp instructions comparator.
2023-08-14 09:37:46 -07:00
Florian Hahn
aacaf3d580
[VPlan] Simplify VPDerivedIV truncation handling (NFCI).
...
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
2023-08-14 17:33:10 +01:00
Florian Hahn
d32e68ae53
[docs] Graduate VectorizationPlan.rst from proposal.
...
VPlan has become an integral part of the inner loop vectorizer pipeline
that has been actively developed over the previous years. Let's move
VectorizationPlan.rst from the proposal stage to bring the docs in line
and to avoid confusion when reading the docs.
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D157593
2023-08-10 17:15:43 +01:00
Alexey Bataev
2216507171
[SLP]Fix PR64568: Crash during horizontal reduction.
...
If a reduced value is constant-foldable and was folded to a constant
during previous transformations, it needs to be excluded from the list of
reduced value instructions as non-matchable.
2023-08-10 07:33:16 -07:00
Alexey Bataev
42b3925d42
[SLP][NFC]Fix formatting/warnings in tryToReduce(), NFC.
2023-08-10 06:42:50 -07:00
Bjorn Pettersson
e53b28c833
[llvm] Drop some bitcasts and references related to typed pointers
...
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Florian Hahn
8a56179bcd
[VPlan] Store induction kind & binop directly in VPDerivedIVRecipe (NFC)
...
Limit the information stored in VPDerivedIVRecipe to the ingredients
really needed.
2023-08-10 10:57:32 +01:00
Florian Hahn
e6d5dcf84c
[LV] Pass kind and induction binop to emitTransformedIndex (NFC).
...
Explicitly pass InductionKind and InductionBinOp to
emitTransformedIndex. Only those values are needed from the induction
descriptor. This makes explicit what is needed for the function and
allows future use cases where a full induction descriptor object is
not available.
2023-08-10 10:35:42 +01:00
Valery N Dmitriev
f522be63bc
[SLP][NFC] Make buildShuffleEntryMask routine a TreeEntry method.
...
The routine uses data stored in the TreeEntry node to build a mask,
so it is natural to make it a method of that type. This simplifies
its interface and reduces data transfer.
The method is added as buildAltOpShuffleMask.
Differential Revision: https://reviews.llvm.org/D157545
2023-08-09 13:43:03 -07:00
Alexey Bataev
c619222ea4
[SLP]Use common logic for cost estimation of the alternate vector nodes.
...
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for alternate nodes with reuses, but also for nodes
without reused scalars. This allows a better estimate of the node's cost
and results in better code.
Differential Revision: https://reviews.llvm.org/D157413
2023-08-09 11:50:39 -07:00
Florian Hahn
b223229e2c
[VPlan] Re-use existing step again after 34accad1feae.
...
This fixes a failing RISCV test case that was missed originally.
2023-08-08 21:42:56 +01:00
Florian Hahn
34accad1fe
[VPlan] Move logic to create VPScalarIVStepsRecipe to helper (NFC).
...
This allows for easier re-use in follow-on patches.
2023-08-08 21:25:06 +01:00
Florian Hahn
698ae66092
[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC).
...
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157144
2023-08-08 20:13:11 +01:00
Alexey Bataev
d0e3a571e7
[SLP]Fix PR64519: Unexpected reordering of gathers.
...
The issue is actually related to ScatterVectorize nodes. If such a node
gets reordered during bottom-to-top reordering, it may have associated
non-empty ReorderIndices. In this case, such nodes need to be handled
the same way as regular Vectorize nodes, not NeedToGather nodes: the
ReorderIndices array needs to be reordered rather than the scalars.
2023-08-08 08:07:25 -07:00
Florian Hahn
b6d994de0f
[VPlan] Address post-commit suggestions for af635a554 (NFC).
2023-08-08 12:59:34 +01:00
Florian Hahn
e18a547ce2
[VPlan] Fold if into return in prepareToExecute assertion (NFC).
...
Independent simplification suggested in D157194.
2023-08-08 12:45:55 +01:00
Florian Hahn
af635a5547
[VPlan] Model wrap flags directly, remove *NUW opcodes (NFC)
...
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.
D157144 will build on this and also model FMFs for VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157194
2023-08-08 12:12:30 +01:00
Florian Hahn
93c5bae00e
[VPlan] Use printOperands for VPInstruction.
...
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.
Also removes some stray spaces in print output.
2023-08-08 11:31:21 +01:00
Florian Hahn
e2851ad43d
[VPlan] Use IterT template arg directly for VPInstruction operands (NFC)
...
Makes the constructors a bit more flexible, to be used in D157194 &
D157144.
2023-08-08 09:42:17 +01:00
Alexey Bataev
e894c3d1a9
[SLP]Improve stores vectorization.
...
Use an O(N log N) sorting approach instead of O(N^2) (N <= 32) and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%
Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%
Other changes are too small, cannot rely on them.
Code size (size..text):
Program results results0 diff
test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%
test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%
espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.
Differential Revision: https://reviews.llvm.org/D155246
2023-08-07 09:17:56 -07:00
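The general idea of replacing pairwise comparison with a single sort, as mentioned in the commit above, can be sketched as follows. This is an illustration under assumptions, not the SLP vectorizer's algorithm: stores are reduced to plain integer element offsets, and groupConsecutive is a hypothetical helper.

```cpp
#include <algorithm>
#include <vector>

// Groups element offsets into runs of consecutive values. One sort
// (O(N log N)) followed by a linear scan replaces checking every pair
// of offsets (O(N^2)).
inline std::vector<std::vector<int>>
groupConsecutive(std::vector<int> Offsets) {
  std::sort(Offsets.begin(), Offsets.end());
  std::vector<std::vector<int>> Runs;
  for (int Off : Offsets) {
    // Extend the current run if this offset directly follows its last one.
    if (!Runs.empty() && Runs.back().back() + 1 == Off)
      Runs.back().push_back(Off);
    else
      Runs.push_back({Off});
  }
  return Runs;
}
```

For example, the offsets {5, 0, 2, 1, 6} form two runs, {0, 1, 2} and {5, 6}, each a candidate for one wider store in this simplified model.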
Florian Hahn
0b17e9d285
[VPlan] Move VPRecipeWithIRFlags::getFastMathFlags. (NFCI)
...
Split off suggested refactoring from D157144. Also adds an assert to make
sure this is only used when OpType is FPMathOp.
2023-08-07 12:35:53 +01:00
Florian Hahn
7b14c05908
[VPlan] Move up VPRecipeWithIRFlags definition. (NFC)
...
This allows using VPRecipeWithIRFlags for VPInstruction and reduces the
diff for D157144 & D157194.
2023-08-07 11:03:41 +01:00
Florian Hahn
aac8acb115
[VPlan] Model masked assumes as replicate recipes, drop them (NFCI).
...
Replace ConditionalAssume set by treating conditional assumes like other
predicated instructions (i.e. create a VPReplicateRecipe with a mask)
and later remove any assume recipes with masks during VPlan cleanup.
This reduces coupling of VPlan construction and Legal by removing a
shared set between the 2 and results in a cleaner code structure
overall.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157034
2023-08-06 20:56:24 +01:00
Alexey Bataev
48bcaeb997
Revert "[SLP]Improve stores vectorization."
...
This reverts commit 58066edbb05d66e6a7512a675da778475da3bdfb, following the report in https://lab.llvm.org/buildbot/#/builders/252/builds/3389
2023-08-04 07:37:26 -07:00
Alexey Bataev
58066edbb0
[SLP]Improve stores vectorization.
...
Use an O(N log N) sorting approach instead of O(N^2) (N <= 32) and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%
Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%
Other changes are too small, cannot rely on them.
Code size (size..text):
Program results results0 diff
test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%
test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%
espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.
Differential Revision: https://reviews.llvm.org/D155246
2023-08-04 06:47:16 -07:00
Alexey Bataev
2f6ca38d51
Revert "[SLP]Improve stores vectorization."
...
This reverts commit 58b0d7c34ddd9d2117009a8cd7bd5e34a8276082 to fix
crashes reported in https://lab.llvm.org/buildbot/#/builders/85/builds/18117 .
2023-08-04 05:25:31 -07:00
Florian Hahn
a6d6730709
[LV] Split off code to optimize initial VPlan (NFC).
...
Split up tryToBuildVPlanWithVPRecipes into initial plan creation and
optimizations, by introducing a VPLanTransform::optimize helper.
Depends on D154640.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154644
2023-08-04 13:21:20 +01:00
Florian Hahn
39cf210450
[LV] Remove unnecessary std::move from tryToBuildVPlanWith.. (NFC).
...
Split off D154644.
2023-08-04 11:56:05 +01:00
Florian Hahn
c30099ef0b
[LV] Return null VPlanPtr instead of std::optional for tryToBuild (NFC)
...
Cleanup in preparation for D154644. This was suggested earlier and helps
to simplify the code with D154644.
2023-08-04 11:48:24 +01:00
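The cleanup above relies on the fact that a smart pointer already has an empty state, so wrapping it in std::optional adds a second, redundant way to say "no result". A minimal sketch, assuming a made-up Plan type and hypothetical tryBuildOptional/tryBuild helpers rather than the real VPlan interfaces:

```cpp
#include <memory>
#include <optional>

// Hypothetical plan type; stands in for the object being built.
struct Plan {
  int VF;
};
using PlanPtr = std::unique_ptr<Plan>;

// Before-style: callers must consider nullopt and a null pointer inside.
inline std::optional<PlanPtr> tryBuildOptional(int VF) {
  if (VF <= 0)
    return std::nullopt;
  return std::make_unique<Plan>(Plan{VF});
}

// After-style: a null PlanPtr is the single failure signal.
inline PlanPtr tryBuild(int VF) {
  if (VF <= 0)
    return nullptr;
  return std::make_unique<Plan>(Plan{VF});
}
```

With the second form a caller can write `if (PlanPtr P = tryBuild(4)) { ... }` and needs only one check.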
Alexey Bataev
58b0d7c34d
[SLP]Improve stores vectorization.
...
Use an O(N log N) sorting approach instead of O(N^2) (N <= 32) and do not try
to revectorize all possible combinations of stores, if they
definitely cannot be combined because of mem/data dependencies.
Compile time (O3 + lto, skylake_avx512):
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 117.15 120.11 2.5%
External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 203.67 207.42 1.8%
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 232.43 235.01 1.1%
External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 205.49 207.25 0.9%
External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 310.46 306.23 -1.4%
Link time (O3+lto, skylake_avx512):
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 1383.69 1475.94 6.7%
Other changes are too small, cannot rely on them.
Code size (size..text):
Program results results0 diff
test-suite :: SingleSource/Regression/C/Regression-C-sumarray.test 392.00 1439.00 267.1%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 394258.00 394818.00 0.1%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test 846355.00 847075.00 0.1%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782816.00 783360.00 0.1%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 779667.00 779923.00 0.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 224398.00 224446.00 0.0%
test-suite :: MultiSource/Applications/oggenc/oggenc.test 185019.00 185035.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12487610.00 12488010.00 0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1051772.00 1051804.00 0.0%
test-suite :: MultiSource/Applications/SPASS/SPASS.test 529586.00 529602.00 0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1084684.00 1084716.00 0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1014245.00 1014261.00 0.0%
test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 223494.00 223478.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 660843.00 660795.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 660843.00 660795.00 -0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test 568824.00 568760.00 -0.0%
espresso - 2 more stores vectorized
x264 - small number of changes in 3-4 functions, generated a bit more
vector stores (2 4x zeroinitializer stores + some other small variations).
clamscan - emitted 32xi8 store instead of several scalar stores + several 4x-8x stores.
Differential Revision: https://reviews.llvm.org/D155246
2023-08-03 16:14:21 -07:00
Florian Hahn
deec9e7674
[VPlan] Move VPTransformState::get() to VPlan.cpp (NFC).
...
The last dependency on code defined in LoopVectorize.cpp was
removed a while ago. Move VPTransformState::get() to VPlan.cpp where
other members are also defined.
2023-08-03 21:49:58 +01:00
Mel Chen
425e9e81a0
[LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC)
...
Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D155786
2023-08-03 00:37:19 -07:00
Mel Chen
97cccdd9f3
[LV][NFC] Remove the redundant braces.
2023-08-02 20:45:04 -07:00
Florian Hahn
8ea274b46b
[VPlan] Fix in-loop reduction chains using VPlan def-use chains (NFCI)
...
Update adjustRecipesForReductions to directly use the VPlan def-use
chains for in-loop reductions to collect the reduction operations that
need adjusting.
This allows the removal of:
* ReductionChainMap
* recording of recipes for instructions in the reduction chain
* late uses of getVPValue
* the need for removeVPValueFor.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D155845
2023-08-02 17:04:29 +01:00
Florian Hahn
d1d0e135a1
[LV] Move packScalarIntoVectorValue to VPTransformState (NFC).
...
This moves packScalarIntoVectorValue from ILV to the more appropriate
VPTransformState.
2023-08-02 12:36:48 +01:00
Bjorn Pettersson
408cc94445
[LV][LSV][SLP] Drop some typed pointer bitcasts
...
Differential Revision: https://reviews.llvm.org/D156736
2023-08-02 12:08:37 +02:00
Florian Hahn
707359ecf5
Recommit "[LV] Re-use existing broadcast value for live-ins."
...
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.
Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
2023-08-01 15:54:02 +01:00
Alexey Bataev
0a68cd2304
[SLP]Fix PR64252: Requesting cost of invalid extending instruction.
...
If the actual instruction bitwidth does not match its original size,
the casting opcode needs to be re-estimated; the compiler cannot rely on
the one provided in the instruction.
2023-07-31 13:37:52 -07:00
Alexey Bataev
662efdee9b
[SLP][NFC]Improve handling of MinBWs container, NFC.
...
Replaced MapVector with DenseMap (the order is not important, only
lookup is used) and reduced the number of lookups.
2023-07-31 07:26:55 -07:00
Alexey Bataev
85635c7f60
[SLP][NFC]Use ScalarTy consistently in getEntryCost, NFC.
2023-07-31 06:52:56 -07:00