3219 Commits

Author SHA1 Message Date
Florian Hahn
cb69ba4faa
[LV] Create RT checks once VF/IC are selected, track scalar cost.
This patch updates LV to generate runtime after the VF & IC are selected. It
allows deciding whether to vectorize with runtime checks or not based on
their cost compared to the vector loop.

It also updates VectorizationFactor to include the scalar cost.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D75981
2022-06-24 17:42:11 +02:00
Florian Hahn
b18141a8f2
[VPlan] Set VFs included in plan before last set of VPTransforms (NFC).
This allows VPlanTransforms to query the VFs included in the plan in the
future.
2022-06-24 10:16:56 +02:00
Philip Reames
46ea4b5ea1 [LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter
If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it.  (Well, we could, but we haven't implemented that case.)  This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
2022-06-23 12:41:09 -07:00
Alexey Bataev
3b6edef15d [SLP]Fix a crash when reorder masked gather nodes with reused scalars.
If the masked gather nodes must be reordered, we can just reorder
scalars, just like for gather nodes. But if the node contains reused
scalars, it must be handled same way as a regular vectorizable node,
since need to reorder reused mask, not the scalars directly.

Differential Revision: https://reviews.llvm.org/D128360
2022-06-23 11:32:30 -07:00
Florian Hahn
569d84fe99
[VPlan] Remove dead recipes across whole plan.
This extends removeDeadRecipe to remove recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127580
2022-06-23 13:36:02 +02:00
Fangrui Song
1ffd2d99c2 Revert D115462 "[SLP]Improve shuffles cost estimation where possible."
This reverts commit cac60940b771a0685d058a5b471c84cea05fdc46.

Caused -Os -fsanitize=memory -march=haswell miscompile to pytorch/cpuinfo.
See my latest comment (may update) on D115462.
2022-06-22 23:16:31 -07:00
Fangrui Song
a411bc11d6 Revert "[SLP]Fix a crash when insert subvector is out of range."
This reverts commit f1ee2738b3d70fea803ac1f3401c2fc9f61e514a.

Revert due to the revert of a dependent commit `[SLP]Improve shuffles cost estimation where possible.`
2022-06-22 23:16:25 -07:00
Guillaume Chatelet
57ffff6db0 Revert "[NFC] Remove dead code"
This reverts commit 8ba2cbff70f2c49a8926451c59cc260d67b706cf.
2022-06-22 14:55:47 +00:00
Guillaume Chatelet
8ba2cbff70 [NFC] Remove dead code 2022-06-22 13:33:58 +00:00
Serguei Katkov
8f891b7c39 [LoopVectorize] Uninitialized phi node leads to a crash in SSAUpdater.
createInductionResumeValues creates a phi node placeholder
without filling incoming values. Then it generates the incoming values.

It includes triggering of SCEV expander which may invoke SSAUpdater.
SSAUpdater has an optimization to detect number of predecessors
basing on incoming values if there is phi node.
In case phi node is not filled with incoming values - the number of predecessors
is detected as 0 and this leads to segmentation fault.

In other words SSAUpdater expects that phi is in good shape while
LoopVectorizer breaks this requirement.

The fix is just prepare all incoming values first and then build a phi node.

Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128033
2022-06-22 10:49:27 +07:00
Vasileios Porpodas
7a9ad25769 Recommit "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6d6268dcbf0f48e43f6f9fe46b3a28c29ba63c7d.

Review: https://reviews.llvm.org/D125712
2022-06-21 18:35:29 -07:00
Vasileios Porpodas
6d6268dcbf Revert "[SLP][X86] Improve reordering to consider alternate instruction bundles"
This reverts commit 6f88acf410b48f3e6c1526df2dc32ed86f249685.
2022-06-21 17:07:21 -07:00
Vasileios Porpodas
6f88acf410 [SLP][X86] Improve reordering to consider alternate instruction bundles
During the reordering transformation we should try to avoid reordering bundles
like fadd,fsub because this may block them being matched into a single vector
instruction in x86.
We do this by checking if a TreeEntry is such a pattern and adding it to the
list of TreeEntries with orders that need to be considered.

Differential Revision: https://reviews.llvm.org/D125712
2022-06-21 16:44:48 -07:00
Florian Hahn
88ce403c6a
[LV] Add new block to place recurrence splice, if needed.
In some cases, a recurrence splice instructions needs to be inserted
between to regions, for example if the regions get re-arranged during
sinking.

Fixes #56146.
2022-06-21 21:54:37 +02:00
Alexey Bataev
d4ee43153d [SLP][NFC]Fix a warning in a comparison, NFC.
Fixed signedness warning.
2022-06-21 10:19:47 -07:00
Alexey Bataev
f1ee2738b3 [SLP]Fix a crash when insert subvector is out of range.
If the OffsetBeg + InsertVecSz is greater than VecSz, need to estimate
the cost as shuffle of 2 vector, not as insert of subvector. Otherwise,
the inserted subvector is out of range and compiler may crash.

Differential Revision: https://reviews.llvm.org/D128071
2022-06-21 07:16:35 -07:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Kazu Hirata
c399b3a608 [Vectorize] Use llvm::is_contained (NFC) 2022-06-18 15:49:15 -07:00
Malhar Jajoo
6bb40552f2 [LoopVectorize] Add support for invariant stores of ordered reductions
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D126772
2022-06-17 14:56:21 +01:00
Alexey Bataev
76782a65ee [SLP]Use original vector if need to shuffle truncated root.
If the root scalar is mapped to to the smallest bit width, the vector is
truncated and the types between original buildvector and extracted value
mismatched. For extract, we emit sext/zext instructions, for shuffles we
can reuse oringal vector instead of the truncated one.

Differential Revision: https://reviews.llvm.org/D127974
2022-06-16 10:41:18 -07:00
Florian Hahn
949c13649c
[LV] Remove widenPHIInstruction dependence on underlying instr (NFC).
Instead of using the underlying instruction and VF to get the type, use
the type of the incoming value. This removes an unnecessary dependence
on the underlying instruction and enables using the recipe without an
underlying instruction.
2022-06-16 16:03:01 +02:00
Alexey Bataev
7236d49fd5 [SLP]Extend vectorization for scatter vectorize nodes.
Currently scatter vectorize nodes can be emitted only for GEPs with
constant indices. But we can also emit such nodes for GEPs with the same
ptr and non-constant vectorizable/gathered indices, if profitable. Patch
adds support for such nodes and tries to improve handling of GEPs with
non-const indeces for such nodes.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                                              results                   results0 diff
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5243.00                   5240.00  -0.1%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27550.00                  27507.00  -0.2%
                               test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  5395.00                   5380.00  -0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  5389.00                   5374.00  -0.3%
                    test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test   961.00                    958.00  -0.3%
                   test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test   961.00                    958.00  -0.3%
                               test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  5664.00                   5643.00  -0.4%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 13202.00                  13127.00  -0.6%
                                test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   212.00                    207.00  -2.4%
                                test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   890.00                    850.00  -4.5%
                            test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  1695.00                   1581.00  -6.7%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test  2338.00                   2140.00  -8.5%
                                  test-suite :: SingleSource/UnitTests/matrix-types-spec.test    63.00                     55.00 -12.7%
                             test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   468.00                    356.00 -23.9%
                                                                           Geomean difference                                     -0.3%

All numbers show increased number of generated vector instructions.

Diff:
SingleSource/Benchmarks/Adobe-C++/loop_unroll - better without LTO, but
need an extra analysis with LTO (with LTO compiler generates
masked_gather, while before regular loads were emitted because of extra
data, availbale at LTO time).
SingleSource/UnitTests/matrix-types-spec - more vector code.
MultiSource/Applications/JM/lencod/lencod - same.
External/SPEC/CINT2006/464.h264ref/464.h264ref - same.
MultiSource/Benchmarks/7zip/7zip-benchmark - same.
External/SPEC/CINT2006/445.gobmk/445.gobmk - no changes.
External/SPEC/CFP2017rate/510.parest_r/510.parest_r - more vector code.
External/SPEC/CFP2006/447.dealII/447.dealII - same
External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s - same
External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp - same
External/SPEC/CFP2017rate/511.povray_r/511.povray - same
External/SPEC/CFP2006/453.povray/453.povray - same
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - same
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r - same
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - same

Differential Revision: https://reviews.llvm.org/D127219
2022-06-16 06:05:48 -07:00
Florian Hahn
5ff5b460d9
[LV] Remove unneeded CustomBuilder arg from setDebugLocFromInst (NFC).
The only user that passed in a custom builder was passing in
VPTransformState::Builder, which is the same as ILV::Builder.
2022-06-15 18:48:02 +01:00
Alexey Bataev
c60c13f7eb [SLP] Improve reordering in presence of constant only nodes.
We can skip the analysis of the constant nodes, their order should not
affect the ordering of the trees/subtrees.

Differential Revision: https://reviews.llvm.org/D127775
2022-06-15 06:17:34 -07:00
Florian Hahn
9129e7bb54
[LV] Replace OrigPHIsToFix in native with VPlan traversal. (NFC)
OrigPHIsToFix is only used in the native path. Collecting phis can be
replaced by iterating over the plan. This also removes another
unnecessary use of a late getVPValue.

This also reduces the coupling between ILV and the VPlan utilities.
2022-06-13 22:20:58 +01:00
Hubert Tong
5efb380c26 [NFC] Undo AIX build compiler workaround
Removes the workaround from https://reviews.llvm.org/D98509#2732628 for
an AIX build compiler issue.

The AIX build compiler product that caused the issue has since been
fixed. Also, the AIX build compiler has been changed to one based on
LLVM.
2022-06-13 17:00:33 -04:00
Florian Hahn
763f2bdba5
[VPlan] Remove dead OrigLoop argument from removeDeadRecipes (NFC).
The use of the argument has been remove a while ago. Remove the dead
argument.
2022-06-11 23:36:47 +01:00
Florian Hahn
85983ca42e
[VPlan] Replace remaining use of needsScalarIV.
All information is already available in VPlan. Note that there are some
test changes, because we now can correctly look through instructions
like truncates to analyze the actual users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123541
2022-06-09 12:05:37 +01:00
Florian Hahn
cedfd7a2e5
Recommit "[VPlan] Remove uneeded needsVectorIV check."
This reverts commit 266ea446ab747671eb6c736569c3c9c5f3c53d11.

The reasons for the revert have been addressed by cleaning up condition
handling in VPlan and properly marking VPBranchOnMaskRecipe as using
scalars.

The test case for the revert from D123720 has been added in 3d663308a5d.
2022-06-08 14:06:45 +01:00
Florian Hahn
b0c9a71be0
[VPlan] Handle VPInst without underlying instr in VPInterleavedAccess.
This violation is hidden while `cast` is missing an isa assertion after
D123901.
2022-06-07 21:00:49 +01:00
David Sherwood
997ecb0036 [LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding
Based on reviewer comments on https://reviews.llvm.org/D126692 I've
added FastMathFlags to the select instruction used when tail-folding
with reductions. These flags can then be used by InstCombine to
decide upon the most optimal floating point identity value for
fadd/fsub. Doing so unlocks further optimisations, such as folding
selects into masked loads.

Differential Revision: https://reviews.llvm.org/D126778
2022-06-07 10:21:31 +01:00
Florian Hahn
eaf48dd9b0
[VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF.
Try to simplify BranchOnCount to `BranchOnCond true` if TC <= UF * VF.

This is an alternative to D121899 which simplifies the VPlan directly
instead of doing so late in code-gen.

The potential benefit of doing this in VPlan is that this may help
cost-modeling in the future. The reason this is done in prepareToExecute
at the moment is that a single plan may be used for multiple VFs/UFs.

There are further simplifications that can be applied as follow ups:

1. Replace inductions with constants
2. Replace vector region with regular block.

Fixes #55354.

Depends on D126679.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126680
2022-06-06 09:38:53 +01:00
Florian Hahn
416a5080d8
[VPlan] Update vector latch terminator edge to exit block after execution.
Instead of setting the successor to the exit using CFG.ExitBB, set it to
nullptr initially. The successor to the exit block is later set either
through createEmptyBasicBlock or after VPlan execution (because at the
moment, no block is created by VPlan for the exit block, the existing
one is reused).

This also enables BranchOnCond to be used as terminator for the exiting
block of the topmost vector region.

Depends on D126618.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126679
2022-06-04 21:22:32 +01:00
Alexey Bataev
cac60940b7 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-03 08:06:22 -07:00
Benjamin Kramer
a8d2a381a2 [VPlan] Silence another unused variable warning in release builds 2022-06-03 14:07:56 +02:00
Benjamin Kramer
6b7c186390 [VPlan] Inline variable into assertion. NFC.
Avoids a warning in release builds
llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp:311:14: warning: unused variable 'BrCond' [-Wunused-variable]
      Value *BrCond = Br->getCondition();
2022-06-03 13:59:48 +02:00
Florian Hahn
a5bb4a3b4d
[VPlan] Replace CondBit with BranchOnCond VPInstruction.
This patch removes CondBit and Predicate from VPBasicBlock. To do so,
the patch introduces a new branch-on-cond VPInstruction opcode to model
a branch on a condition explicitly.

This addresses a long-standing TODO/FIXME that blocks shouldn't be users
of VPValues. Those extra users can cause issues for VPValue-based
analyses that don't expect blocks. Addressing this fixme should allow us
to re-introduce 266ea446ab7476.

The generic branch opcode can also be used in follow-up patches.

Depends on D123005.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126618
2022-06-03 11:48:31 +01:00
Fangrui Song
df0f30dc36 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit 9980c9971892378ea82475e000de8df210a58e69.

Caused assertion failures: https://reviews.llvm.org/D115462#3555350
2022-06-03 00:30:34 -07:00
Alexey Bataev
9980c99718 [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-02 11:18:14 -07:00
Florian Hahn
4f1c86e3d5
[VPlan] Remove dead VPlan-native special case from BranchOnCount (NFC).
After 05776122b682684ad this special case doesn't exist any longer.
2022-06-02 12:07:54 +01:00
Alexey Bataev
73020b4540 Revert "[SLP]Improve shuffles cost estimation where possible."
This reverts commit fd5a6ce9dcb77b7821c95355d73af0b3b2020647 to fix
a crash detected by a buildbot
https://lab.llvm.org/buildbot/#/builders/179/builds/3805/steps/11/logs/stdio.
2022-06-01 15:44:51 -07:00
Florian Hahn
08482830eb
[LV] Update var name to Exiting, in line with terminology (NFC)
Recently the terminology used has been changed from Exit->Exiting in
line with common LLVM loop terminology. Update a remaining use of the
old terminology.
2022-06-01 22:13:29 +01:00
Alexey Bataev
fd5a6ce9dc [SLP]Improve shuffles cost estimation where possible.
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.

Differential Revision: https://reviews.llvm.org/D115462
2022-06-01 11:01:37 -07:00
Alexey Bataev
fe4949942d [SLP]Fix PR55796: insert point for extractelements from different basic blocks.
Extractelement instructions may come from different basic blocks, need
to take it into account when looking for a last instruction in the
bundle to prevent compiler crash.

Differential Revision: https://reviews.llvm.org/D126777
2022-06-01 09:44:53 -07:00
Florian Hahn
05776122b6
[VPlan] Use region for each loop in native path.
This patch updates the VPlan native path to use VPRegionBlocks for all
loops in a loop nest. Up to now, only the outermost loop used a region.

This is a step towards unifying both paths and keep things consistent
between them. It also prepares various code-gen parts for modeling the
pre-header in the inner loop vectorizer (D121624).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123005
2022-06-01 10:41:05 +01:00
Florian Hahn
d157019482
[VPlan] Remove unused native utilities incompatible with nested regions.
The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator
are all incompatible with modeling loops in VPlans as region without
explicit back-edges.

Those pieces are not actively used and only exercised by a few gtest
unit tests. They are at the moment blocking progress towards unifying
the native and inner-loop vectorizer paths in D121624 and D123005.

I think we should not block forward progress on unused pieces of code,
so this patch removes the utilities for now. The plan is to re-introduce
them as needed in a way that is compatible with the unified VPlan scheme
used in both the inner loop vectorizer and the native path.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D123017
2022-06-01 09:32:59 +01:00
Mel Chen
b0fc765350 [NFC] Change LoopVectorizationCostModel::useOrderedReductions() to be a const function.
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D126200
2022-05-31 05:39:13 -07:00
Florian Hahn
6abce17fc2
[VPlan] Use Exiting-block instead of Exit-block terminology (NFC).
In LLVM's common loop terminology, an exit block is a block outside a
loop with a predecessor inside the loop. An exiting block is a block
inside the loop which branches to an exit block outside the loop.

This patch updates a few places where VPlan was using ExitBlock for a
block exiting a region. Those instances have been updated to use
ExitingBlock.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126173
2022-05-28 21:16:05 +01:00