1248 Commits

Author SHA1 Message Date
skc7
e98501e27e [SLP][NFC] Added test to check resulting mask in shufflevector as per order of phinodes
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D136553
2022-10-23 17:20:47 +00:00
Alexey Bataev
456951dcd3 [SLP][NFC]Add a test for possible reordering gap in SLP, NFC. 2022-10-19 08:22:07 -07:00
Alexey Bataev
087dadfd37 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
  cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 11:55:59 -07:00
Alexey Bataev
62267e8de0 Revert "[SLP]Generalize cost model."
This reverts commit f12fb91188b836e1bddb36bacbbdb8e4ab70b9b6 and
f5c747bfbe36b8f53e6fe2d85ffcaecba6d7153c to fix detected non-initialized
var use.
2022-10-18 11:25:59 -07:00
Arthur Eubanks
df92b05f1b [test] Remove redundant -passes flags 2022-10-18 09:57:06 -07:00
Alexey Bataev
f12fb91188 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
  cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 08:49:32 -07:00
Alexey Bataev
c787986cdd [SLP]Improve costs of vectorized loads/stores by analyzing GEPs.
When generating masked gathers nodes, SLP vectorizer accounts the cost
of the GEPs for loads as part of the scalar-vector transformation cost
estimation. But it does not do it for vectorized loads/stores, while it
may completely remove some of the GEPs completely. Because of this in
some cases masked gather operation can be much more profitable rather
than regular vectorization (masked-gather cost + vector GEP - scalar
loads + GEPs comparing to vectorized loads - scalar loads).
Added the analysis of the removed scalarGEPs for vectorized load/store nodes for better cost estimation.

Differential Revision: https://reviews.llvm.org/D135282
2022-10-13 07:20:41 -07:00
Bjorn Pettersson
3be72f4029 [test][SLPVectorizer] Use -passes syntax in RUN lines. NFC 2022-10-13 10:44:38 +02:00
Alexey Bataev
d71ad41080 [SLP]Fix insertpoint of the extractellements instructions to avoid reshuffle crash.
Need to set the insertpoint for extractelement to point to the first
instruction in the node to avoid possible crash during external uses
combine  process. Without it we may endup with the incorrect
transformation.

Differential Revision: https://reviews.llvm.org/D135591
2022-10-12 08:18:30 -07:00
Alexey Bataev
1be3428ea0 [SLP]Fix PR58177: Improve isUndefVector function to avoid extra freeze.
Freeze instruction in some cases makes codegen worse, so need to be very
careful when emitting it. Instead improve analysis in isUndefVector
function to generate mask of unused elements and use it in the analysis.

Differential Revision: https://reviews.llvm.org/D135382
2022-10-12 07:32:54 -07:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Alexey Bataev
323ed2308a [SLP]Improve/fix CSE analysis of the blocks/instructions.
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.

Differential Revision: https://reviews.llvm.org/D135279
2022-10-06 12:08:48 -07:00
Alexey Bataev
d7c85d7e34 [SLP][NFC]Add a test for CSE for extractelements. 2022-10-05 07:55:25 -07:00
Alexey Bataev
ab9a81f736 [SLP]Try to emit canonical shuffle with undef operand.
In the canonical form of the shuffle the poison/undef operand is the
second operand, the patch tries to emit canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting freeze instruction for shuffles with undef indices if the second shuffle operan is undef, not poison. It is an initial step to D93818, where undef mask element are treated as returning poison value.

Differential Revision: https://reviews.llvm.org/D134377
2022-10-04 08:16:07 -07:00
Arthur Eubanks
e23aee7175 [test] Update some legacy PM tests 2022-09-30 11:31:02 -07:00
Simon Pilgrim
5fc7bbfaa3 [SLP][X86] Add test coverage for Issue #58054 2022-09-30 13:26:31 +01:00
Simon Pilgrim
5849fcb635 Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions"
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"

Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.
2022-09-30 11:22:48 +01:00
Simon Pilgrim
19782a46f8 [SLP][X86] Add test case for crash reported on D134605 2022-09-30 11:07:54 +01:00
Simon Pilgrim
28a3fc39a6 [SLP][AArch64] Add test case for vectorization regression case reported on D134605 2022-09-30 11:07:54 +01:00
Philip Reames
02bfe2de7c [RISCV] Adjust vector immediate store materialization cost
This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement.

Differential Revision: https://reviews.llvm.org/D134746
2022-09-29 07:37:13 -07:00
Philip Reames
17f2ee804a [RISCV][SLP] Add test coverage for stores of constants 2022-09-27 07:53:33 -07:00
Simon Pilgrim
1b7089fe67 [SLP] Add ScalarizationOverheadBuilder helper to track vector extractions
Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs.

I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first.

This replaces my initial attempt in D124769.

Differential Revision: https://reviews.llvm.org/D134605
2022-09-27 14:49:07 +01:00
Simon Pilgrim
200fe18c5c [SLP][X86] Add missing triple from pr49081.ll test 2022-09-25 16:22:42 +01:00
Simon Pilgrim
a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Simon Pilgrim
b2cd8118d0 [CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 10:19:23 +01:00
Alexey Bataev
e664dea182 [SLP]Fix write-after-bounds.
Mask might be larger than the NumElts-OffsetBeg, need to use actual
indices to avoid acces out of bounds.
2022-09-21 08:00:15 -07:00
Vitaly Buka
bbef90ace4 [IRBuilder] Use PoisonValue in CreateMasked*
Followup to 72b776168c7c80d2035c7226488462dcffc97e75

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D133967
2022-09-19 11:01:41 -07:00
Sebastian Peryt
99c9b37d11 [NFC][1/n] Remove -enable-new-pm=0 flags from lit tests
This is the first patch in a series intended for removing flag
-enable-new-pm=0 from lit tests. This is part of a bigger
effort of completely removing legacy code related to legacy
pass manager in favor of currently default new pass manager.

In this patch flag has been removed only from tests where no significant
change has been required because checks has been duplicated for
both PMs.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D134150
2022-09-19 09:57:37 -07:00
Simon Pilgrim
2538adde5c [CostModel][X86] Add CostKinds handling for cttz
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 15:57:03 +01:00
Simon Pilgrim
d90a42d64c [CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling
Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.
2022-09-19 14:06:33 +01:00
Simon Pilgrim
5665d0941a [SLP][X86] Add AVX512 test coverage to CTLZ/CTTZ tests
Only AVX512 has decent CTTZ/CTLZ vector ops, add tests to ensure we definitely vectorize these
2022-09-19 13:07:55 +01:00
Alexey Bataev
5d13b12674 [SLP]Improve isUndefVector function by adding insertelement analysis.
Added the mask and the analysis of the buildvector sequence in the
isUndefVector function, improves codegen and cost estimation.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                          results                   results0 diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00                  27360.00 -0.0%

Metric: size..text

Program                                                                                                           size..text
                                                                   results     results0    diff
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   805299.00   806035.00  0.1%

526.blender_r - some extra code is vectorized.
508.namd_r - some extra code is optimized out.

Differential Revision: https://reviews.llvm.org/D133891
2022-09-16 14:36:38 -07:00
Simon Pilgrim
23cb1c42cd [CostModel][X86] Update throughput costs for CTLZ ops
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models)

Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first
2022-09-16 16:56:49 +01:00
Simon Pilgrim
94620e4fc3 [CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts
These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions.

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
2022-09-15 16:51:58 +01:00
Valery N Dmitriev
18dde772d6 [SLP] Unify main/alternate selection for CmpInst instructions
Make main/alternate operation selection logic for CmpInst
consistent across SLP vectorizer.

Differential Revision: https://reviews.llvm.org/D133430
2022-09-13 09:20:25 -07:00
Florian Hahn
3fd1cc2574
[SLP] Add Preheader to CSE blocks after hoisting CSE-able instrs.
Adding the pre-header to CSEBlocks ensures instructions are CSE'd even
after hoisting.

This was original discovered by @atrick a while ago.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D133649
2022-09-12 15:53:31 +01:00
Alexey Bataev
dfe1e9dd79 [SLP]Improve reordering of clustered reused scalars.
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of of the clustered
vectors.

Differential Revision: https://reviews.llvm.org/D133524
2022-09-12 06:52:25 -07:00
Florian Hahn
3ff7892005
[SLP] Add test case showing missing CSE in hoisted instructions. 2022-09-10 20:55:04 +01:00
Simon Pilgrim
10edf88458 [CostModel][X86] Update CTPOP costs
With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont
2022-09-10 17:57:20 +01:00
Alexey Bataev
982d9ef1c1 [SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.
Need either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.

Differential Revision: https://reviews.llvm.org/D126877
2022-09-01 05:34:45 -07:00
Florian Hahn
e68594ca20
[SLP] Add FMA test case with missing or partial fast-math flags.
Add extra FMA tests with missing or partial fast-math flags.
2022-08-31 16:25:18 +01:00
Alexey Bataev
ec06df9459 [SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.

Differential Revision: https://reviews.llvm.org/D132949
2022-08-30 12:30:14 -07:00
Valery N Dmitriev
329b972d41 [SLP] Try to match reductions before trying to vectorize a vector build sequence.
This patch changes order of searching for reductions vs other vectorization possibilities.
The idea is if we do not match a reduction it won't be harmful for further attempts to
find vectorizable operations on a vector build sequences. But doing it in the opposite
order we have good chance to ruin opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early as 2-way vectorization
may effectively prohibit wider ones leading to producing less effective code.

Differential Revision: https://reviews.llvm.org/D132590
2022-08-29 13:32:14 -07:00
Alexey Bataev
beacf9bd9e [SLP]Fix PR57322: vectorize constant float stores.
Stores for constant floats must be vectorized, improve analysis in SLP
vectorizer for stores.

Differential Revision: https://reviews.llvm.org/D132750
2022-08-29 11:02:53 -07:00
Florian Hahn
3c5e24a51c
[SLP] Add tests showing over-eager SLP when scalar fma can be used.
Add test cases for AArch64 that show over-eager SLP vectorization on
AArch64, where keeping the things scalar allows efficient lowering using
scalar fmas.
2022-08-29 18:58:56 +01:00
Alexey Bataev
e6345bf644 [SLP]Improve lookup of the buildvector top insertelement instruction.
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, need to take into account the vectorized
insertelement instruction. The top of the buildvector seuences is the
topmost vectorized insertelement instruction, because it will have
> than 1 use after the vectorization.

For the affected test case improves througput from 21 to 16 (per
llvm-mca).

Differential Revision: https://reviews.llvm.org/D132740
2022-08-29 08:19:52 -07:00
Philip Reames
a310637132 [RISCV] Disable SLP vectorization by default due to unresolved profitability issues
This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but its current tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP.

Differential Revision: https://reviews.llvm.org/D132680
2022-08-26 14:11:22 -07:00
Alexey Bataev
c7b815a510 [SLP][NFC]Add a test for vectorization of stores with float constants,
NFC.
2022-08-26 10:22:07 -07:00
Alexey Bataev
af8477f2bc [SLP][NFC]Add a test for incorrectly calculated cost for extracted
buildvector sequence, NFC.
2022-08-26 06:29:15 -07:00
Valery N Dmitriev
f9ceb71542 [SLP][NFC] Add a coverage test for horizontal reductions.
Reduction feeds single insertelement instruction.
2022-08-25 16:02:22 -07:00