2019 Commits

Author SHA1 Message Date
Han-Kuan Chen
ead3a2f598
[SLP][REVEC] getScalarizationOverhead should not be used when ScalarTy is FixedVectorType. (#117536) 2024-11-26 22:05:54 +08:00
Alexey Bataev
76f0ff8210 [SLP]Add an extra check to avoid infinite vectorization attempts
Added extra check for the cost of the buildvector if the -slp-threshold
option is used. Prevents infinite vectorization attempts.
2024-11-25 14:27:44 -08:00
Alexey Bataev
f953b5eb72 [SLP]Relax assertion about subvectors mask size
SubVectorsMask might be smaller than CommonMask if vectors with a larger
number of elements are permuted or reused elements are used. Need to
consider this when estimating/building the vector to avoid a compiler
crash.

Fixes #117518
2024-11-25 08:31:42 -08:00
Alexey Bataev
57bbdbd7ae [SLP]Relax assertion in mask combine for non-power-of-2 number of elements
The nodes may contain non-power-of-2 number of elements. Need to relax
the assertion to avoid possible compiler crash

Fixes #117517
2024-11-25 07:58:19 -08:00
Alexey Bataev
7523086a05
[SLP]Use getExtendedReduction cost and fix reduction cost calculations
Patch uses getExtendedReduction for reductions of ext-based nodes + adds
cost estimation for ctpop-kind reductions into basic implementation and
RISCV-V specific vcpop cost estimation.

Reviewers: RKSimon, preames

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/117350
2024-11-22 16:12:53 -05:00
Alexey Bataev
b8703369da
[SLP] Match poison as instruction with the same opcode
Patch allows vectorizing scalar instructions + poison values as if the
poisons were instructions with the same opcode. It allows better
vectorization of repeated values, reduces the number of insertelement
instructions and serves as a base for copyable elements vectorization.

AVX512, -O3 + LTO

JM/ldecod - better vector code
Applications/oggenc - better vectorization
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - better vector code
CFP2017rate/526.blender_r - better vector code
CFP2006/447.dealII - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/510.parest_r - better vectorization
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
Benchmarks/tramp3d-v4 - small variations
CFP2006/453.povray - extra vector code
JM/lencod - better vector code
CFP2017rate/511.povray_r - extra vector code
MemFunctions/MemFunctions - extra vector code
LoopVectorization/LoopVectorizationBenchmarks - extra vector code
XRay/FDRMode - extra vector code
XRay/ReturnReference - extra vector code
LCALS/SubsetCLambdaLoops - extra vector code
LCALS/SubsetCRawLoops - extra vector code
LCALS/SubsetARawLoops - extra vector code
LCALS/SubsetALambdaLoops - extra vector code
DOE-ProxyApps-C++/miniFE - extra vector code
LoopVectorization/LoopInterleavingBenchmarks - extra vector code
LCALS/SubsetBLambdaLoops - extra vector code
MicroBenchmarks/harris - extra vector code
ImageProcessing/Dither - extra vector code
MicroBenchmarks/SLPVectorization - extra vector code
ImageProcessing/Blur - extra vector code
ImageProcessing/Dilate - extra vector code
Builtins/Int128 - extra vector code
ImageProcessing/Interpolation - extra vector code
ImageProcessing/BilateralFiltering - extra vector code
ImageProcessing/AnisotropicDiffusion - extra vector code
MicroBenchmarks/LoopInterchange - extra code vectorized
LCALS/SubsetBRawLoops - extra code vectorized
CINT2006/464.h264ref - extra vectorization with wider vectors
CFP2017rate/508.namd_r - small variations, extra phis vectorized
CFP2006/444.namd - 2 2 x phi replaced by 4 x phi
DOE-ProxyApps-C/SimpleMOC - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - the function better vectorized and inlined
Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code
FreeBench/fourinarow - better vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115946
2024-11-22 16:10:17 -05:00
Alexey Bataev
9c9e030fba [SLP][NFC]Add a test with the RISCV ctpop-based reduction 2024-11-22 09:25:00 -08:00
Han-Kuan Chen
39913ae095
[SLP][REVEC] Make reorderTopToBottom support ShuffleVectorInst. (#117310)
We don't want reorderTopToBottom to reorder ShuffleVectorInst (because
ShuffleVectorInst currently supports only a limited set of patterns).
Either we make ShuffleVectorInst support more patterns, or we let
ReorderIndices reorder the result of the vectorization of
ShuffleVectorInst. We choose the latter solution.
2024-11-23 01:20:57 +08:00
Alexey Bataev
14bdcefbd8 [SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))
Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector
extensions + reduction add, but later instcombiner transforms it into
ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for
this in SLP vectorizer, which enables better cost estimation.

AVX512, -O3+LTO

CINT2006/445.gobmk - extra vector code
Prolangs-C/bison - extra vector code
Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24
x reduction

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/116875
2024-11-22 06:50:25 -08:00
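The equivalence that commit 14bdcefbd8 relies on can be sketched outside LLVM. This is an illustration only, not the SLP vectorizer's code: it assumes the extension is a zero extension, models an <n x i1> vector as a Python list of booleans, and uses the hypothetical helper names reduce_add_of_ext and ctpop_of_bitcast.

```python
def reduce_add_of_ext(mask_bits):
    """reduction_add(ext(<n x i1>)): widen each i1 lane to an integer, then sum."""
    return sum(int(bit) for bit in mask_bits)


def ctpop_of_bitcast(mask_bits):
    """ext(ctpop(bitcast <n x i1> to iN)): pack the lanes into one integer, count set bits."""
    packed = 0
    for lane, bit in enumerate(mask_bits):
        packed |= int(bit) << lane  # lane 0 becomes the least significant bit
    return bin(packed).count("1")


mask = [True, False, True, True, False, False, True, False]
assert reduce_add_of_ext(mask) == ctpop_of_bitcast(mask) == 4
```

Both forms count the true lanes, so a cost model can price the single population count instead of a vector extension followed by an add reduction.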
Alexey Bataev
07507cb591 [SLP]Fix shuffling of entries of the different sizes
Need to choose the size of vector factor for mask based on the entries
vector factors, not mask size, to generate correct code.

Fixes #117170
2024-11-21 13:08:27 -08:00
Alexey Bataev
b62557aaeb Revert "[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))"
This reverts commit 0298c5921d3b9fbeb5fefc2555321ea82ade6090 to fix
a buildbot crash reported by https://lab.llvm.org/buildbot/#/builders/113/builds/4079.
2024-11-21 12:52:55 -08:00
Alexey Bataev
0298c5921d
[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))
Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector
extensions + reduction add, but later instcombiner transforms it into
ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for
this in SLP vectorizer, which enables better cost estimation.

AVX512, -O3+LTO

CINT2006/445.gobmk - extra vector code
Prolangs-C/bison - extra vector code
Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24
x reduction

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/116875
2024-11-21 13:21:00 -05:00
Alexey Bataev
58c8d73172 [SLP][NFC]Add a test with multi reductions, NFC 2024-11-21 09:48:19 -08:00
Sushant Gokhale
197fb270cc
[AArch64][NFC] NFC for const vector as Instruction operand (#116790)
Current cost-modelling does not take into account cost of materializing
const vector. This results in some cases, as the test shows, being
vectorized but this may not always be profitable. Future patch will try
to address this issue.
2024-11-21 10:23:05 +05:30
Han-Kuan Chen
a62c5497c9
[SLP][REVEC] The vectorized result for ShuffleVector may not be ShuffleVectorInst. (#116940) 2024-11-20 23:59:23 +08:00
Alexey Bataev
79682c4d57 [SLP]Check if the buildvector root is not a part of the graph before deletion
If the buildvector root has no uses, it might be still needed as a part
of the graph, so need to check that it is not a part of the graph before
deletion.

Fixes #116852
2024-11-19 11:31:40 -08:00
Sushant Gokhale
7e85cb8a8a
[AArch64][NFC] Add test as a representative of scalarizing a vector i… (#114107)
…nteger division

Scalarization is considered the last resort for vectorizing a bundle of
integer divisions. Currently, the cost estimates for scalarizing a vector
division can be considerably overestimated, as in this motivating test
case, i.e. the vector cost should not deviate much from the scalar
cost.

Future patch will try to improve the scalarization cost.
2024-11-19 13:52:56 +05:30
Alexey Bataev
ad9c0b369e [SLP]Check if the gathered loads form full vector before attempting build it
Need to check that the number of gathered loads in the slice forms the
build vector to avoid compiler crash.

Fixes #116691
2024-11-18 14:09:31 -08:00
Alexey Bataev
f6e1d64458
[SLP]Enable interleaved stores support
Enables interleaved stores, which results in better estimation for segmented
stores for RISC-V.

Reviewers: preames, topperc, RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115354
2024-11-15 11:01:57 -05:00
Alexey Bataev
af3295bd3d
[SLP]Enable splat ordering for loads
Enables splat support for loads with lanes > 2 or number of operands > 2.

Allows better detection of splats of loads and reduces the number of
shuffles in some cases.

X86, AVX512, -O3+LTO

Metric: size..text
                                                                          results     results0    diff
               test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   154867.00   156723.00  1.2%
 test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12467735.00 12468023.00  0.0%

Better vectorization quality

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/115173
2024-11-15 10:29:43 -05:00
Alexey Bataev
058ac837bc [SLP]Use generic createShuffle for buildvector
Use the generic createShuffle function, which knows how to adjust the vectors
correctly, to avoid a compiler crash when trying to build a buildvector as
a shuffle.

Fixes #115732
2024-11-11 10:49:39 -08:00
Han-Kuan Chen
3cdd86bb47
[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417) 2024-11-10 13:53:15 +08:00
Tex Riddell
818d715989
[Analysis] atan2: isTriviallyVectorizable; add to massv and accelerate veclibs (#113637)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- Return true for atan2 from isTriviallyVectorizable
- Add atan2 to VecFuncs.def for massv and accelerate libraries.
- Add atan2 to hasOptimizedCodeGen
- Add atan2 support in llvm/lib/Analysis/ValueTracking.cpp
llvm::getIntrinsicForCallSite and update vectorization tests
- Add atan2 name check to isLoweredToCall in
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
- Note: there's no test coverage for these names in isLoweredToCall, except that Transforms/TailCallElim/inf-recursion.ll is impacted by the "fabs" case

Thanks to @jroelofs for the atan2 accelerate veclib and associated test
additions, plus the hasOptimizedCodeGen addition.

Part of: Implement the atan2 HLSL Function #70096.
2024-11-08 16:07:38 -08:00
Alexey Bataev
77bec78878 [SLP]Do not look for last instruction in schedule block for buildvectors
If looking for the insertion point for the node and the node is
a buildvector node, the compiler should not use scheduling info for such
nodes, they may contain only partial info, which is not fully correct
and may cause compiler crash.

Fixes #114082
2024-11-08 06:55:29 -08:00
Alexey Bataev
62db1c8a07 [SLP]Better decision making on whether to try stores packs for vectorization
Since the stores are sorted by distance, comparing the indices in the
original array and exiting early if the index is less than the index of
the last store is not always the best strategy. It is better to remove
such stores explicitly in order to check for the vectorization
opportunity.

Fixes #115008
2024-11-07 14:23:15 -08:00
Alexey Bataev
dec3839979 [SLP][NFC]Add a test with the missed vectorization opportunity for stores with same address 2024-11-07 13:53:23 -08:00
Kazu Hirata
22b4b1ab10 Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946)"
This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902.

Failing buildbots:
https://lab.llvm.org/buildbot/#/builders/174/builds/8058
https://lab.llvm.org/buildbot/#/builders/127/builds/1357
2024-11-07 10:43:11 -08:00
Han-Kuan Chen
f58757b8dc
[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946) 2024-11-08 00:52:59 +08:00
Alexey Bataev
79fd615759 [SLP][NFC]Add a test with the segmented loads, NFC 2024-11-07 07:08:24 -08:00
Luke Lau
343a810725
[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal strided access (#115264)
This is also split off from the zvfhmin/zvfbfmin
isLegalElementTypeForRVV work.

Enabling this will cause SLP and RISCVGatherScatterLowering to emit
@llvm.experimental.vp.strided.{load,store} intrinsics, and codegen
support for this was added in #109387 and #114750.
2024-11-07 14:40:15 +08:00
Han-Kuan Chen
c6091cdbed
[SLP][REVEC] Make shufflevector can be vectorized with ReorderIndices and ReuseShuffleIndices. (#114965) 2024-11-07 11:04:34 +08:00
Alexey Bataev
76422385c3
[SLP]Support reordered buildvector nodes for better clustering
Patch adds reordering of the buildvector nodes for better clustering of
the compatible operations and future vectorization. Includes basic cost
estimation and if the transformation is not profitable - reverts it.

AVX512, -O3+LTO
Metric: size..text

Program                                                                          size..text
                                                                                       results     results0    diff
                        test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test    74565.00    75701.00  1.5%
                test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test    75773.00    76397.00  0.8%
               test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test    75773.00    76397.00  0.8%
               test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2014462.00  2024494.00  0.5%
                         test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   395219.00   396979.00  0.4%
                         test-suite :: MultiSource/Applications/JM/lencod/lencod.test   857795.00   859667.00  0.2%
                    test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   800472.00   802440.00  0.2%
                       test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   590699.00   591403.00  0.1%
        test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   203006.00   203102.00  0.0%
            test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test    42408.00    42424.00  0.0%
             test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12451575.00 12451927.00  0.0%
            test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1396480.00  1396448.00 -0.0%
             test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1396480.00  1396448.00 -0.0%
                        test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1047708.00  1047580.00 -0.0%
        test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111344.00   111328.00 -0.0%
                test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1087660.00  1087500.00 -0.0%
       test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   280664.00   280616.00 -0.0%
                          test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   502646.00   502006.00 -0.1%
                      test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1033135.00  1031567.00 -0.2%
        test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2070917.00  2065845.00 -0.2%
       test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2070917.00  2065845.00 -0.2%
                        test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33893.00    33797.00 -0.3%
          test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39677.00    39549.00 -0.3%
                 test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39674.00    39546.00 -0.3%
test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11512.00 -0.4%
                 test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   653867.00   649275.00 -0.7%
                  test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   653867.00   649275.00 -0.7%

CINT2006/401.bzip2 - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - function
_ZN9FastBoard25get_pattern3_augment_specEiib not inlined anymore, better
vectorization
CFP2017rate/510.parest_r - better vectorization
JM/ldecod - better vectorization
JM/lencod - same
CINT2006/464.h264ref - extra code vectorized
CFP2006/447.dealII - extra vector code
MiBench/consumer-lame - vectorized 2 loops previously scalar
DOE-ProxyApps-C/miniGMG - small changes
Benchmarks/7zip - extra code vectorized, better vectorization
CFP2017rate/526.blender_r - extra vectorization
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorization
MiBench/consumer-jpeg - extra vectorization
CINT2006/400.perlbench - extra vectorization
Prolangs-C/TimberWolfMC - small variations
Applications/sqlite3 - extra function vectorized and inlined
Benchmarks/tramp3d-v4 - extra code vectorized
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets
vectorized and inlined
CINT2006/473.astar - extra code vectorized
MiBench/telecomm-gsm - extra code vectorized, better vector code
mediabench/gsm - same
MiBench/security-blowfish - extra code vectorized
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - sub4x4_dct function vectorized and gets
inlined

RISCV-V, SiFive-p670, O3+LTO

CFP2017rate/510.parest_r - extra vectorization
CFP2017rate/526.blender_r - extra vectorization
MiBench/consumer-lame - extra vectorized code

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/114284
2024-11-06 10:51:15 -05:00
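The diff column in the size..text tables of commit 76422385c3 is consistent with a relative change of results0 against results. The commit message does not state this formula, so it is inferred from the numbers; size_diff_percent is a hypothetical helper name.

```python
def size_diff_percent(results, results0):
    """Relative change of results0 against results, formatted like the diff column."""
    return f"{(results0 - results) / results * 100:.1f}%"


# 401.bzip2 row: 74565.00 vs 75701.00
print(size_diff_percent(74565.00, 75701.00))    # prints 1.5%
# 625.x264_s row: 653867.00 vs 649275.00
print(size_diff_percent(653867.00, 649275.00))  # prints -0.7%
```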
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Alexey Bataev
c1cec8c0dc [SLP][NFC]Add a test with missed splat ordering for loads, NFC 2024-11-05 14:08:17 -08:00
Alexey Bataev
0c18def2c1 [SLP]Allow interleaving check only if it is less than number of elements
Need to check if the interleaving factor is less than total number of
elements in loads slice to handle it correctly and avoid compiler crash.

Fixes report https://github.com/llvm/llvm-project/pull/112361#issuecomment-2457227670
2024-11-05 07:06:15 -08:00
Alexey Bataev
899336735a [SLP]Be more pessimistic about poisonous reductions
Consider all possible reductions ops as being non-poisoning boolean
logical operations, which require freeze to be fully correct.

https://alive2.llvm.org/ce/z/TKWDMP

Fixes #114738
2024-11-04 06:13:52 -08:00
Alexey Bataev
a15bf88d53 [SLP][NFC]Add a test with missing freeze instruction before reduction, NFC 2024-11-04 04:38:09 -08:00
Simon Pilgrim
ac1869aa70
[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680)
Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fall back to worst case shuffles that reference elements from any lane.

This patch detects shuffle masks that we know are "inlane" and enables us to assume a cheaper shuffle cost.
2024-11-04 10:19:02 +00:00
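What "inlane" means for a shuffle mask in commit ac1869aa70 can be sketched as follows. This is not the X86 cost-model implementation; it is a minimal check under these assumptions: the mask has one entry per result element, indices at or above num_elts refer to the second input, -1 marks an undefined element, and is_in_lane_mask is a hypothetical name.

```python
def is_in_lane_mask(mask, num_elts, elts_per_lane):
    """Return True if every result element reads from its own lane of either input."""
    for dst, src in enumerate(mask):
        if src < 0:
            continue  # undefined elements impose no constraint
        # Fold second-input indices onto the first input, then compare lanes.
        if (src % num_elts) // elts_per_lane != dst // elts_per_lane:
            return False
    return True


# 8 x i32 in a 256-bit vector: each 128-bit lane holds 4 elements.
assert is_in_lane_mask([1, 0, 3, 2, 5, 4, 7, 6], 8, 4)        # swaps stay within each lane
assert not is_in_lane_mask([4, 5, 6, 7, 0, 1, 2, 3], 8, 4)    # swaps the two lanes
assert is_in_lane_mask([0, 8, 1, 9, 4, 12, 5, 13], 8, 4)      # two inputs, per-lane interleave
```

A mask that passes this check never moves data across a 128-bit boundary, which is the property that lets a cheaper cost be assumed.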
Han-Kuan Chen
a795a18bba
[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551) 2024-11-02 03:03:52 +08:00
Han-Kuan Chen
e4aeeba84c
[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index should consider the number of elements of ScalarTy. (#114526) 2024-11-01 21:17:57 +08:00
Alexey Bataev
e05def081e
[SLP]Do not vectorize code in EH and non-returning blocks
The code in EH and non-returning blocks can be skipped by the
vectorizer, since it does not add to the performance, just consumes
compile/link time.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112221
2024-10-31 13:50:02 -04:00
Alexey Bataev
19a34dded7
[SLP]Do not account external uses in EH block and in non-returning blocks
No need to account the cost of the external uses in EH and non-returning
basic blocks.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/112045
2024-10-31 13:23:43 -04:00
Alexey Bataev
e7080fd735 [SLP]Extra check if the instruction marked for removal, must be replaced in reduction ops
If the instruction is vectorized and it is a part of the reduced values
gather/buildvector node, it should be replaced in reduced operation
instructions before removal properly, to avoid a compiler crash.

Fixes #114371
2024-10-31 09:59:35 -07:00
Matthias Braun
255e441613
X86: Do not return invalid cost for fp16 conversion (#114128)
Returning invalid instruction costs when converting from/to fp16 in
`X86TTIImpl::getCastInstrCost` when there is no hardware support
available was triggering asserts. This changes the code to return a
large (arbitrary) number to model the fact that libcalls are used to
implement the conversion.

This also simplifies the code by only reporting costs for the scalar
fp16 conversion; vectorized costs being left to the fallback assuming
scalarization.

This is a follow-up to assertion issues reported for the changes in
#113195
2024-10-29 17:16:17 -07:00
Sushant Gokhale
c9f01f699c
[SLP][AArch64][NFC] Add more tests for SLP vectorization of div (#113876)
Currently, we don't have many tests that show the SLP outcome for integer
divisions. This patch adds tests for the same.

In certain scenarios, for Neon, vectorization is profitable. An attempt
would be made in future to improve the cost-model for the same.
2024-10-28 20:37:41 +05:30
Alexey Bataev
7152bf3bc8 [SLP]Do not create new vector node if scalars fully overlap with the existing one
If the list of scalars is vectorized as part of the same vector node,
there is no need to generate the vector node again; it will be handled as
part of overlapping matching.

Fixes #113810
2024-10-28 06:59:41 -07:00
Matthias Braun
054c23d78f
X86: Improve cost model of fp16 conversion (#113195)
Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer
transforms the patterns:

- Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (an SSE
  register can hold 4xfloat converted/stored to 4xf16). This is necessary as
  fp16 stores are neither modeled as trunc-stores nor can we mark direct Xxfp16
  stores as legal (as we generally expand fp16 operations).
- Add missing cost entries to `X86TTIImpl::getCastInstrCost` for
  conversion from/to fp16. Note that conversion from f64 to f16 is not
  supported by an X86 instruction.
2024-10-25 16:22:24 -07:00
Jonas Paulsson
aba39c3974
[System] Precommit of test for #112491 (#113704) 2024-10-25 17:40:00 +02:00
Alexey Bataev
e914421d7f [SLP]Do correct signedness analysis for externally used scalars
If a scalar that is used externally is in the root node, it may have
incorrect signedness info because of the conflict with the demanded bits
analysis. Need to perform exact signedness analysis and compute it
rather than rely on the precomputed value, which might be incorrect for
alternate zext/sext nodes.

Fixes #113520
2024-10-24 08:59:24 -07:00
Alexey Bataev
d2e7ee77d3 [SLP]Do not check for clustered loads only
Since SLP supports "clusterization" of the non-load instructions, the
restriction for reduced values for loads only should be removed to avoid
a compiler crash.

Fixes #113516
2024-10-24 08:16:42 -07:00