29888 Commits

Author SHA1 Message Date
Noah Goldstein
37932643ab [SimplifyCFG] Deduce paths unreachable if they cause div/rem UB
In the same way that we mark a path unreachable if it may cause a nullptr
dereference, mark it unreachable if it may cause div/rem by zero or signed
div/rem of INT_MIN by -1, both of which cause immediate UB.
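
As a rough illustration of the conditions involved, the sketch below models when an integer div/rem is immediate UB for 32-bit operands. The helper names `divRemIsImmediateUB` and `checkedSDiv` are hypothetical and are not part of SimplifyCFG; they only restate the div/rem cases named in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM API): returns true if `lhs / rhs` or
// `lhs % rhs` would be immediate UB for 32-bit operands.
bool divRemIsImmediateUB(int32_t lhs, int32_t rhs, bool isSigned) {
  if (rhs == 0)
    return true; // div/rem by zero is UB for both signed and unsigned ops
  // Signed overflow: INT_MIN / -1 is not representable in 32 bits.
  return isSigned && lhs == std::numeric_limits<int32_t>::min() && rhs == -1;
}

// Hypothetical wrapper: a path that reaches the division may be assumed to
// satisfy the precondition, which is what allows the other path to be
// treated as unreachable.
int32_t checkedSDiv(int32_t lhs, int32_t rhs) {
  assert(!divRemIsImmediateUB(lhs, rhs, /*isSigned=*/true));
  return lhs / rhs;
}
```

The unsigned case only has the zero check, since an unsigned division cannot overflow.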

Closes #109008
2024-09-18 12:59:52 -05:00
Noah Goldstein
f5d62d7647 [SimplifyCFG] Add tests for deducing paths unreachable if they cause div/rem UB; NFC 2024-09-18 12:59:52 -05:00
Nikita Popov
13b4d1bfea [SimplifyCFG][LICM] Add additional speculation tests
These are related to https://github.com/llvm/llvm-project/issues/108854.
2024-09-18 14:48:58 +02:00
Shih-Po Hung
ffcff2f465
[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544)
This patch passes the string `"vector.gep"` to CreateGEP instead of
CreateMul.
2024-09-18 19:22:36 +08:00
Yingwei Zheng
872932b7a9
[InstCombine] Generalize icmp (shl nuw C2, Y), C -> icmp Y, C3 (#104696)
The motivation of this patch is to fold more generalized patterns like
`icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`.

Alive2: https://alive2.llvm.org/ce/z/gyqjQH
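
The example fold can also be checked by brute force. The sketch below is only an illustration for 32-bit values, not the InstCombine implementation; the function names are made up for this example. It skips shift amounts for which `shl nuw` would wrap, since the fold may assume those do not occur.

```cpp
#include <cassert>
#include <cstdint>

// Checks that icmp ult (shl nuw 16, X), 64 agrees with icmp ult X, 2 for
// every X where the shift does not wrap in 32 bits (the `nuw` precondition).
bool foldHoldsForAllValidShifts() {
  for (uint32_t x = 0; x < 32; ++x) {
    uint64_t wide = uint64_t{16} << x;
    if (wide > UINT32_MAX)
      continue; // shl nuw would be poison here, so the fold need not hold
    bool before = static_cast<uint32_t>(wide) < 64u;
    bool after = x < 2u;
    if (before != after)
      return false;
  }
  return true;
}

// Small self-check that can be called from a test driver.
void checkFold() { assert(foldHoldsForAllValidShifts()); }
```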
2024-09-18 19:10:41 +08:00
Benjamin Maxwell
43c9203d49
[TLI] Support inferring function attributes for sincos[f|l] (#108554) 2024-09-18 09:40:29 +01:00
David Green
112aac4e89
[InstCombine] Fold fmod to frem if we know it does not set errno. (#107912)
fmod will be folded to frem in clang under -fno-math-errno and can be constant
folded in llvm if the operands are known. It can be relatively common to have
fp code that handles special values before doing some calculation:
```
if (isnan(f))
  return handlenan;
if (isinf(f))
  return handleinf;
..
fmod(f, 2.0)
```

This patch enables the folding of fmod to frem in instcombine if the first
parameter is not inf and the second is not zero. Other combinations do not set
errno.

The same transform is performed for fmod with the nnan flag, which implies the
input is known to not be inf/zero.
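
The precondition can be sketched in plain C++ as follows. This is a conservative model of when `fmod` may report a domain error, not the InstCombine code; `fmodMaySetErrno` and `foldedFmod` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Conservative model: fmod(x, y) can only report a domain error (and so
// touch errno) when x is infinite or y is zero.
bool fmodMaySetErrno(double x, double y) {
  return std::isinf(x) || y == 0.0;
}

// Hypothetical stand-in for the folded form: computes the remainder only
// when the call provably cannot set errno.
double foldedFmod(double x, double y) {
  assert(!fmodMaySetErrno(x, y) && "fold precondition violated");
  return std::fmod(x, y);
}
```

In the guarded pattern shown above, the `isinf` check before the call and the constant divisor `2.0` together establish this precondition.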
2024-09-18 09:38:28 +01:00
Jay Foad
d2d947b7e2
[AMDGPU] Fold llvm.amdgcn.cvt.pkrtz when either operand is fpext (#108237)
This also generalizes the Undef handling and adds Poison handling.
2024-09-18 09:37:04 +01:00
Chengjun
94a98cf5dc
[InstCombine] Remove dead phi web (#108876)
The current visitPHINode function in InstCombine can remove dead phi
cycles (all phis have one use, which is another phi). However, it cannot
handle the case where the phis form a web (all phis have one or more
uses, and all the uses are phis). This change extends the algorithm so
that it can also handle a dead phi web.
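
A minimal sketch of the idea, using a toy instruction model rather than LLVM's classes: starting from one phi, walk all transitive users and give up as soon as a non-phi user is found. `ToyInst` and `isDeadPhiWeb` are hypothetical names, and the real patch may differ in details such as search limits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of an instruction: whether it is a phi, and the indices of the
// instructions that use it.
struct ToyInst {
  bool isPhi;
  std::vector<std::size_t> users;
};

// Returns true if every transitive user reachable from `start` is a phi,
// i.e. the phis form a web with no use outside the web.
bool isDeadPhiWeb(const std::vector<ToyInst> &insts, std::size_t start) {
  assert(start < insts.size() && insts[start].isPhi);
  std::vector<bool> visited(insts.size(), false);
  std::vector<std::size_t> worklist{start};
  visited[start] = true;
  while (!worklist.empty()) {
    std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t user : insts[cur].users) {
      if (!insts[user].isPhi)
        return false; // a non-phi use keeps the web alive
      if (!visited[user]) {
        visited[user] = true;
        worklist.push_back(user);
      }
    }
  }
  return true;
}
```

A simple cycle is the special case where each phi has exactly one user; the worklist handles phis with several users as well.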
2024-09-18 10:04:49 +02:00
LiqinWeng
a2994b2999
[LV][NFC] Unify printing for WidenEVLRecipe with other EVL recipes (#108177) 2024-09-18 15:03:37 +08:00
Alex MacLean
790f2eb16a
[InstCombine] Avoid simplifying bitcast of undef to a zeroinitializer vector (#108872)
In some cases, if an undef value is the product of another instcombine
simplification, a bitcast of undef is simplified to a zeroinitializer
vector instead of undef.
2024-09-17 15:31:28 -07:00
Noah Goldstein
419c53477e [SimplifyCFG] Mark div/rem as not-cheap to sink if we are replacing const denominator
Close #109007
2024-09-17 12:04:34 -05:00
Noah Goldstein
ae8d0200b0 [SimplifyCFG] Add test for sinking div/rem with const remainder; NFC 2024-09-17 12:04:34 -05:00
Andreas Jonson
a0d00c94c2
[SimplifyCFG] Swap range metadata to attribute for calls. (#108984)
This is among the last usages of range metadata for calls. Once these are
gone, the metadata can be deprecated so that calls only use the range
attribute.
2024-09-17 18:25:53 +02:00
Nikita Popov
848cec11f5 Revert "[SLP]Vectorize gathered loads"
This reverts commit de1f5b96adcea52bf7c9670c46123fe1197050d2.

This has a very large compile-time impact in some cases, in
particular lencod. See:
http://llvm-compile-time-tracker.com/compare.php?from=b1339abb713063363e7804124b8fb3d84143a003&to=de1f5b96adcea52bf7c9670c46123fe1197050d2&stat=instructions:u
2024-09-17 16:45:25 +02:00
Alexey Bataev
de1f5b96ad
[SLP]Vectorize gathered loads
Final gather/buildvector nodes may have scalar loads, which are not
vectorized (since they are part of the gather nodes) but may form full
vector loads, being combined. This patch walks over all gather nodes,
"gathering" and sorting gathered scalar loads and then tries to build
vector loads, which later are reshuffled between the gather nodes.
This later allows adding support for segmented loads (a kind of AOS to SOA
load for RISC-V RVV) and may help with the removal of the alternate
opcodes support.
Currently, alternate nodes may depend on each other because of the
consecutive loads between their operands. Because of that we cannot
simply remove alternate vectorization. But this approach may help to
remove most of the stuff for it, since we'll be able to vectorize loads
in between lanes.

Metric: size..text, AVX512

Program                                                                                                                                                size..text
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   238381.00   250669.00  5.2%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test    25753.00    26329.00  2.2%
                                                                  test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test     3028.00     3092.00  2.1%
                                                                                     test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test     4243.00     4275.00  0.8%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   649765.00   653877.00  0.6%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   649765.00   653877.00  0.6%
                                                                                       test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test     4199.00     4222.00  0.5%
                                                             test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12933.00    12997.00  0.5%
                                                                                                 test-suite :: SingleSource/Benchmarks/Misc/flops.test     8282.00     8314.00  0.4%
                                                            test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test    10065.00    10097.00  0.3%
                                                                                         test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test     5160.00     5176.00  0.3%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12472220.00 12509612.00  0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test     6908.00     6924.00  0.2%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   202830.00   203278.00  0.2%
                                                                                       test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test     9133.00     9149.00  0.2%
                                                                                           test-suite :: MultiSource/Benchmarks/Olden/power/power.test     6792.00     6803.00  0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
                                                                        test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    97662.00    97758.00  0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   595179.00   595739.00  0.1%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    70603.00    70667.00  0.1%
                                                                            test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test    19877.00    19893.00  0.1%
                                                                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test    90231.00    90279.00  0.1%
                                                                                         test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33738.00    33754.00  0.0%
                                                                                     test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test    13262.00    13268.00  0.0%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   849507.00   849875.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
                                                                                   test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test    38724.00    38740.00  0.0%
                                                                                              test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test    15180.00    15186.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test    15484.00    15490.00  0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test   167391.00   167455.00  0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test   137448.00   137496.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
                                                                              test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test   302870.00   302934.00  0.0%
                                                                                    test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test   303126.00   303190.00  0.0%
                                                                                            test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   241107.00   241155.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test   162974.00   163006.00  0.0%
                                                                                                 test-suite :: MultiSource/Applications/siod/siod.test   167168.00   167200.00  0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
                                                                               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   201623.00   201655.00  0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   501734.00   501798.00  0.0%
                                                                                            test-suite :: MultiSource/Applications/ClamAV/clamscan.test   580888.00   580952.00  0.0%
                                                                                           test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   168319.00   168335.00  0.0%
                                                                        test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test   226022.00   226038.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test   118011.00   118015.00  0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test   550589.00   550605.00  0.0%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
                                                                                 test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   389171.00   389155.00 -0.0%
                                                                                                   test-suite :: MultiSource/Applications/lua/lua.test   234764.00   234748.00 -0.0%
                                                                                        test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   227694.00   227678.00 -0.0%
                                                                    test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test   119819.00   119807.00 -0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test   117995.00   117983.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test   123610.00   123594.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    81414.00    81398.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782040.00   781880.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   911832.00   911608.00 -0.0%
                                                                                             test-suite :: MultiSource/Applications/oggenc/oggenc.test   192507.00   192459.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test   122843.00   122811.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   122292.00   122260.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   777363.00   777155.00 -0.0%
                                                                            test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test   123265.00   123205.00 -0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   315534.00   315358.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test   128163.00   128083.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test     6562.00     6555.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test    23428.00    23396.00 -0.1%
                                                                             test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    22749.00    22717.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39549.00    39485.00 -0.2%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39546.00    39482.00 -0.2%
                                                                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test    57214.00    57118.00 -0.2%
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413668.00   412804.00 -0.2%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00 -0.2%
                                                                                            test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    12414.00    12382.00 -0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test    31161.00    30969.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   224726.00   223254.00 -0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93512.00    92824.00 -0.7%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281151.00   278463.00 -1.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2820.00     2788.00 -1.1%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   156819.00   154739.00 -1.3%
                                                                 test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11160.00 -3.5%
                                                                                          test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test     6734.00     6382.00 -5.2%
                                                                                                                                                       results     results0    diff

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector
code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code.
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s
CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vector
code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in
previously vectorized code.
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code.

Metric: size..text, RISCV, sifive-p670

Program                                                                                                                                                size..text
                                                                                                                                                       results    results0   diff
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   63580.00   64020.00  0.7%
                                                                   test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   21388.00   21406.00  0.1%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test   45160.00   45164.00  0.0%
                                                                         test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2635902.00 2635854.00 -0.0%
                                                                        test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2635902.00 2635854.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7568730.00 7568578.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7568730.00 7568578.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   49764.00   49762.00 -0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00 -0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00 -0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00 -0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9594152.00 9593336.00 -0.0%
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00 -0.0%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   27554.00   27546.00 -0.0%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test   10900.00   10896.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test   46754.00   46732.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00 -0.1%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   24816.00   24800.00 -0.1%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   24814.00   24798.00 -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1599946.00 1598394.00 -0.1%
                                                                                                   test-suite :: MultiSource/Applications/hbd/hbd.test   27236.00   27204.00 -0.1%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test   20160.00   20048.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00 -0.6%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4788.00    4748.00 -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r
CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/107461
2024-09-17 06:57:47 -04:00
Florian Hahn
b1339abb71
[InstCombine] Add tests for folding align assumes into load metadata. 2024-09-17 11:52:22 +01:00
Csanád Hajdú
bc8a5d104c
[Patchpoint] Add immarg attributes to patchpoint arguments (#97276) 2024-09-17 14:00:24 +04:00
Florian Hahn
3c5c61a414
[LV] Add first order rec test where hoisting can improve over sinking. 2024-09-17 09:25:39 +01:00
Florian Hahn
c48a1ebec1
[LV] Remove force-vector-width/force-vector-interleave from X86 test.
Update target-specific test to not force VF/UF, but instead use the
cost-model. There are already similar tests outside X86 and those force
VF & UF.

With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
2024-09-17 08:59:24 +01:00
Luke Lau
30d7dcc1db [RISCV] Add asserts requirement to loop vectorizer tests
Hopefully this fixes a buildbot failure on fuchsia where opt doesn't
have -debug-only
2024-09-17 14:18:36 +08:00
Luke Lau
41f1b467a2
[RISCV] Account for zvfhmin and zvfbfmin promotion in register usage (#108370)
A half with only zvfhmin or bfloat will end up getting promoted to a f32
for most instructions.

Unless the loop consists only of memory ops and permutation instructions
which don't need to be promoted (is this common?), we'll end up using double
the LMUL of what's currently being returned by getRegUsageForType.

Since this is used by the loop vectorizer, it seems better to be
conservative and assume that any usage of a zvfhmin half/bfloat will end
up being widened to a f32
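
As a rough illustration of the doubling described above, here is a minimal sketch. It assumes register usage can be approximated as the total element bits divided by a per-register block size. The 64-bit block size and both function names are illustrative assumptions, not the actual getRegUsageForType code.

```python
def reg_usage(min_elts: int, elt_bits: int, reg_bits: int = 64) -> int:
    """Hypothetical model: registers needed = ceil(total bits / bits per block)."""
    total_bits = min_elts * elt_bits
    return -(-total_bits // reg_bits)  # integer ceiling division


def reg_usage_with_promotion(min_elts: int, elt_bits: int, promoted: bool,
                             reg_bits: int = 64) -> int:
    """Assumption: a promoted half/bfloat element is costed as a 32-bit f32."""
    effective_bits = 32 if promoted else elt_bits
    return reg_usage(min_elts, effective_bits, reg_bits)


# Eight 16-bit elements: costed as-is versus costed as promoted f32 elements.
plain = reg_usage_with_promotion(8, 16, promoted=False)
widened = reg_usage_with_promotion(8, 16, promoted=True)
```

Under these assumptions the widened estimate is twice the plain one, which is the conservative behaviour the message argues for.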
2024-09-17 13:50:19 +08:00
Alexey Bataev
18ef467d73 [SLP]Fix PR108709: postpone buildvector clustered nodes, if required
The "clustered" nodes for buildvector nodes must be postponed in
accordance with the global flag, otherwise it may cause a crash because of
the dependency between phi nodes.
2024-09-16 09:53:46 -07:00
Alexey Bataev
f564a48f0e [SLP]Fix PR108700: correctly identify id of the operand node
If the operand node for truncs is not created during construction, but
one of the previous ones is reused instead, need to correctly identify
its index, to correctly emit the code.

Fixes https://github.com/llvm/llvm-project/issues/108700
2024-09-16 09:44:47 -07:00
Phoebe Wang
af5a45b34b
[X86,SimplifyCFG] Use passthru to reduce select (#108754) 2024-09-16 20:20:36 +08:00
Nikita Popov
b7e51b4f13
[IPSCCP] Infer attributes on arguments (#107114)
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
2024-09-16 10:23:41 +02:00
Florian Hahn
6749f2bbfe
[LV] Add pointer induction test variant with inbounds, remove TODO.
The function doesn't crash any more with inbounds, add a variant with
inbounds.
2024-09-15 21:48:18 +01:00
Yingwei Zheng
87663fdab9
[VectorCombine] Don't shrink lshr if the shamt is not less than bitwidth (#108705)
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
  %19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
  %20 = zext <2 x i1> %19 to <2 x i32>
  %21 = lshr <2 x i32> %20, %broadcast.splat20
  ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
  %1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
  %2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
  %3 = lshr <2 x i1> %1, %2
  %4 = zext <2 x i1> %3 to <2 x i32>
  ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds an additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.

Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
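
The miscompile can be reproduced with a small per-lane model. This is only a sketch: `POISON` is a sentinel standing in for LLVM's poison value, and the helper names are made up for illustration.

```python
POISON = object()  # sentinel standing in for LLVM's poison value


def lshr(value: int, shamt: int, bitwidth: int):
    """Model `lshr iN value, shamt`: the result is poison when shamt >= N."""
    if shamt >= bitwidth:
        return POISON
    mask = (1 << bitwidth) - 1
    return (value & mask) >> shamt


def original(is_zero: bool, shamt: int):
    # zext i1 -> i32, then lshr on i32 (one lane of the first function).
    return lshr(int(is_zero), shamt, 32)


def shrunk(is_zero: bool, shamt: int):
    # trunc shamt to i1, lshr on i1, then zext (one lane of the second function).
    narrow = lshr(int(is_zero), shamt & 1, 1)
    return narrow  # zext of poison is still poison; zext of 0/1 is unchanged


# With shamt == 1 the wide shift is well defined, the narrow one is not.
wide_result = original(True, 1)
narrow_result = shrunk(True, 1)
```

In this model `original(True, 1)` is a defined value while `shrunk(True, 1)` is poison, so the narrowed form is only a refinement when the shift amount is known to be below the narrow bitwidth.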
2024-09-15 18:38:06 +08:00
c8ef
86f0399c1f
[InstCombine] Fold expression using basic properties of floor and ceiling function (#107107)
alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~
https://alive2.llvm.org/ce/z/ywP5t2
related: #76438

This patch adds the following foldings: `floor(x) <= x --> true` and `x
<= ceil(x) --> true`. We leverage the properties of these math functions
and ensure there is no floating point input of `nan`.
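
A quick way to see why the `nan` restriction matters is to check both comparisons on a few sample values. The helpers below are a sketch written for this note (Python's `math.floor` rejects `nan` and infinities, so they are passed through unchanged here to mimic the floating-point intrinsics).

```python
import math


def ffloor(x: float) -> float:
    """Floating-point floor that passes nan and infinities through."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x))


def fceil(x: float) -> float:
    """Floating-point ceil that passes nan and infinities through."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.ceil(x))


def floor_le(x: float) -> bool:
    return ffloor(x) <= x


def le_ceil(x: float) -> bool:
    return x <= fceil(x)


samples = [-2.5, -0.0, 0.0, 0.3, 1.0, 1e300, math.inf, -math.inf]
always_true = all(floor_le(x) and le_ceil(x) for x in samples)
nan_case = floor_le(math.nan) or le_ceil(math.nan)
```

For every non-`nan` sample both comparisons hold, while for `nan` both ordered comparisons are false, so folding to `true` is only valid once `nan` is excluded.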

---------

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
2024-09-15 14:25:00 +04:00
Florian Hahn
012dbec604
[VPlan] Handle ForceTargetInstructionCost during precomputeCosts.
Make sure ForceTargetInstructionCost is respected in precomputeCosts.
2024-09-15 10:53:43 +01:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, except for PHIs. For PHIs, a single VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
Mircea Trofin
82266d3a2b
[nfc][ctx_prof] Factor the callsite instrumentation exclusion criteria (#108471)
Reusing this in the logic fetching the instrumentation in `CtxProfAnalysis`.
2024-09-13 21:25:47 -07:00
Alexey Bataev
1e3536ef31 [SLP]Fix PR108620: Need to check, if the reduced value was transformed
Before trying to include the scalar into the list of
ExternallyUsedValues, need to check, if it was transformed in previous
iteration and use the transformed value, not the original one, to avoid
compiler crash when building external uses.

Fixes https://github.com/llvm/llvm-project/issues/108620
2024-09-13 15:43:06 -07:00
Felipe de Azevedo Piovezan
ddcc601353
[CoroSplit][DebugInfo] Adjust heuristic for moving DIScope of funclets (#108611)
CoroSplit has a heuristic where the scope line for funclets is adjusted
to match the line of the suspend intrinsic that caused the split. This
is useful as it avoids a jump on the line table from the original
function declaration to the line where the split happens.

However, very often using the line of the split is not ideal: if we can
avoid it, we should not have a line entry for the split location, as
this would cause breakpoints by line to match against two functions: the
funclet before and the funclet after the split.

This patch adjusts the heuristics to look for the first instruction with
a non-zero line number after the split. In other words, this patch makes
breakpoints on `await foo()` lines behave much more like a regular
function call.
2024-09-13 15:25:11 -07:00
vporpo
5130f3236f
[SandboxVec] User-defined pass pipeline (#108625)
This patch adds support for a user-defined pass-pipeline that overrides
the default pipeline of the vectorizer.
This will commonly be used by lit tests.
2024-09-13 13:14:06 -07:00
Volodymyr Vasylkun
21e3a212c5
[InstCombine] Replace an integer comparison of a phi node with multiple ucmp/scmp operands and a constant with phi of individual comparisons of original intrinsic's arguments (#107769)
When we have a `phi` instruction with more than one of its incoming
values being a call to `ucmp` or `scmp`, which is then compared with an
integer constant, we can move the comparison through the `phi` into the
incoming basic blocks because we know that a comparison of `ucmp`/`scmp`
with a constant will be simplified by the next iteration of InstCombine.

There's a high chance that other similar patterns can be identified, in
which case they can be easily handled by the same code by moving the
check for "simplifiable" instructions into a lambda.
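
The equivalence behind this transform can be checked with a small model. This is a sketch only: the phi is modelled as a two-way choice, the predicate `< 1` is picked arbitrarily as an example, and all function names are hypothetical.

```python
def scmp(a: int, b: int) -> int:
    """Three-way signed comparison: -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_after_phi(take_first: bool, a: int, b: int, c: int, d: int) -> bool:
    # Original shape: phi of two scmp results, then one comparison with a constant.
    phi = scmp(a, b) if take_first else scmp(c, d)
    return phi < 1


def compare_in_predecessors(take_first: bool, a: int, b: int, c: int, d: int) -> bool:
    # Transformed shape: each incoming block compares its own operands,
    # because `scmp(x, y) < 1` simplifies to `x <= y`.
    return (a <= b) if take_first else (c <= d)


def check_all() -> bool:
    """Exhaustively compare both shapes over a small range of inputs."""
    values = range(-2, 3)
    return all(
        compare_after_phi(t, a, b, c, d) == compare_in_predecessors(t, a, b, c, d)
        for t in (False, True)
        for a in values for b in values for c in values for d in values
    )


equivalent = check_all()
```

Both shapes agree on every input in the range, and the second one no longer needs the three-way comparison at all.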
2024-09-13 19:50:27 +01:00
Alexey Bataev
c13bf6d4a8 [SLP]Return proper value for phi vectorized node
Should not return the original phi vector instruction, need to return
actual vectorized value as a result.
2024-09-13 11:30:29 -07:00
Alexey Bataev
98b1d01b42 [SLP][NFC]Test with incorrect value for phi node with reused scalars, NFC 2024-09-13 11:26:00 -07:00
vporpo
39f2d2f156
[SandboxVec] Boilerplate for vectorization passes (#108603)
This patch implements a new empty pass for the Bottom-up vectorizer and
creates a pass pipeline that includes it.
The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.
2024-09-13 11:22:24 -07:00
Ganesh
02e4186d0b
[X86] AMD Zen 5 Initial enablement (#107964)
This patch adds the basic skeleton enablement for AMD's next-gen Zen 5 CPUs.
2024-09-13 17:45:33 +01:00
Nikita Popov
1c298c9274 [InstCombine] Preserve nuw flags when merging geps
These transforms all perform a variant of (gep (gep p, x), y)
to (gep p, (x + y)). We can preserve both inbounds and nuw
during such transforms (https://alive2.llvm.org/ce/z/Stu4cN), but
not nusw, which would require proving that the new add is nsw.

For the constant offset case, I've conservatively retained the
logic that checks for negative intermediate offsets, though I'm
not sure it's still reachable nowadays.
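
The nuw part of this reasoning can be illustrated with a simplified model: addresses and byte offsets are unsigned integers, and a wrapping add yields poison (modelled as `None`). The 4-bit width is an assumption chosen only so the check can be exhaustive, and the function names are made up for this sketch.

```python
BITS = 4            # assumption: tiny address width so the check is exhaustive
LIMIT = 1 << BITS


def add_nuw(a: int, b: int):
    """Unsigned add with nuw: returns None (poison stand-in) on unsigned wrap."""
    total = a + b
    return None if total >= LIMIT else total


def nested_gep(p: int, x: int, y: int):
    # gep nuw (gep nuw p, x), y
    inner = add_nuw(p, x)
    return None if inner is None else add_nuw(inner, y)


def merged_gep(p: int, x: int, y: int):
    # gep nuw p, (add nuw x, y)
    offset = add_nuw(x, y)
    return None if offset is None else add_nuw(p, offset)


def nuw_is_preserved() -> bool:
    """Whenever the nested form is defined, the merged form gives the same address."""
    for p in range(LIMIT):
        for x in range(LIMIT):
            for y in range(LIMIT):
                nested = nested_gep(p, x, y)
                if nested is not None and merged_gep(p, x, y) != nested:
                    return False
    return True


preserved = nuw_is_preserved()
```

In this model, if neither nested add wraps then `x + y` cannot wrap either, since it is bounded by `p + x + y`. The signed (nusw) case has no such bound, which matches the message's point that the new add would have to be proven nsw.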
2024-09-13 11:15:22 +02:00
Igor Kirillov
1b57cbcf25
[VectorCombine] Refactor Insertion Point setting in shrinkType (#108398) 2024-09-13 10:03:31 +01:00
Nikita Popov
940f89255e [InstCombine] Do not modify GEP in place
This was modifying the GEP in place, with code to adjust the
inbounds flag. This was correct at the time, but now fails to
account for other GEP flags like nuw, leading to miscompilations.

Remove the special case, and always create a new GEP instruction.
Logic for preserving nuw in the cases where it is valid will be
added in a followup patch.
2024-09-13 10:04:39 +02:00
David Green
c0e308ba3d
[InstCombine] Pass DomTree and DomTreeCache to LibCallSimplifier (#108446)
This allows any combines to pick up Known states from dominating
conditions.
2024-09-13 08:36:48 +01:00
Yingwei Zheng
2ca75df1d1
[ValueTracking] Infer is-power-of-2 from dominating conditions (#107994)
Addresses downstream rustc issue:
https://github.com/rust-lang/rust/issues/129795
2024-09-13 08:54:29 +08:00
Florian Hahn
08d294df55
[VPlan] Simplify VPBuilder insert point when adding users in exit block.
Simplifies setting the insert point, addressing a TODO.
2024-09-12 22:47:03 +01:00
Alexey Bataev
5d7cf504ce [SLP]Fix PR108421: Correctly deduce VF from the masks
Need to select the max of CommonMask and V1 Mask size to correctly
perform reshuffling of the vectors, otherwise incorrect result is
generated.

Fixes https://github.com/llvm/llvm-project/issues/108421
2024-09-12 13:43:44 -07:00
Alexey Bataev
de0fdcb2b0 [SLP][NFC]Add a test for incorrectly combined extracts with the buildvector 2024-09-12 13:39:37 -07:00
David Green
ad3ad15229 [InstCombine] Test for fmod -> frem folding. NFC 2024-09-12 21:10:40 +01:00
Sushant Gokhale
d37d05795d
[SLP][AArch64] Fix test failure for PR #106507 (#108442)
Updating the failing test in this patch.
2024-09-13 00:51:49 +05:30