37539 Commits

Author SHA1 Message Date
Jorge Gorbe Moya
71e434d302
[SandboxVec] Reapply "Add barebones Region class. (#108899)" (#109059)
A `#ifndef NDEBUG` in the wrong place caused an error in release builds.
2024-09-18 11:36:45 -07:00
Noah Goldstein
37932643ab [SimplifyCFG] Deduce paths unreachable if they cause div/rem UB
Same we way mark a path unreachable if it may cause a nullptr
dereference, div/rem by zero or signed div/rem of INT_MIN by -1 cause
immediate UB.

Closes #109008
2024-09-18 12:59:52 -05:00
Florian Hahn
0d736e296c
[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464)
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.

It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.

PR: https://github.com/llvm/llvm-project/pull/94464
2024-09-18 14:41:56 +01:00
David Green
403897484f [InstCombine] Return FRem, as opposed to substituteInParent.
This attempts to fix the ASan buildbot, which is detecting that CI is used
after it is removed in substituteInParent. The idea was to make sure it was
removed even if it had side-effects writing errno, but that appears to happen
if we return FRem directly as usual.
2024-09-18 12:32:47 +01:00
Shih-Po Hung
ffcff2f465
[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544)
This patch passes the string `"vector.gep"` to CreateGEP instead of
CreateMul.
2024-09-18 19:22:36 +08:00
Yingwei Zheng
872932b7a9
[InstCombine] Generalize icmp (shl nuw C2, Y), C -> icmp Y, C3 (#104696)
The motivation of this patch is to fold more generalized patterns like
`icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`.

Alive2: https://alive2.llvm.org/ce/z/gyqjQH
2024-09-18 19:10:41 +08:00
Benjamin Maxwell
43c9203d49
[TLI] Support inferring function attributes for sincos[f|l] (#108554) 2024-09-18 09:40:29 +01:00
David Green
112aac4e89
[InstCombine] Fold fmod to frem if we know it does not set errno. (#107912)
fmod will be folded to frem in clang under -fno-math-errno and can be constant
folded in llvm if the operands are known. It can be relatively common to have
fp code that handles special values before doing some calculation:
```
if (isnan(f))
  return handlenan;
if (isinf(f))
  return handleinf;
..
fmod(f, 2.0)
```

This patch enables the folding of fmod to frem in instcombine if the first
parameter is not inf and the second is not zero. Other combinations do not set
errno.

The same transform is performed for fmod with the nnan flag, which implies the
input is known to not be inf/zero.
2024-09-18 09:38:28 +01:00
Chengjun
94a98cf5dc
[InstCombine] Remove dead phi web (#108876)
In current visitPHINode function during InstCombine, it can remove dead
phi cycles (all phis have one use, which is another phi). However, it
cannot deal with the case when the phis form a web (all phis have one or
more uses, and all the uses are phi). This change extends the algorithm
so that it can also deal with the dead phi web.
2024-09-18 10:04:49 +02:00
LiqinWeng
a2994b2999
[LV][NFC] Unify printing for WidenEVLReicpe with other EVL recipes (#108177) 2024-09-18 15:03:37 +08:00
Mircea Trofin
b2d3c315d5
[ctx_prof] Fix checks in PGOCtxprofFlattening (#108467)
The assertion that all out-edges of a BB can't be 0 is incorrect: they
can be, if that branch is on a cold subgraph.

Added validators and asserts about the expected proprerties of the
propagated counters.
2024-09-17 18:19:20 -07:00
vporpo
42c5a301f5
[SandboxVec] Legality boilerplate (#108650)
This patch adds the basic API for the Legality component of the
vectorizer. It also adds some very basic code in the bottom-up
vectorizer that uses the API.
2024-09-17 17:06:29 -07:00
Jorge Gorbe Moya
aa2e6b8734
Revert "[SandboxVec] Add barebones Region class." (#109058)
Reverts llvm/llvm-project#108899

It broke the llvm-clang-x86_64-win-fast buildbot.
2024-09-17 15:47:30 -07:00
Jorge Gorbe Moya
3aecf41c2b
[SandboxVec] Add barebones Region class. (#108899)
A region identifies a set of vector instructions generated by
vectorization passes. The vectorizer can then run a series of
RegionPasses on the region, evaluate the cost, and commit/reject the
transforms on a region-by-region basis, instead of an entire basic
block.

This is heavily based ov @vporpo's prototype. In particular, the doc
comment for the Region class is all his. The rest of this commit is
mostly boilerplate around a SetVector: getters, iterators, and some
debug helpers.
2024-09-17 15:40:24 -07:00
Alex MacLean
790f2eb16a
[InstCombine] Avoid simplifying bitcast of undef to a zeroinitializer vector (#108872)
In some cases, if an undef value is the product of another instcombine
simplification, a bitcast of undef is simplified to a zeroinitializer
vector instead of undef.
2024-09-17 15:31:28 -07:00
vporpo
318d2f5e5d
[SandboxVec][DAG] Boilerplate (#108862)
This patch adds a very basic implementation of the Dependency Graph to
be used by the vectorizer.
2024-09-17 12:03:52 -07:00
Noah Goldstein
419c53477e [SimplifyCFG] Mark div/rem as not-cheap to sink if we are replacing const denominator
Close #109007
2024-09-17 12:04:34 -05:00
Andreas Jonson
a0d00c94c2
[SimplifyCFG] Swap range metadata to attribute for calls. (#108984)
Among the last usages of range metadata for call before being able to
deprecate and only have the range attribute for calls.
2024-09-17 18:25:53 +02:00
Farzon Lotfi
0f97b4824a
[Scalarizer][DirectX] Add support for scalarization of Target intrinsics (#108776)
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo ` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
2024-09-17 11:35:42 -04:00
Nikita Popov
848cec11f5 Revert "[SLP]Vectorize gathered loads"
This reverts commit de1f5b96adcea52bf7c9670c46123fe1197050d2.

This has a very large compile-time impact in some cases, in
particular lencod. See:
http://llvm-compile-time-tracker.com/compare.php?from=b1339abb713063363e7804124b8fb3d84143a003&to=de1f5b96adcea52bf7c9670c46123fe1197050d2&stat=instructions:u
2024-09-17 16:45:25 +02:00
Nikita Popov
34e16b6b9c
[IndVars] Fix strict weak ordering violation (#108947)
The sort used the block name as a tie-breaker, which will not work for
unnamed blocks and can result in a strict weak ordering violation.

Fix this by checking that all exiting blocks dominate the latch first,
which means that we have a total dominance order. This makes the code
structure here align with what optimizeLoopExits() does.

Fixes https://github.com/llvm/llvm-project/issues/108618.
2024-09-17 15:33:23 +02:00
David Sherwood
b84c42944a
[NFC][LoopVectorize] Rename variable in replaceVPBBWithIRVPBB (#108543)
I've renamed the variable in replaceVPBBWithIRVPBB from IRMiddleVPBB ->
IRVPBB, since the function is used for more than just replacing the
middle VP block.
2024-09-17 12:54:55 +01:00
Alexey Bataev
de1f5b96ad
[SLP]Vectorize gathered loads
Final gather/buildvector nodes may have scalar loads, which are not
vectorized (since they are part of the gather nodes) but may form full
vector loads, being combined. This patch walks over all gather nodes,
"gathering" and sorting gathered scalar loads and then tries to build
vector loads, which later are reshuffled between the gather nodes.
It allows later to add support for segmented loads (kind of AOS to SOA
load kind for RISC-V RVV) and may help with the removal of the alternat
e opcodes support.
Currently, alternate nodes may depend on each other because of the
consecutive loads between their operands. Because of that we cannot
simply remove alternate vectorization. But this approach may help to
remove most of the stuff for it, since we'll be able to vectorize loads
in between lanes.

Metric: size..text, AVX512

Program                                                                                                                                                size..text
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   238381.00   250669.00  5.2%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test    25753.00    26329.00  2.2%
                                                                  test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test     3028.00     3092.00  2.1%
                                                                                     test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test     4243.00     4275.00  0.8%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   649765.00   653877.00  0.6%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   649765.00   653877.00  0.6%
                                                                                       test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test     4199.00     4222.00  0.5%
                                                             test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12933.00    12997.00  0.5%
                                                                                                 test-suite :: SingleSource/Benchmarks/Misc/flops.test     8282.00     8314.00  0.4%
                                                            test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test    10065.00    10097.00  0.3%
                                                                                         test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test     5160.00     5176.00  0.3%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12472220.00 12509612.00  0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test     6908.00     6924.00  0.2%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   202830.00   203278.00  0.2%
                                                                                       test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test     9133.00     9149.00  0.2%
                                                                                           test-suite :: MultiSource/Benchmarks/Olden/power/power.test     6792.00     6803.00  0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1395585.00  1397473.00  0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1395585.00  1397473.00  0.1%
                                                                        test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test    97662.00    97758.00  0.1%
                                                                                        test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   595179.00   595739.00  0.1%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    70603.00    70667.00  0.1%
                                                                            test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test    19877.00    19893.00  0.1%
                                                                           test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test    90231.00    90279.00  0.1%
                                                                                         test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test    33738.00    33754.00  0.0%
                                                                                     test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test    13262.00    13268.00  0.0%
                                                                                        test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1139964.00  1140460.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test   849507.00   849875.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1158379.00  1158859.00  0.0%
                                                                                   test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test    38724.00    38740.00  0.0%
                                                                                              test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test    15180.00    15186.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test    15484.00    15490.00  0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test   167391.00   167455.00  0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test   137448.00   137496.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2030254.00  2030766.00  0.0%
                                                                              test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test   302870.00   302934.00  0.0%
                                                                                    test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test   303126.00   303190.00  0.0%
                                                                                            test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test   241107.00   241155.00  0.0%
                                                                                      test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test   162974.00   163006.00  0.0%
                                                                                                 test-suite :: MultiSource/Applications/siod/siod.test   167168.00   167200.00  0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1048796.00  1048988.00  0.0%
                                                                               test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   201623.00   201655.00  0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   501734.00   501798.00  0.0%
test-suite :: MultiSource/Applications/ClamAV/clamscan.test   580888.00   580952.00  0.0%
                                                                                           test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test   168319.00   168335.00  0.0%
                                                                        test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test   226022.00   226038.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test   118011.00   118015.00  0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test   550589.00   550605.00  0.0%
                                                                                             test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3072477.00  3072541.00  0.0%
                                                                                 test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test  2385563.00  2385579.00  0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   389171.00   389155.00 -0.0%
                                                                                                   test-suite :: MultiSource/Applications/lua/lua.test   234764.00   234748.00 -0.0%
                                                                                        test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   227694.00   227678.00 -0.0%
                                                                    test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test   119819.00   119807.00 -0.0%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test   117995.00   117983.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test   123610.00   123594.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    81414.00    81398.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   782040.00   781880.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  9597420.00  9595292.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  9597420.00  9595292.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   911832.00   911608.00 -0.0%
                                                                                             test-suite :: MultiSource/Applications/oggenc/oggenc.test   192507.00   192459.00 -0.0%
                                                            test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test   122843.00   122811.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   122292.00   122260.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   777363.00   777155.00 -0.0%
                                                                            test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test   123265.00   123205.00 -0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   315534.00   315358.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test   128163.00   128083.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test     6562.00     6555.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test    23428.00    23396.00 -0.1%
                                                                             test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    22749.00    22717.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39549.00    39485.00 -0.2%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39546.00    39482.00 -0.2%
                                                                                    test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test    57214.00    57118.00 -0.2%
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413668.00   412804.00 -0.2%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1044047.00  1041487.00 -0.2%
                                                                                            test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test    12414.00    12382.00 -0.3%
                                                                                      test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test    31161.00    30969.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test   224726.00   223254.00 -0.7%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93512.00    92824.00 -0.7%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281151.00   278463.00 -1.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2820.00     2788.00 -1.1%
                                                                                            test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   156819.00   154739.00 -1.3%
                                                                 test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test    11560.00    11160.00 -3.5%
                                                                                          test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test     6734.00     6382.00 -5.2%
                                                                                                                                                       results     results0    diff

ASCI_Purple/SMG2000 - extra vector code
VPlanNativePath/outer-loop-vect - extra vectorization, better vector
code
AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code
Rodinia/hotspot - small variations
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - extra vector code, better vectorization
BenchmarkGame/n-body - better vector code.
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations
Misc/flops - extra vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations
Misc-C++/Large - better vector code
CFP2017rate/526.blender_r - extra vector code
Prolangs-C++/city - extra vector code
MiBench/consumer-lame - extra vector code
CoyoteBench/fftbench - extra vector code
Olden/power - better vector code
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - extra vector code
CINT2017rate/531.deepsjeng_r - extra vector code
CFP2006/447.dealII - small variations
DOE-ProxyApps-C/miniAMR - small variations
Prolangs-C/unix-smail - small variations
DOE-ProxyApps-C++/PENNANT - small variations
CINT2006/473.astar - small variations
CFP2006/453.povray - small variations
JM/lencod - extra vector code
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/CoMD - small variations
CFP2006/470.lbm - extra vector code
CFP2017speed/619.lbm_s
CFP2017rate/519.lbm_r - extra vector code
CINT2006/456.hmmer - extra code vectorized
TSVC/ControlFlow-dbl - extra vector code
CFP2017rate/510.parest_r - better vector code
LCALS/SubsetALambdaLoops - extra code vectorized
LCALS/SubsetARawLoops - extra code vectorized
CFP2006/444.namd - extra code vectorized
CFP2006/482.sphinx3 - better vector code
Applications/siod - better vector code
Benchmarks/7zip - better vector code
DOE-ProxyApps-C++/CLAMR - extra code vectorized
Applications/sqlite3 - extra code vectorized
Applications/ClamAV - smaller vector code
MallocBench/gs - small variations
MicroBenchmarks/ImageProcessing - small variations
TSVC/StatementReordering-flt - extra code vectorized
CINT2006/471.omnetpp - small variations
CINT2006/403.gcc - extra code vectorized
CINT2006/483.xalancbmk - extra code vectorized
JM/ldecod - small variations
Applications/lua - extra code vectorized
mafft/pairlocalalign - small variations
TSVC/NodeSplitting-flt - extra code vectorized
TSVC/Recurrences-flt - extra code vectorized
TSVC/InductionVariable-flt - extra code vectorized
FreeBench/pifft - small variations
CINT2006/464.h264ref - extra code vectorized
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined
CINT2006/445.gobmk - small variations
Applications/oggenc - small variations
TSVC/LoopRestructuring-flt - extra code vectorized
TSVC/CrossingThresholds-flt - extra code vectorized
CFP2017rate/508.namd_r - small variations
TSVC/ControlFlow-flt - extra code vectorized
mediabench/g721 - small variations
Prolangs-C/compiler - small variations
FreeBench/fourinarow - better vector code
MiBench/telecomm-gsm - small variation in vector code
mediabench/gsm - same
Prolangs-C/bison - small variations
Adobe-C++/loop_unroll - extra code vectorized
Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vetor
code
McCat/18-imp - variations in vector code
Prolangs-C/gnugo - variations in vector code
MallocBench/espresso - extra code vectorized
DOE-ProxyApps-C++/miniFE - small variations in vector code
Prolangs-C/TimberWolfMC - extra code vectorized, small changes in
previously vectorized code.
Olden/tsp - small changes in vector code
CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores
MiBench/security-blowfish - extra code vectorized
McCat/08-main - better vector code.

Metric: size..text, RISCV, sifive-p670

Program                                                                                                                                                size..text
                                                                                                                                                       results    results0   diff
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   63580.00   64020.00  0.7%
                                                                   test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   21388.00   21406.00  0.1%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  296992.00  297088.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  968112.00  968208.00  0.0%
                                                        test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test   45160.00   45164.00  0.0%
                                                                         test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2635902.00 2635854.00 -0.0%
                                                                        test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2635902.00 2635854.00 -0.0%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7568730.00 7568578.00 -0.0%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7568730.00 7568578.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test   49764.00   49762.00 -0.0%
                                                                                           test-suite :: MultiSource/Applications/sqlite3/sqlite3.test  449132.00  449108.00 -0.0%
                                                                                          test-suite :: MultiSource/Applications/JM/lencod/lencod.test  695932.00  695892.00 -0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  508820.00  508788.00 -0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  508820.00  508788.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9594152.00 9593336.00 -0.0%
                                                                                 test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test  166522.00  166490.00 -0.0%
                                                                                    test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test  722252.00  722092.00 -0.0%
                                                                             test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   27554.00   27546.00 -0.0%
                                                                  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test   10900.00   10896.00 -0.0%
                                                          test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test   46754.00   46732.00 -0.0%
                                                                                       test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  631570.00  631226.00 -0.1%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  850698.00  850218.00 -0.1%
                                                                           test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   24816.00   24800.00 -0.1%
                                                                                  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   24814.00   24798.00 -0.1%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1599946.00 1598394.00 -0.1%
                                                                                                   test-suite :: MultiSource/Applications/hbd/hbd.test   27236.00   27204.00 -0.1%
                                                                                          test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  293848.00  293480.00 -0.1%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test   20160.00   20048.00 -0.6%
                                                                               test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test  182088.00  181040.00 -0.6%
                                                                           test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4788.00    4748.00 -0.8%

DOE-ProxyApps-C++/miniFE - extra vector code
MiBench/automotive-susan - small variations
Benchmarks/Bullet - extra vector code
CFP2017rate/511.povray_r - slightly better vector code
TSVC/StatementReordering-dbl - small variations
CINT2017rate/523.xalancbmk_r
CINT2017speed/623.xalancbmk_s - extra vector code
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - extra vector code
TSVC/CrossingThresholds-flt - small variations
Applications/sqlite3 - extra vector code
JM/lencod - extra vector code, small variations
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - small variations
CFP2017rate/526.blender_r - extra vector code, small variations
DOE-ProxyApps-C/miniGMG - small variations
Vectorizer/VPlanNativePath/outer-loop-vect - small variations
TSVC/CrossingThresholds-dbl - small variations
Benchmarks/tramp3d-v4 - small variations
Benchmarks/7zip - extra vector code
MiBench/telecomm-gsm - small variations
mediabench/gsm/toast - small variations
CFP2017rate/510.parest_r - extra vector code
Applications/hbd - extra vector code
JM/ldecod - better vector code
Prolangs-C/compiler - extra vector code
MallocBench/espresso - extra vector code
mediabench/g721/g721encode - extra vectorization

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/107461
2024-09-17 06:57:47 -04:00
Antonio Frighetto
942e872d5b [Instrumentation] Do not request sanitizers for naked functions
Sanitizers instrumentation may be incompatible with naked functions,
which lack of standard prologue/epilogue.
2024-09-17 09:23:39 +02:00
Alexey Bataev
18ef467d73 [SLP]Fix PR108709: postpone buildvector clustered nodes, if required
The "clustered" nodes for buildvector nodes must be postponed in
accordance with the global flag, otherwise it may cause crash because of
the dependency between phi nodes.
2024-09-16 09:53:46 -07:00
Alexey Bataev
f564a48f0e [SLP]Fix PR108700: correctly identify id of the operand node
If the operand node for truncs is not created during construction, but
one of the previous ones is reused instead, need to correctly identify
its index, to correctly emit the code.

Fixes https://github.com/llvm/llvm-project/issues/108700
2024-09-16 09:44:47 -07:00
Kolya Panchenko
b592917eec
[LV] Added verification of EVL recipes (#107630) 2024-09-16 11:58:29 -04:00
Kazu Hirata
f4a3309c9a
[IPO] Avoid repeated hash lookups (NFC) (#108796) 2024-09-16 06:44:34 -07:00
Phoebe Wang
af5a45b34b
[X86,SimplifyCFG] Use passthru to reduce select (#108754) 2024-09-16 20:20:36 +08:00
Nikita Popov
b7e51b4f13
[IPSCCP] Infer attributes on arguments (#107114)
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
2024-09-16 10:23:41 +02:00
David Sherwood
b29c5b66fd
[NFC][LoopVectorize] Dont pass LLVMContext to VPTypeAnalysis constructor (#108540)
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
2024-09-16 09:12:11 +01:00
Antonio Frighetto
2ae968a0d9
[Instrumentation] Move out to Utils (NFC) (#108532)
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
2024-09-15 21:07:40 -07:00
Yingwei Zheng
87663fdab9
[VectorCombine] Don't shrink lshr if the shamt is not less than bitwidth (#108705)
Consider the following case:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
  %19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
  %20 = zext <2 x i1> %19 to <2 x i32>
  %21 = lshr <2 x i32> %20, %broadcast.splat20
  ret <2 x i32> %21
}
```
After https://github.com/llvm/llvm-project/pull/104606, we shrink the
lshr into:
```
define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) {
  %1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer
  %2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1>
  %3 = lshr <2 x i1> %1, %2
  %4 = zext <2 x i1> %3 to <2 x i32>
  ret <2 x i32> %4
}
```
It is incorrect since `lshr i1 X, 1` returns `poison`.
This patch adds additional check on the shamt operand. The lshr will get
shrunk iff we ensure that the shamt is less than bitwidth of the smaller
type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always
evaluates to true for `lshr(zext(X), Y)`, this check will only apply to
bitwise logical instructions.

Alive2: https://alive2.llvm.org/ce/z/j_RmTa
Fixes https://github.com/llvm/llvm-project/issues/108698.
2024-09-15 18:38:06 +08:00
c8ef
86f0399c1f
[InstCombine] Fold expression using basic properties of floor and ceiling function (#107107)
alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~
https://alive2.llvm.org/ce/z/ywP5t2
related: #76438

This patch adds the following foldings: `floor(x) <= x --> true` and `x
<= ceil(x) --> true`. We leverage the properties of these math functions
and ensure there is no floating point input of `nan`.

---------

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
2024-09-15 14:25:00 +04:00
Florian Hahn
012dbec604
[VPlan] Handle ForceTargetInstructionCost in during precomputeCosts.
Make sure ForceTargetInstruction is respected in precomputeCosts.
2024-09-15 10:53:43 +01:00
Florian Hahn
f66509bf52
[VPlan] Clarify comment for replaceVPBBWithIRVPBB and add assert (NFCI).
Follow-up to suggestion during
https://github.com/llvm/llvm-project/pull/100735.

More specifically
9a40ed0919 (diff-6d0b73adfa9f8465923d2225ab6674ddcdeab71666f7a73dfaec7fa1246b3a1f)
2024-09-14 21:51:19 +01:00
Florian Hahn
cfe3f5fa61
[VPlan] Remove unneeded ExitBB variable after f0c5caa814.
Fix buildbot failures due to an unused variable, e.g.
https://lab.llvm.org/buildbot/#/builders/186/builds/2329
2024-09-14 21:35:45 +01:00
Florian Hahn
f0c5caa814
[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735)
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.

Depends on https://github.com/llvm/llvm-project/pull/100658.

PR: https://github.com/llvm/llvm-project/pull/100735
2024-09-14 21:21:55 +01:00
Mircea Trofin
82266d3a2b
[nfc][ctx_prof] Factor the callsite instrumentation exclusion criteria (#108471)
Reusing this in the logic fetching the instrumentation in `CtxProfAnalysis`.
2024-09-13 21:25:47 -07:00
Teresa Johnson
12d4769cb8
Revert "[MemProf] Streamline and avoid unnecessary context id duplication (#107918)" (#108652)
This reverts commit 524a028f69cdf25503912c396ebda7ebf0065ed2, but
manually so that follow on PR108086 /
ae5f1a78d3a930466f927989faac8e0b9d820a7b
is retained (NFC patch to convert tuple to a struct).
2024-09-13 16:20:43 -07:00
Alexey Bataev
1e3536ef31 [SLP]Fix PR108620: Need to check, if the reduced value was transformed
Before trying to include the scalar into the list of
ExternallyUsedValues, need to check, if it was transformed in previous
iteration and use the transformed value, not the original one, to avoid
compiler crash when building external uses.

Fixes https://github.com/llvm/llvm-project/issues/108620
2024-09-13 15:43:06 -07:00
Felipe de Azevedo Piovezan
ddcc601353
[CoroSplit][DebugInfo] Adjust heuristic for moving DIScope of funclets (#108611)
CoroSplit has a heuristic where the scope line for funclets is adjusted
to match the line of the suspend intrinsic that caused the split. This
is useful as it avoids a jump on the line table from the original
function declaration to the line where the split happens.

However, very often using the line of the split is not ideal: if we can
avoid it, we should not have a line entry for the split location, as
this would cause breakpoints by line to match against two functions: the
funclet before and the funclet after the split.

This patch adjusts the heuristics to look for the first instruction with
a non-zero line number after the split. In other words, this patch makes
breakpoints on `await foo()` lines behave much more like a regular
function call.
2024-09-13 15:25:11 -07:00
Florian Hahn
c3fda44147
[VPlan] Use VPBuilder to create scalar IV steps and derived IV (NFCI).
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe
and VPScalarIVStepsRecipe.

Use them to simplify the code to create scalar IV steps slightly.
2024-09-13 22:19:36 +01:00
vporpo
5130f3236f
[SandboxVec] User-defined pass pipeline (#108625)
This patch adds support for a user-defined pass-pipeline that overrides
the default pipeline of the vectorizer.
This will commonly be used by lit tests.
2024-09-13 13:14:06 -07:00
Ramkumar Ramachandra
75a57edadc
VPlan/Builder: inline VPBuilder::createICmp (NFC) (#105650)
Inline VPBuilder::createICmp in the header, in line with the other
VPBuilder functions.
2024-09-13 20:08:11 +01:00
Volodymyr Vasylkun
21e3a212c5
[InstCombine] Replace an integer comparison of a phi node with multiple ucmp/scmp operands and a constant with phi of individual comparisons of original intrinsic's arguments (#107769)
When we have a `phi` instruction with more than one of its incoming
values being a call to `ucmp` or `scmp`, which is then compared with an
integer constant, we can move the comparison through the `phi` into the
incoming basic blocks because we know that a comparison of `ucmp`/`scmp`
with a constant will be simplified by the next iteration of InstCombine.

There's a high chance that other similar patterns can be identified, in
which case they can be easily handled by the same code by moving the
check for "simplifiable" instructions into a lambda.
2024-09-13 19:50:27 +01:00
Alexey Bataev
c13bf6d4a8 [SLP]Return proper value for phi vectorized node
Should not return the original phi vector instruction, need to return
actual vectorized value as a result.
2024-09-13 11:30:29 -07:00
vporpo
39f2d2f156
[SandboxVec] Boilerplate for vectorization passes (#108603)
This patch implements a new empty pass for the Bottom-up vectorizer and
creates a pass pipeline that includes it.
The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.
2024-09-13 11:22:24 -07:00
Tyler Nowicki
4c040c0275
[Coroutines] Move Shape to its own header (#108242)
* To create custom ABIs plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header
* The header will eventually be moved into include/llvm/Transforms/Coroutines

See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
2024-09-13 14:11:30 -04:00
Florian Hahn
76fd69be74
[VPlan] Simplify VPBuilder insert point when live outs for FORs.
Simplifies setting the insert point, addressing a TODO.
2024-09-13 13:21:23 +01:00