llvm-project

Author	SHA1	Message	Date
Jorge Gorbe Moya	71e434d302	[SandboxVec] Reapply "Add barebones Region class. (#108899 )" (#109059 ) A `#ifndef NDEBUG` in the wrong place caused an error in release builds.	2024-09-18 11:36:45 -07:00
Noah Goldstein	37932643ab	[SimplifyCFG] Deduce paths unreachable if they cause div/rem UB In the same way we mark a path unreachable if it may cause a nullptr dereference, div/rem by zero or signed div/rem of INT_MIN by -1 causes immediate UB, so such paths can be marked unreachable too. Closes #109008	2024-09-18 12:59:52 -05:00
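The two UB conditions named in the SimplifyCFG message above can be written out as a small predicate. This is only an illustrative model of the integer semantics, not the pass's code; the function name and parameters are hypothetical.

```python
def div_rem_is_immediate_ub(numerator, denominator, bits, signed):
    """Model of when an integer div/rem is immediate UB (hypothetical helper)."""
    if denominator == 0:
        return True  # division or remainder by zero, signed or unsigned
    int_min = -(1 << (bits - 1))
    # Signed overflow case: INT_MIN divided by -1 (same for the remainder).
    return signed and numerator == int_min and denominator == -1


print(div_rem_is_immediate_ub(7, 0, 32, signed=False))           # by zero
print(div_rem_is_immediate_ub(-(1 << 31), -1, 32, signed=True))  # INT_MIN / -1
print(div_rem_is_immediate_ub(8, 2, 32, signed=True))            # ordinary case
```

A path on which such a predicate is known to hold for a div/rem can be treated like a path that dereferences a null pointer.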
Florian Hahn	0d736e296c	[VPlan] Add getSCEVExprForVPValue util, use to get trip count SCEV (NFC) (#94464 ) Add a new getSCEVExprForVPValue utility which can be used to get a SCEV expression for a VPValue. The initial implementation only returns SCEVs for live-in IR values (by constructing a SCEV based on the live-in IR value) and VPExpandSCEVRecipe. This is enough to serve its first use, getting a SCEV for a VPlan's trip count, but will be extended in the future. It also removes createTripCountSCEV, as the new helper can be used to retrieve the SCEV from the VPlan. PR: https://github.com/llvm/llvm-project/pull/94464	2024-09-18 14:41:56 +01:00
David Green	403897484f	[InstCombine] Return FRem, as opposed to substituteInParent. This attempts to fix the ASan buildbot, which is detecting that CI is used after it is removed in substituteInParent. The idea was to make sure it was removed even if it had side-effects writing errno, but that appears to happen if we return FRem directly as usual.	2024-09-18 12:32:47 +01:00
Shih-Po Hung	ffcff2f465	[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544 ) This patch passes the string `"vector.gep"` to CreateGEP instead of CreateMul.	2024-09-18 19:22:36 +08:00
Yingwei Zheng	872932b7a9	[InstCombine] Generalize `icmp (shl nuw C2, Y), C -> icmp Y, C3` (#104696 ) The motivation of this patch is to fold more generalized patterns like `icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`. Alive2: https://alive2.llvm.org/ce/z/gyqjQH	2024-09-18 19:10:41 +08:00
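The example fold from the message above, `icmp ult (shl nuw 16, X), 64 -> icmp ult X, 2`, can be sanity-checked by brute force on a narrow type. The sketch below assumes an 8-bit width and models poison as `None`; it is not a substitute for the Alive2 proof linked in the commit, and all helper names are hypothetical.

```python
BITS = 8  # assumed width for the exhaustive check


def shl_nuw(c, y):
    """c << y, or None (poison) if the shift is too wide or drops set bits."""
    if y >= BITS:
        return None
    result = c << y
    if result >= (1 << BITS):
        return None  # nuw violated: bits were shifted out
    return result


def original(x):
    shifted = shl_nuw(16, x)
    return None if shifted is None else shifted < 64


def folded(x):
    return x < 2


# The fold only has to agree wherever the original is not poison.
mismatches = [x for x in range(1 << BITS)
              if original(x) is not None and original(x) != folded(x)]
print(mismatches)
```

An empty list means the folded comparison refines the original for every 8-bit `X`.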
Benjamin Maxwell	43c9203d49	[TLI] Support inferring function attributes for sincos[f\|l] (#108554 )	2024-09-18 09:40:29 +01:00
David Green	112aac4e89	[InstCombine] Fold fmod to frem if we know it does not set errno. (#107912 ) fmod will be folded to frem in clang under -fno-math-errno and can be constant folded in llvm if the operands are known. It can be relatively common to have fp code that handles special values before doing some calculation: ``` if (isnan(f)) return handlenan; if (isinf(f)) return handleinf; .. fmod(f, 2.0) ``` This patch enables the folding of fmod to frem in instcombine if the first parameter is not inf and the second is not zero. Other combinations do not set errno. The same transform is performed for fmod with the nnan flag, which implies the input is known to not be inf/zero.	2024-09-18 09:38:28 +01:00
Chengjun	94a98cf5dc	[InstCombine] Remove dead phi web (#108876 ) In current visitPHINode function during InstCombine, it can remove dead phi cycles (all phis have one use, which is another phi). However, it cannot deal with the case when the phis form a web (all phis have one or more uses, and all the uses are phi). This change extends the algorithm so that it can also deal with the dead phi web.	2024-09-18 10:04:49 +02:00
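The difference between a dead phi cycle and a dead phi web, as described above, comes down to following every use instead of a single use. A minimal worklist sketch on a toy use graph follows; the graph encoding and the function name are assumptions for illustration, and this is not the code in visitPHINode.

```python
def is_dead_phi_web(start, users, phis):
    """Return True if every transitive user reachable from `start` is a phi.

    `users` maps a value name to the names of its users; `phis` is the set
    of names that are phi nodes. Both are a toy encoding for this sketch.
    """
    seen = {start}
    worklist = [start]
    while worklist:
        node = worklist.pop()
        for user in users.get(node, []):
            if user not in phis:
                return False  # a real (non-phi) use keeps the web alive
            if user not in seen:
                seen.add(user)
                worklist.append(user)
    return True


phis = {"p1", "p2", "p3"}
dead_web = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1", "p2"]}
live_web = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["add"]}
print(is_dead_phi_web("p1", dead_web, phis))
print(is_dead_phi_web("p1", live_web, phis))
```

In `dead_web`, `p1` has two uses, so a check that only handles single-use cycles would give up, while the worklist still finds that nothing outside the phis consumes the values.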
LiqinWeng	a2994b2999	[LV][NFC] Unify printing for WidenEVLRecipe with other EVL recipes (#108177 )	2024-09-18 15:03:37 +08:00
Mircea Trofin	b2d3c315d5	[ctx_prof] Fix checks in `PGOCtxprofFlattening` (#108467 ) The assertion that all out-edges of a BB can't be 0 is incorrect: they can be, if that branch is on a cold subgraph. Added validators and asserts about the expected properties of the propagated counters.	2024-09-17 18:19:20 -07:00
vporpo	42c5a301f5	[SandboxVec] Legality boilerplate (#108650 ) This patch adds the basic API for the Legality component of the vectorizer. It also adds some very basic code in the bottom-up vectorizer that uses the API.	2024-09-17 17:06:29 -07:00
Jorge Gorbe Moya	aa2e6b8734	Revert "[SandboxVec] Add barebones Region class." (#109058 ) Reverts llvm/llvm-project#108899 It broke the llvm-clang-x86_64-win-fast buildbot.	2024-09-17 15:47:30 -07:00
Jorge Gorbe Moya	3aecf41c2b	[SandboxVec] Add barebones Region class. (#108899 ) A region identifies a set of vector instructions generated by vectorization passes. The vectorizer can then run a series of RegionPasses on the region, evaluate the cost, and commit/reject the transforms on a region-by-region basis, instead of an entire basic block. This is heavily based on @vporpo's prototype. In particular, the doc comment for the Region class is all his. The rest of this commit is mostly boilerplate around a SetVector: getters, iterators, and some debug helpers.	2024-09-17 15:40:24 -07:00
Alex MacLean	790f2eb16a	[InstCombine] Avoid simplifying bitcast of undef to a zeroinitializer vector (#108872 ) In some cases, if an undef value is the product of another instcombine simplification, a bitcast of undef is simplified to a zeroinitializer vector instead of undef.	2024-09-17 15:31:28 -07:00
vporpo	318d2f5e5d	[SandboxVec][DAG] Boilerplate (#108862 ) This patch adds a very basic implementation of the Dependency Graph to be used by the vectorizer.	2024-09-17 12:03:52 -07:00
Noah Goldstein	419c53477e	[SimplifyCFG] Mark div/rem as not-cheap to sink if we are replacing const denominator Close #109007	2024-09-17 12:04:34 -05:00
Andreas Jonson	a0d00c94c2	[SimplifyCFG] Swap range metadata to attribute for calls. (#108984 ) This is among the last uses of range metadata on calls; once they are gone, the metadata can be deprecated so that calls carry only the range attribute.	2024-09-17 18:25:53 +02:00
Farzon Lotfi	0f97b4824a	[Scalarizer][DirectX] Add support for scalarization of Target intrinsics (#108776 ) Since we are using the Scalarizer pass in the backend we needed a way to allow this pass to operate on Target intrinsics. We achieved this by adding `TargetTransformInfo ` to the Scalarizer pass. This allowed us to call a function available to the DirectX backend to know if an intrinsic is a target intrinsic that should be scalarized.	2024-09-17 11:35:42 -04:00
Nikita Popov	848cec11f5	Revert "[SLP]Vectorize gathered loads" This reverts commit de1f5b96adcea52bf7c9670c46123fe1197050d2. This has a very large compile-time impact in some cases, in particular lencod. See: http://llvm-compile-time-tracker.com/compare.php?from=b1339abb713063363e7804124b8fb3d84143a003&to=de1f5b96adcea52bf7c9670c46123fe1197050d2&stat=instructions:u	2024-09-17 16:45:25 +02:00
Nikita Popov	34e16b6b9c	[IndVars] Fix strict weak ordering violation (#108947 ) The sort used the block name as a tie-breaker, which will not work for unnamed blocks and can result in a strict weak ordering violation. Fix this by checking that all exiting blocks dominate the latch first, which means that we have a total dominance order. This makes the code structure here align with what optimizeLoopExits() does. Fixes https://github.com/llvm/llvm-project/issues/108618.	2024-09-17 15:33:23 +02:00
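To see why a name tie-breaker cannot rescue a dominance-based comparator, consider three unnamed blocks where A dominates B and C is unrelated to both. The sketch below uses a hypothetical encoding of dominance as a set of pairs and looks for triples that break the "equivalence is transitive" requirement of a strict weak ordering.

```python
from itertools import permutations


def less(x, y, dominates, names):
    """Comparator modelled on the message: dominance first, then block name."""
    if (x, y) in dominates:
        return True
    if (y, x) in dominates:
        return False
    return names[x] < names[y]  # useless when both names are empty


def violations(blocks, dominates, names):
    """Triples where x~y and y~z hold but x~z does not."""
    def equivalent(a, b):
        return (not less(a, b, dominates, names)
                and not less(b, a, dominates, names))
    return [(x, y, z) for x, y, z in permutations(blocks, 3)
            if equivalent(x, y) and equivalent(y, z) and not equivalent(x, z)]


unnamed = {"A": "", "B": "", "C": ""}  # keys are labels for this sketch only
partial = {("A", "B")}                         # C unrelated to A and B
total = {("A", "B"), ("B", "C"), ("A", "C")}   # a dominance chain
print(violations("ABC", partial, unnamed))
print(violations("ABC", total, unnamed))
```

With the chain, every pair is ordered by dominance and no violation remains, which matches the message's point that requiring all exiting blocks to dominate the latch yields a total dominance order.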
David Sherwood	b84c42944a	[NFC][LoopVectorize] Rename variable in replaceVPBBWithIRVPBB (#108543 ) I've renamed the variable in replaceVPBBWithIRVPBB from IRMiddleVPBB -> IRVPBB, since the function is used for more than just replacing the middle VP block.	2024-09-17 12:54:55 +01:00
Alexey Bataev	de1f5b96ad	[SLP]Vectorize gathered loads Final gather/buildvector nodes may have scalar loads, which are not vectorized (since they are part of the gather nodes) but may form full vector loads, being combined. This patch walks over all gather nodes, "gathering" and sorting gathered scalar loads and then tries to build vector loads, which later are reshuffled between the gather nodes. It allows later to add support for segmented loads (kind of AOS to SOA load kind for RISC-V RVV) and may help with the removal of the alternat e opcodes support. Currently, alternate nodes may depend on each other because of the consecutive loads between their operands. Because of that we cannot simply remove alternate vectorization. But this approach may help to remove most of the stuff for it, since we'll be able to vectorize loads in between lanes. Metric: size..text, AVX512 Program size..text test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 238381.00 250669.00 5.2% test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test 25753.00 26329.00 2.2% test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-psadbw.test 3028.00 3092.00 2.1% test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test 4243.00 4275.00 0.8% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 649765.00 653877.00 0.6% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 649765.00 653877.00 0.6% test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test 4199.00 4222.00 0.5% test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test 12933.00 12997.00 0.5% test-suite :: SingleSource/Benchmarks/Misc/flops.test 8282.00 8314.00 0.4% test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-unpack_msasm.test 10065.00 10097.00 0.3% test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test 5160.00 5176.00 0.3% test-suite :: 
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12472220.00 12509612.00 0.3% test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test 6908.00 6924.00 0.2% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 202830.00 203278.00 0.2% test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test 9133.00 9149.00 0.2% test-suite :: MultiSource/Benchmarks/Olden/power/power.test 6792.00 6803.00 0.2% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1395585.00 1397473.00 0.1% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1395585.00 1397473.00 0.1% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 97662.00 97758.00 0.1% test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test 595179.00 595739.00 0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 70603.00 70667.00 0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail.test 19877.00 19893.00 0.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test 90231.00 90279.00 0.1% test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test 33738.00 33754.00 0.0% test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test 13262.00 13268.00 0.0% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1139964.00 1140460.00 0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 849507.00 849875.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1158379.00 1158859.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 38724.00 38740.00 0.0% test-suite :: External/SPEC/CFP2006/470.lbm/470.lbm.test 15180.00 15186.00 0.0% test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test 15484.00 15490.00 0.0% test-suite :: External/SPEC/CINT2006/456.hmmer/456.hmmer.test 167391.00 167455.00 0.0% test-suite :: 
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test 137448.00 137496.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2030254.00 2030766.00 0.0% test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test 302870.00 302934.00 0.0% test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test 303126.00 303190.00 0.0% test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test 241107.00 241155.00 0.0% test-suite :: External/SPEC/CFP2006/482.sphinx3/482.sphinx3.test 162974.00 163006.00 0.0% test-suite :: MultiSource/Applications/siod/siod.test 167168.00 167200.00 0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1048796.00 1048988.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test 201623.00 201655.00 0.0% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 501734.00 501798.00 0.0% test-suite :: MultiSource/Applications/ClamAV/clamscan.test 580888.00 580952.00 0.0% test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 168319.00 168335.00 0.0% test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test 226022.00 226038.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test 118011.00 118015.00 0.0% test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test 550589.00 550605.00 0.0% test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 3072477.00 3072541.00 0.0% test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 2385563.00 2385579.00 0.0% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 389171.00 389155.00 -0.0% test-suite :: MultiSource/Applications/lua/lua.test 234764.00 234748.00 -0.0% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 227694.00 227678.00 -0.0% test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test 119819.00 119807.00 -0.0% test-suite :: 
MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 117995.00 117983.00 -0.0% test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test 123610.00 123594.00 -0.0% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 81414.00 81398.00 -0.0% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 782040.00 781880.00 -0.0% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 9597420.00 9595292.00 -0.0% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 9597420.00 9595292.00 -0.0% test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 911832.00 911608.00 -0.0% test-suite :: MultiSource/Applications/oggenc/oggenc.test 192507.00 192459.00 -0.0% test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 122843.00 122811.00 -0.0% test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test 122292.00 122260.00 -0.0% test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 777363.00 777155.00 -0.0% test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 123265.00 123205.00 -0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 315534.00 315358.00 -0.1% test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test 128163.00 128083.00 -0.1% test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 6562.00 6555.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test 23428.00 23396.00 -0.1% test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test 22749.00 22717.00 -0.1% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 39549.00 39485.00 -0.2% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 39546.00 39482.00 -0.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test 57214.00 57118.00 -0.2% test-suite :: 
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 413668.00 412804.00 -0.2% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1044047.00 1041487.00 -0.2% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 12414.00 12382.00 -0.3% test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test 31161.00 30969.00 -0.6% test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 224726.00 223254.00 -0.7% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 93512.00 92824.00 -0.7% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 281151.00 278463.00 -1.0% test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test 2820.00 2788.00 -1.1% test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 156819.00 154739.00 -1.3% test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test 11560.00 11160.00 -3.5% test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test 6734.00 6382.00 -5.2% results results0 diff ASCI_Purple/SMG2000 - extra vector code VPlanNativePath/outer-loop-vect - extra vectorization, better vector code AVX512BWVL/Vector-AVX512BWVL-psadbw - better vector code Rodinia/hotspot - small variations CINT2017speed/625.x264_s CINT2017rate/525.x264_r - extra vector code, better vectorization BenchmarkGame/n-body - better vector code. 
AVX512BWVL/Vector-AVX512BWVL-unpack_msasm - small variations Misc/flops - extra vector code AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - small variations Misc-C++/Large - better vector code CFP2017rate/526.blender_r - extra vector code Prolangs-C++/city - extra vector code MiBench/consumer-lame - extra vector code CoyoteBench/fftbench - extra vector code Olden/power - better vector code CFP2017rate/538.imagick_r CFP2017speed/638.imagick_s - extra vector code CINT2017rate/531.deepsjeng_r - extra vector code CFP2006/447.dealII - small variations DOE-ProxyApps-C/miniAMR - small variations Prolangs-C/unix-smail - small variations DOE-ProxyApps-C++/PENNANT - small variations CINT2006/473.astar - small variations CFP2006/453.povray - small variations JM/lencod - extra vector code CFP2017rate/511.povray_r - small variations DOE-ProxyApps-C/CoMD - small variations CFP2006/470.lbm - extra vector code CFP2017speed/619.lbm_s CFP2017rate/519.lbm_r - extra vector code CINT2006/456.hmmer - extra code vectorized TSVC/ControlFlow-dbl - extra vector code CFP2017rate/510.parest_r - better vector code LCALS/SubsetALambdaLoops - extra code vectorized LCALS/SubsetARawLoops - extra code vectorized CFP2006/444.namd - extra code vectorized CFP2006/482.sphinx3 - better vector code Applications/siod - better vector code Benchmarks/7zip - better vector code DOE-ProxyApps-C++/CLAMR - extra code vectorized Applications/sqlite3 - extra code vectorized Applications/ClamAV - smaller vector code MallocBench/gs - small variations MicroBenchmarks/ImageProcessing - small variations TSVC/StatementReordering-flt - extra code vectorized CINT2006/471.omnetpp - small variations CINT2006/403.gcc - extra code vectorized CINT2006/483.xalancbmk - extra code vectorized JM/ldecod - small variations Applications/lua - extra code vectorized mafft/pairlocalalign - small variations TSVC/NodeSplitting-flt - extra code vectorized TSVC/Recurrences-flt - extra code vectorized TSVC/InductionVariable-flt - extra code 
vectorized FreeBench/pifft - small variations CINT2006/464.h264ref - extra code vectorized CINT2017speed/602.gcc_s CINT2017rate/502.gcc_r - some extra code vectorized, extra code inlined CINT2006/445.gobmk - small variations Applications/oggenc - small variations TSVC/LoopRestructuring-flt - extra code vectorized TSVC/CrossingThresholds-flt - extra code vectorized CFP2017rate/508.namd_r - small variations TSVC/ControlFlow-flt - extra code vectorized mediabench/g721 - small variations Prolangs-C/compiler - small variations FreeBench/fourinarow - better vector code MiBench/telecomm-gsm - small variation in vector code mediabench/gsm - same Prolangs-C/bison - small variations Adobe-C++/loop_unroll - extra code vectorized Benchmarks/tramp3d-v4 - extra code gets inlined, small changes in vector code McCat/18-imp - variations in vector code Prolangs-C/gnugo - variations in vector code MallocBench/espresso - extra code vectorized DOE-ProxyApps-C++/miniFE - small variations in vector code Prolangs-C/TimberWolfMC - extra code vectorized, small changes in previously vectorized code. Olden/tsp - small changes in vector code CFP2006/433.milc - extra code gets inlined, vectorized 2 x stores to 4 x stores MiBench/security-blowfish - extra code vectorized McCat/08-main - better vector code. 
Metric: size..text, RISCV, sifive-p670 Program size..text results results0 diff test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 63580.00 64020.00 0.7% test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 21388.00 21406.00 0.1% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 296992.00 297088.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 968112.00 968208.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test 45160.00 45164.00 0.0% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 2635902.00 2635854.00 -0.0% test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 2635902.00 2635854.00 -0.0% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7568730.00 7568578.00 -0.0% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7568730.00 7568578.00 -0.0% test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test 49764.00 49762.00 -0.0% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 449132.00 449108.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 695932.00 695892.00 -0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 508820.00 508788.00 -0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 508820.00 508788.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9594152.00 9593336.00 -0.0% test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 166522.00 166490.00 -0.0% test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 722252.00 722092.00 -0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 27554.00 27546.00 -0.0% test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test 10900.00 10896.00 -0.0% test-suite :: 
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test 46754.00 46732.00 -0.0% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 631570.00 631226.00 -0.1% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 850698.00 850218.00 -0.1% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 24816.00 24800.00 -0.1% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 24814.00 24798.00 -0.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1599946.00 1598394.00 -0.1% test-suite :: MultiSource/Applications/hbd/hbd.test 27236.00 27204.00 -0.1% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 293848.00 293480.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/compiler/compiler.test 20160.00 20048.00 -0.6% test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 182088.00 181040.00 -0.6% test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 4788.00 4748.00 -0.8% DOE-ProxyApps-C++/miniFE - extra vector code MiBench/automotive-susan - small variations Benchmarks/Bullet - extra vector code CFP2017rate/511.povray_r - slightly better vector code TSVC/StatementReordering-dbl - small variations CINT2017rate/523.xalancbmk_r CINT2017speed/623.xalancbmk_s - extra vector code CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - extra vector code TSVC/CrossingThresholds-flt - small variations Applications/sqlite3 - extra vector code JM/lencod - extra vector code, small variations CINT2017rate/525.x264_r CINT2017speed/625.x264_s - small variations CFP2017rate/526.blender_r - extra vector code, small variations DOE-ProxyApps-C/miniGMG - small variations Vectorizer/VPlanNativePath/outer-loop-vect - small variations TSVC/CrossingThresholds-dbl - small variations Benchmarks/tramp3d-v4 - small variations Benchmarks/7zip - extra vector code MiBench/telecomm-gsm - small variations mediabench/gsm/toast - small variations 
CFP2017rate/510.parest_r - extra vector code Applications/hbd - extra vector code JM/ldecod - better vector code Prolangs-C/compiler - extra vector code MallocBench/espresso - extra vector code mediabench/g721/g721encode - extra vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/107461	2024-09-17 06:57:47 -04:00
Antonio Frighetto	942e872d5b	[Instrumentation] Do not request sanitizers for naked functions Sanitizer instrumentation may be incompatible with naked functions, which lack a standard prologue/epilogue.	2024-09-17 09:23:39 +02:00
Alexey Bataev	18ef467d73	[SLP]Fix PR108709: postpone buildvector clustered nodes, if required The "clustered" nodes for buildvector nodes must be postponed in accordance with the global flag, otherwise it may cause crash because of the dependency between phi nodes.	2024-09-16 09:53:46 -07:00
Alexey Bataev	f564a48f0e	[SLP]Fix PR108700: correctly identify id of the operand node If the operand node for truncs is not created during construction, but one of the previous ones is reused instead, need to correctly identify its index, to correctly emit the code. Fixes https://github.com/llvm/llvm-project/issues/108700	2024-09-16 09:44:47 -07:00
Kolya Panchenko	b592917eec	[LV] Added verification of EVL recipes (#107630 )	2024-09-16 11:58:29 -04:00
Kazu Hirata	f4a3309c9a	[IPO] Avoid repeated hash lookups (NFC) (#108796 )	2024-09-16 06:44:34 -07:00
Phoebe Wang	af5a45b34b	[X86,SimplifyCFG] Use passthru to reduce select (#108754 )	2024-09-16 20:20:36 +08:00
Nikita Popov	b7e51b4f13	[IPSCCP] Infer attributes on arguments (#107114 ) During inter-procedural SCCP, also infer attributes on arguments, not just return values. This allows other non-interprocedural passes to make use of the information later.	2024-09-16 10:23:41 +02:00
David Sherwood	b29c5b66fd	[NFC][LoopVectorize] Dont pass LLVMContext to VPTypeAnalysis constructor (#108540 ) We already pass a Type object into the VPTypeAnalysis constructor, which can be used to obtain the context. While in the same area it also made sense to avoid passing the context into the VPTransformState and VPCostContext constructors.	2024-09-16 09:12:11 +01:00
Antonio Frighetto	2ae968a0d9	[Instrumentation] Move out to Utils (NFC) (#108532 ) Utility functions have been moved out to Utils. Minor opportunity to drop the header where not needed.	2024-09-15 21:07:40 -07:00
Yingwei Zheng	87663fdab9	[VectorCombine] Don't shrink lshr if the shamt is not less than bitwidth (#108705 ) Consider the following case: ``` define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) { %19 = icmp eq <2 x i64> %vec.ind16, zeroinitializer %20 = zext <2 x i1> %19 to <2 x i32> %21 = lshr <2 x i32> %20, %broadcast.splat20 ret <2 x i32> %21 } ``` After https://github.com/llvm/llvm-project/pull/104606, we shrink the lshr into: ``` define <2 x i32> @test(<2 x i64> %vec.ind16, <2 x i32> %broadcast.splat20) { %1 = icmp eq <2 x i64> %vec.ind16, zeroinitializer %2 = trunc <2 x i32> %broadcast.splat20 to <2 x i1> %3 = lshr <2 x i1> %1, %2 %4 = zext <2 x i1> %3 to <2 x i32> ret <2 x i32> %4 } ``` It is incorrect since `lshr i1 X, 1` returns `poison`. This patch adds additional check on the shamt operand. The lshr will get shrunk iff we ensure that the shamt is less than bitwidth of the smaller type. As `computeKnownBits(&I, *DL).countMaxActiveBits() > BW` always evaluates to true for `lshr(zext(X), Y)`, this check will only apply to bitwise logical instructions. Alive2: https://alive2.llvm.org/ce/z/j_RmTa Fixes https://github.com/llvm/llvm-project/issues/108698.	2024-09-15 18:38:06 +08:00
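The miscompile described above can be reproduced with a scalar model of one vector lane. In this sketch poison is modelled as `None`, and `original` and `shrunk` are hypothetical helpers mirroring the two IR snippets in the message: `lshr (zext i1 x to i32), s` versus `zext (lshr i1 x, (trunc s to i1))`.

```python
def lshr(value, shamt, bits):
    """Logical shift right on a `bits`-wide integer; None models poison."""
    if shamt >= bits:
        return None  # shift amount not less than the bit width
    return (value & ((1 << bits) - 1)) >> shamt


def original(x, s):
    # zext i1 -> i32, then shift in the wide type.
    return lshr(x, s, 32)


def shrunk(x, s):
    # trunc the shift amount to i1, shift in the narrow type, zext back.
    return lshr(x, s & 1, 1)


for s in (0, 1, 2):
    print(s, original(1, s), shrunk(1, s))
```

For `s = 1` the narrow shift is poison while the original is well defined, and for `s = 2` the truncated shift amount becomes 0, so the narrow form yields 1 where the original yields 0. Both cases disappear once the shift amount is known to be less than the narrow bit width, which is the check the patch adds.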
c8ef	86f0399c1f	[InstCombine] Fold expression using basic properties of floor and ceiling function (#107107 ) alive2: ~~https://alive2.llvm.org/ce/z/Ag3Ki7~~ https://alive2.llvm.org/ce/z/ywP5t2 related: #76438 This patch adds the following foldings: `floor(x) <= x --> true` and `x <= ceil(x) --> true`. We leverage the properties of these math functions and ensure there is no floating point input of `nan`. --------- Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>	2024-09-15 14:25:00 +04:00
Florian Hahn	012dbec604	[VPlan] Handle ForceTargetInstructionCost during precomputeCosts. Make sure ForceTargetInstructionCost is respected in precomputeCosts.	2024-09-15 10:53:43 +01:00
Florian Hahn	f66509bf52	[VPlan] Clarify comment for replaceVPBBWithIRVPBB and add assert (NFCI). Follow-up to suggestion during https://github.com/llvm/llvm-project/pull/100735. More specifically `9a40ed0919 (diff-6d0b73adfa9f8465923d2225ab6674ddcdeab71666f7a73dfaec7fa1246b3a1f)`	2024-09-14 21:51:19 +01:00
Florian Hahn	cfe3f5fa61	[VPlan] Remove unneeded ExitBB variable after f0c5caa814. Fix buildbot failures due to an unused variable, e.g. https://lab.llvm.org/buildbot/#/builders/186/builds/2329	2024-09-14 21:35:45 +01:00
Florian Hahn	f0c5caa814	[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735 ) Add a new VPIRInstruction recipe to wrap existing IR instructions not to be modified during execution, except for PHIs. For PHIs, a single VPValue operand is allowed, and it is used to add a new incoming value for the single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any operands. Depends on https://github.com/llvm/llvm-project/pull/100658. PR: https://github.com/llvm/llvm-project/pull/100735	2024-09-14 21:21:55 +01:00
Mircea Trofin	82266d3a2b	[nfc][ctx_prof] Factor the callsite instrumentation exclusion criteria (#108471 ) Reusing this in the logic fetching the instrumentation in `CtxProfAnalysis`.	2024-09-13 21:25:47 -07:00
Teresa Johnson	12d4769cb8	Revert "[MemProf] Streamline and avoid unnecessary context id duplication (#107918 )" (#108652 ) This reverts commit 524a028f69cdf25503912c396ebda7ebf0065ed2, but manually so that follow on PR108086 / ae5f1a78d3a930466f927989faac8e0b9d820a7b is retained (NFC patch to convert tuple to a struct).	2024-09-13 16:20:43 -07:00
Alexey Bataev	1e3536ef31	[SLP]Fix PR108620: Need to check, if the reduced value was transformed Before trying to include the scalar into the list of ExternallyUsedValues, need to check, if it was transformed in previous iteration and use the transformed value, not the original one, to avoid compiler crash when building external uses. Fixes https://github.com/llvm/llvm-project/issues/108620	2024-09-13 15:43:06 -07:00
Felipe de Azevedo Piovezan	ddcc601353	[CoroSplit][DebugInfo] Adjust heuristic for moving DIScope of funclets (#108611 ) CoroSplit has a heuristic where the scope line for funclets is adjusted to match the line of the suspend intrinsic that caused the split. This is useful as it avoids a jump on the line table from the original function declaration to the line where the split happens. However, very often using the line of the split is not ideal: if we can avoid it, we should not have a line entry for the split location, as this would cause breakpoints by line to match against two functions: the funclet before and the funclet after the split. This patch adjusts the heuristics to look for the first instruction with a non-zero line number after the split. In other words, this patch makes breakpoints on `await foo()` lines behave much more like a regular function call.	2024-09-13 15:25:11 -07:00
Florian Hahn	c3fda44147	[VPlan] Use VPBuilder to create scalar IV steps and derived IV (NFCI). Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe and VPScalarIVStepsRecipe. Use them to simplify the code to create scalar IV steps slightly.	2024-09-13 22:19:36 +01:00
vporpo	5130f3236f	[SandboxVec] User-defined pass pipeline (#108625 ) This patch adds support for a user-defined pass-pipeline that overrides the default pipeline of the vectorizer. This will commonly be used by lit tests.	2024-09-13 13:14:06 -07:00
Ramkumar Ramachandra	75a57edadc	VPlan/Builder: inline VPBuilder::createICmp (NFC) (#105650 ) Inline VPBuilder::createICmp in the header, in line with the other VPBuilder functions.	2024-09-13 20:08:11 +01:00
Volodymyr Vasylkun	21e3a212c5	[InstCombine] Replace an integer comparison of a `phi` node with multiple `ucmp`/`scmp` operands and a constant with `phi` of individual comparisons of original intrinsic's arguments (#107769 ) When we have a `phi` instruction with more than one of its incoming values being a call to `ucmp` or `scmp`, which is then compared with an integer constant, we can move the comparison through the `phi` into the incoming basic blocks because we know that a comparison of `ucmp`/`scmp` with a constant will be simplified by the next iteration of InstCombine. There's a high chance that other similar patterns can be identified, in which case they can be easily handled by the same code by moving the check for "simplifiable" instructions into a lambda.	2024-09-13 19:50:27 +01:00
Alexey Bataev	c13bf6d4a8	[SLP]Return proper value for phi vectorized node Should not return the original phi vector instruction, need to return actual vectorized value as a result.	2024-09-13 11:30:29 -07:00
vporpo	39f2d2f156	[SandboxVec] Boilerplate for vectorization passes (#108603 ) This patch implements a new empty pass for the Bottom-up vectorizer and creates a pass pipeline that includes it. The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.	2024-09-13 11:22:24 -07:00
Tyler Nowicki	4c040c0275	[Coroutines] Move Shape to its own header (#108242 ) * To create custom ABIs plugin libraries need access to CoroShape. * As a step in enabling plugin libraries, move Shape into its own header * The header will eventually be moved into include/llvm/Transforms/Coroutines See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057	2024-09-13 14:11:30 -04:00
Florian Hahn	76fd69be74	[VPlan] Simplify VPBuilder insert point when live outs for FORs. Simplifies setting the insert point, addressing a TODO.	2024-09-13 13:21:23 +01:00


37539 Commits