llvm-project

Author	SHA1	Message	Date
Kazu Hirata	2f8238f849	[llvm] Migrate away from PointerUnion::{is,get} (NFC) (#119679 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2024-12-12 07:54:48 -08:00
Mel Chen	b3cba9be41	[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812 ) Consider the following loop: ``` int rdx = init; for (int i = 0; i < n; ++i) rdx = (a[i] > b[i]) ? i : rdx; ``` We can vectorize this loop if `i` is an increasing induction variable. The final reduced value will be the maximum of `i` that the condition `a[i] > b[i]` is satisfied, or the start value `init`. This patch added new RecurKind enums - IFindLastIV and FFindLastIV. --------- Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>	2024-12-12 16:48:31 +08:00
Han-Kuan Chen	51a0c1bf25	[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#118949 ) To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced by VLOperands. Arg_size will be provided to make sure other operands will not be reordered when VL[0] is IntrinsicInst (because APO is a boolean value). In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be removed since it is simple.	2024-12-11 10:09:23 +08:00
Alexey Bataev	a42aa8f265	[SLP]Fix adjusting of the mask for the fully matched nodes. When checking for the poison elements in the matches node, need to consider the register number, when clearing the corresponding mask element. Fixes #119393	2024-12-10 09:47:16 -08:00
Han-Kuan Chen	da421f55a7	[SLP] NFC. Make InstructionsState more constant. (#118609 ) Add getMainOp and getAltOp. Use `InstructionsState &` instead of `const InstructionsState &`. Use `!S.isAltShuffle()` instead of `S.MainOp == S.AltOp`.	2024-12-10 23:28:14 +08:00
Han-Kuan Chen	b97c447dac	[SLP] NFC. Add assert for shouldBroadcast and canBeVectorized. (#119327 )	2024-12-10 19:25:19 +08:00
Alexey Bataev	376dad72ab	[SLP]Move resulting vector before insert point, if the late generated buildvector fully matched If the perfect diamond match was detected for the postponed buildvectors and the vector for the previous node comes after the current node, need to move the vector register before the current inserting point to prevent compiler crash. Fixes #119002	2024-12-06 13:54:48 -08:00
Nikita Popov	6307e4b31e	Revert "[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#113880 )" This reverts commit 94fbe7e3ae7c0ce4e9a7d801e7700457a36f731d. Causes a crash when linking mafft in ReleaseLTO-g config.	2024-12-06 14:27:03 +01:00
Anutosh Bhat	89e919fb0d	Fix warnings while compiling SLPVectorizer.cpp (#118051 ) Towards #118048 I was building llvm (clang and lld) for webassembly and came across these warnings. Not sure if they are seen in our builds too. ``` /Users/anutosh491/work/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6924:67: warning: comparison of integers of different signs: 'typename iterator_traits<user_iterator_impl<User>>::difference_type' (aka 'long') and 'unsigned int' [-Wsign-compare] 6924 \| if (std::distance(LI->user_begin(), LI->user_end()) != \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ 6925 \| LI->getNumUses()) \| ~~~~~~~~~~~~~~~~ [ 79%] Building CXX object lib/Transforms/Instrumentation/CMakeFiles/LLVMInstrumentation.dir/PGOInstrumentation.cpp.o /Users/anutosh491/work/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:9754:43: warning: comparison of integers of different signs: 'typename iterator_traits<Value const >::difference_type' (aka 'long') and 'unsigned int' [-Wsign-compare] 9754 \| count(Slice, Slice.front()) == \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ 9755 \| (isa<UndefValue>(Slice.front()) ? VF - 1 : 1)) { ``` This PR tries to address those warnings.	2024-12-06 08:11:39 -05:00
Han-Kuan Chen	94fbe7e3ae	[SLP] NFC. Replace TreeEntry::setOperandsInOrder with VLOperands. (#113880 ) To reduce repeated code, TreeEntry::setOperandsInOrder will be replaced by VLOperands. Arg_size will be provided to make sure other operands will not be reordered when VL[0] is IntrinsicInst (because APO is a boolean value). In addition, BoUpSLP::reorderInputsAccordingToOpcode will also be removed since it is simple.	2024-12-06 12:03:23 +08:00
Han-Kuan Chen	f71ea4bc1b	[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260 )	2024-12-03 09:04:04 +08:00
Jonas Paulsson	0ad6be1927	[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491 ) As vector element loads are free on SystemZ, this patch improves the cost computation in getGatherCost() to reflect this. getScalarizationOverhead() gets an optional parameter which can hold the actual Values so that they in turn can be passed (by BasicTTIImpl) to getVectorInstrCost(). SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and typically return a 0 cost for it, with some exceptions.	2024-11-29 21:19:45 +01:00
Alexey Bataev	f4974e0931	[SLP] Add a check for poison value in AShrChecker Need to check if the value in AShrChecker is a poison before casting it to instruction to avoid compiler crash Fixes #118030	2024-11-29 06:51:19 -08:00
Han-Kuan Chen	ead3a2f598	[SLP][REVEC] getScalarizationOverhead should not be used when ScalarTy is FixedVectorType. (#117536 )	2024-11-26 22:05:54 +08:00
Alexey Bataev	76f0ff8210	[SLP]Add an extra check to avoid infinite vectorization attempts Added extra check for the cost of the buildvector if the -slp-threshold option is used. Prevents infinite vectorization attempts.	2024-11-25 14:27:44 -08:00
Alexey Bataev	f953b5eb72	[SLP]Relax assertion about subvectors mask size SubVectorsMask might be less than CommonMask, if the vectors with larger number of elements are permuted or reused elements are used. Need to consider this when estimation/building the vector to avoid compiler crash Fixes #117518	2024-11-25 08:31:42 -08:00
Alexey Bataev	57bbdbd7ae	[SLP]Relax assertion in mask combine for non-power-of-2 number of elements The nodes may contain non-power-of-2 number of elements. Need to relax the assertion to avoid possible compiler crash Fixes #117517	2024-11-25 07:58:19 -08:00
Alexey Bataev	7523086a05	[SLP]Use getExtendedReduction cost and fix reduction cost calculations Patch uses getExtendedReduction for reductions of ext-based nodes + adds cost estimation for ctpop-kind reductions into basic implementation and RISCV-V specific vcpop cost estimation. Reviewers: RKSimon, preames Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/117350	2024-11-22 16:12:53 -05:00
Alexey Bataev	b8703369da	[SLP] Match poison as instruction with the same opcode Patch allows to vector scalar instruction + poison values as if poisons are instructions with the same opcode. It allows better vectorization of the repeated values, reduces number of insertelement instructions and serves as a base ground for copyable elements vectorization AVX512, -O3 + LTO JM/ldecod - better vector code Applications/oggenc - better vectorization CINT2017speed/625.x264_s CINT2017rate/525.x264_r - better vector code CFP2017rate/526.blender_r - better vector code CFP2006/447.dealII - small variations Benchmarks/Bullet - extra vector code CFP2017rate/510.parest_r - better vectorization CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - extra vector code Benchmarks/tramp3d-v4 - small variations CFP2006/453.povray - extra vector code JM/lencod - better vector code CFP2017rate/511.povray_r - extra vector code MemFunctions/MemFunctions - extra vector code LoopVectorization/LoopVectorizationBenchmarks - extra vector code XRay/FDRMode - extra vector code XRay/ReturnReference - extra vector code LCALS/SubsetCLambdaLoops - extra vector code LCALS/SubsetCRawLoops - extra vector code LCALS/SubsetARawLoops - extra vector code LCALS/SubsetALambdaLoops - extra vector code DOE-ProxyApps-C++/miniFE - extra vector code LoopVectorization/LoopInterleavingBenchmarks - extra vector code LCALS/SubsetBLambdaLoops - extra vector code MicroBenchmarks/harris - extra vector code ImageProcessing/Dither - extra vector code MicroBenchmarks/SLPVectorization - extra vector code ImageProcessing/Blur - extra vector code ImageProcessing/Dilate - extra vector code Builtins/Int128 - extra vector code ImageProcessing/Interpolation - extra vector code ImageProcessing/BilateralFiltering - extra vector code ImageProcessing/AnisotropicDiffusion - extra vector code MicroBenchmarks/LoopInterchange - extra code vectorized LCALS/SubsetBRawLoops - extra code vectorized CINT2006/464.h264ref - extra vectorization with 
wider vectors CFP2017rate/508.namd_r - small variations, extra phis vectorized CFP2006/444.namd - 2 2 x phi replaced by 4 x phi DOE-ProxyApps-C/SimpleMOC - extra code vectorized CINT2017rate/541.leela_r CINT2017speed/641.leela_s - the function better vectorized and inlined Benchmarks/Misc/oourafft - 2 4 x bit reductions replaced by 2 x vector code FreeBench/fourinarow - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115946	2024-11-22 16:10:17 -05:00
Han-Kuan Chen	39913ae095	[SLP][REVEC] Make reorderTopToBottom support ShuffleVectorInst. (#117310 ) We don't want reorderTopToBottom to reorder ShuffleVectorInst (because ShuffleVectorInst currently supports only a limited set of patterns). Either we make ShuffleVectorInst support more patterns, or we let ReorderIndices reorder the result of the vectorization of ShuffleVectorInst. We choose the latter solution.	2024-11-23 01:20:57 +08:00
Alexey Bataev	14bdcefbd8	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-22 06:50:25 -08:00
Han-Kuan Chen	68aa6ac58c	[SLP] NFC. Remove redundant computation in getReorderingData. (#117295 )	2024-11-22 18:54:41 +08:00
Han-Kuan Chen	55e9afab6e	[SLP] NFC. Remove the useless check for alternate instruction. (#117293 ) Only BinaryOperator and CastInst support alternate instruction. It always returns false for TreeEntry::isAltShuffle if an instruction is ExtractElementInst, ExtractValueInst, LoadInst, StoreInst or InsertElementInst.	2024-11-22 18:53:40 +08:00
Han-Kuan Chen	6b22e39f26	[SLP] NFC. Remove the useless check for alternate instruction. (#117116 ) Only BinaryOperator and CastInst support alternate instruction. It always returns false for TreeEntry::isAltShuffle if an instruction is ExtractElementInst, ExtractValueInst, LoadInst, StoreInst or InsertElementInst.	2024-11-22 10:39:41 +08:00
Alexey Bataev	68ce528def	[SLP]Fix vector factor calculation for adjusted mask Need to choose max vector factor as max(Mask.size(), prev-val-size). Fixes build errors in https://lab.llvm.org/buildbot/#/builders/95/builds/6504	2024-11-21 14:30:20 -08:00
Alexey Bataev	07507cb591	[SLP]Fix shuffling of entries of the different sizes Need to choose the size of vector factor for mask based on the entries vector factors, not mask size, to generate correct code. Fixes #117170	2024-11-21 13:08:27 -08:00
Alexey Bataev	b62557aaeb	Revert "[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n))" This reverts commit 0298c5921d3b9fbeb5fefc2555321ea82ade6090 to fix a buildbot crash reported by https://lab.llvm.org/buildbot/#/builders/113/builds/4079.	2024-11-21 12:52:55 -08:00
Finn Plummer	8663b8777e	[NFC][VectorUtils][TargetTransformInfo] Add `isVectorIntrinsicWithOverloadTypeAtArg` api (#114849 ) This change allows target intrinsics to specify and overwrite overloaded types. - Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case - Updates `SLPVectorizer` to use available TTI - Updates `VPTransformState` to pass down TTI - Updates `VPlanRecipe` to use passed-down TTI This change will let us add scalarization for `asdouble`: #114847	2024-11-21 11:04:25 -08:00
Alexey Bataev	0298c5921d	[SLP]Model reduction_add(ext(<n x i1>)) as ext(ctpop(bitcast <n x i1> to int n)) Currently sequences reduction_add(ext(<n x i1>)) are modeled as vector extensions + reduction add, but later instcombiner transforms it into ext(ctpop(bitcast <n x i1> to int n)). Patch adds direct support for this in SLP vectorizer, which enables better cost estimation. AVX512, -O3+LTO CINT2006/445.gobmk - extra vector code Prolangs-C/bison - extra vector code Benchmarks/NPB-serial/is - 16 x + 8 x reductions vectorized as 24 x reduction Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/116875	2024-11-21 13:21:00 -05:00
Han-Kuan Chen	75b8f98ef6	[SLP] NFC. Change the comment to match the code execution. (#116022 ) Make code execute like the comment will modify many tests and affect the performance. As a result, we change the comment instead of the code.	2024-11-21 12:42:20 +08:00
Han-Kuan Chen	a62c5497c9	[SLP][REVEC] The vectorized result for ShuffleVector may not be ShuffleVectorInst. (#116940 )	2024-11-20 23:59:23 +08:00
Alexey Bataev	b17f607703	[SLP][NFC]Remove unnecessary std::optional around Factor value	2024-11-20 05:54:15 -08:00
Alexey Bataev	79682c4d57	[SLP]Check if the buildvector root is not a part of the graph before deletion If the buildvector root has no uses, it might be still needed as a part of the graph, so need to check that it is not a part of the graph before deletion. Fixes #116852	2024-11-19 11:31:40 -08:00
Alexey Bataev	ad9c0b369e	[SLP]Check if the gathered loads form full vector before attempting build it Need to check that the number of gathered loads in the slice forms the build vector to avoid compiler crash. Fixes #116691	2024-11-18 14:09:31 -08:00
Alexey Bataev	f6e1d64458	[SLP]Enable interleaved stores support Enables interleaved stores, results in better estimation for segmented stores for RISC-V Reviewers: preames, topperc, RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115354	2024-11-15 11:01:57 -05:00
Alexey Bataev	af3295bd3d	[SLP]Enable splat ordering for loads Enables splat support for loads with lanes> 2 or number of operands> 2. Allows better detect splats of loads and reduces number of shuffles in some cases. X86, AVX512, -O3+LTO Metric: size..text results results0 diff test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 154867.00 156723.00 1.2% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12467735.00 12468023.00 0.0% Better vectorization quality Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115173	2024-11-15 10:29:43 -05:00
Sushant Gokhale	9991ea28fc	[CostModel][AArch64] Make extractelement, with fmul user, free whenev… (#111479 ) …er possible In case of Neon, if there exists extractelement from lane != 0 such that 1. extractelement does not necessitate a move from vector_reg -> GPR 2. extractelement result feeds into fmul 3. Other operand of fmul is a scalar or extractelement from lane 0 or lane equivalent to 0 then the extractelement can be merged with fmul in the backend and it incurs no cost. e.g. ``` define double @foo(<2 x double> %a) { %1 = extractelement <2 x double> %a, i32 0 %2 = extractelement <2 x double> %a, i32 1 %res = fmul double %1, %2 ret double %res } ``` `%2` and `%res` can be merged in the backend to generate: `fmul d0, d0, v0.d[1]` The change was tested with SPEC FP(C/C++) on Neoverse-v2. Compile time impact: None Performance impact: Observing 1.3-1.7% uplift on lbm benchmark with -flto depending upon the config.	2024-11-13 11:10:49 +05:30
Han-Kuan Chen	5a5502b9e1	[SLP] NFC. Use Value instead of template. (#115440 )	2024-11-13 11:58:19 +08:00
Alexey Bataev	058ac837bc	[SLP]Use generic createShuffle for buildvector Use generic createShuffle function, which know how to adjust the vectors correctly, to avoid compiler crash when trying to build a buildvector as a shuffle Fixes #115732	2024-11-11 10:49:39 -08:00
Han-Kuan Chen	3cdd86bb47	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#115417 )	2024-11-10 13:53:15 +08:00
Alexey Bataev	26a9f3f590	[SLP][NFC]Cleanup getSameOpcode, return InstructionsState::invalid() for non-valid inputs Just a cleanup and related changes	2024-11-08 14:00:32 -08:00
Kazu Hirata	bc7e5c2016	[SLP] Avoid repeated hash lookups (NFC) (#115428 )	2024-11-08 07:35:06 -08:00
Alexey Bataev	77bec78878	[SLP]Do not look for last instruction in schedule block for buildvectors If looking for the insertion point for the node and the node is a buildvector node, the compiler should not use scheduling info for such nodes, they may contain only partial info, which is not fully correct and may cause compiler crash. Fixes #114082	2024-11-08 06:55:29 -08:00
Alexey Bataev	62db1c8a07	[SLP]Better decision making on whether to try stores packs for vectorization Since the stores are sorted by distance, comparing the indices in the original array and early exit, if the index is less than the index of the last store, not always the best strategy. Better to remove such stores explicitly to try better to check for the vectorization opportunity. Fixes #115008	2024-11-07 14:23:15 -08:00
Alexey Bataev	b7a8f5f4c9	[SLP][NFC]Exit early from attempt-to-reorder, if it is useless Adds early exits, which just save compile time. It can exit early if the total number of scalars is 2, or all scalars are constant, or the opcode is the same and not alternate. In this case reordering will not happen and compiler can exit early to save compile time	2024-11-07 11:07:49 -08:00
Kazu Hirata	22b4b1ab10	Revert "[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946 )" This reverts commit f58757b8dc167809b69ec00f9b5ab59281df0902. Failing buildbots: https://lab.llvm.org/buildbot/#/builders/174/builds/8058 https://lab.llvm.org/buildbot/#/builders/127/builds/1357	2024-11-07 10:43:11 -08:00
Han-Kuan Chen	f58757b8dc	[SLP][REVEC] Make GetMinMaxCost support FixedVectorType when REVEC is enabled. (#114946 )	2024-11-08 00:52:59 +08:00
Han-Kuan Chen	c6091cdbed	[SLP][REVEC] Make shufflevector can be vectorized with ReorderIndices and ReuseShuffleIndices. (#114965 )	2024-11-07 11:04:34 +08:00
Alexey Bataev	9f3b6adb15	[SLP][NFC]Exit early if the graph is empty, NFC No need to check anything if the graph is empty, just exit early.	2024-11-06 08:33:14 -08:00
Alexey Bataev	76422385c3	[SLP]Support reordered buildvector nodes for better clustering Patch adds reordering of the buildvector nodes for better clustering of the compatible operations and future vectorization. Includes basic cost estimation and if the transformation is not profitable - reverts it. AVX512, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test 74565.00 75701.00 1.5% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 75773.00 76397.00 0.8% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 75773.00 76397.00 0.8% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2014462.00 2024494.00 0.5% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 395219.00 396979.00 0.4% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 857795.00 859667.00 0.2% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 800472.00 802440.00 0.2% test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test 590699.00 591403.00 0.1% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 203006.00 203102.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 42408.00 42424.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12451575.00 12451927.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1396480.00 1396448.00 -0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1396480.00 1396448.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1047708.00 1047580.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111344.00 111328.00 -0.0% test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1087660.00 1087500.00 -0.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 280664.00 280616.00 -0.0% 
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 502646.00 502006.00 -0.1% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1033135.00 1031567.00 -0.2% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 2070917.00 2065845.00 -0.2% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 2070917.00 2065845.00 -0.2% test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test 33893.00 33797.00 -0.3% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 39677.00 39549.00 -0.3% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 39674.00 39546.00 -0.3% test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test 11560.00 11512.00 -0.4% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 653867.00 649275.00 -0.7% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 653867.00 649275.00 -0.7% CINT2006/401.bzip2 - extra code vectorized CINT2017rate/541.leela_r CINT2017speed/641.leela_s - function _ZN9FastBoard25get_pattern3_augment_specEiib not inlined anymore, better vectorization CFP2017rate/510.parest_r - better vectorization JM/ldecod - better vectorization JM/lencod - same CINT2006/464.h264ref - extra code vectorized CFP2006/447.dealII - extra vector code MiBench/consumer-lame - vectorized 2 loops previously scalar DOE-ProxyApps-C/miniGMG - small changes Benchmarks/7zip - extra code vectorized, better vectorization CFP2017rate/526.blender_r - extra vectorization CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - extra vectorization MiBench/consumer-jpeg - extra vectorization CINT2006/400.perlbench - extra vectorization Prolangs-C/TimberWolfMC - small variations Applications/sqlite3 - extra function vectorized and inlined Benchmarks/tramp3d-v4 - extra code vectorized CINT2017rate/500.perlbench_r CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets vectorized and inlined 
CINT2006/473.astar - extra code vectorized MiBench/telecomm-gsm - extra code vectorized, better vector code mediabench/gsm - same MiBench/security-blowfish - extra code vectorized CINT2017speed/625.x264_s CINT2017rate/525.x264_r - sub4x4_dct function vectorized and gets inlined RISCV-V, SiFive-p670, O3+LTO CFP2017rate/510.parest_r - extra vectorization CFP2017rate/526.blender_r - extra vectorization MiBench/consumer-lame - extra vectorized code Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/114284	2024-11-06 10:51:15 -05:00
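
The select-cmp reduction described in b3cba9be41 can be illustrated with a small standalone sketch. This is not the LoopVectorize implementation: the lane count `VF`, the `Sentinel` value, and both function names are assumptions made for illustration. It only shows why the final value is the maximum matching `i`, or `init` when nothing matches.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <vector>

// Scalar reference: the loop quoted in the commit message.
int findLastScalar(const std::vector<int> &a, const std::vector<int> &b,
                   int init) {
  int rdx = init;
  for (int i = 0; i < static_cast<int>(a.size()); ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
  return rdx;
}

// Lane-wise emulation (hypothetical VF = 4). Each lane remembers the last
// index at which its condition held; because i only increases, the answer
// is the maximum over all lanes.
int findLastLanes(const std::vector<int> &a, const std::vector<int> &b,
                  int init) {
  constexpr int VF = 4;
  constexpr int Sentinel = INT_MIN; // assumed: never a valid index
  std::array<int, VF> lanes;
  lanes.fill(Sentinel);

  const int n = static_cast<int>(a.size());
  int i = 0;
  for (; i + VF <= n; i += VF)      // "vector" body, one lane per element
    for (int l = 0; l < VF; ++l)
      if (a[i + l] > b[i + l])
        lanes[l] = i + l;

  // Horizontal max reduction over the lanes.
  int best = *std::max_element(lanes.begin(), lanes.end());

  // Scalar tail: remaining indices are larger than any index seen so far.
  for (; i < n; ++i)
    if (a[i] > b[i])
      best = std::max(best, i);

  // If no lane ever matched, fall back to the start value.
  return best == Sentinel ? init : best;
}
```

Both functions should agree for any input in which `INT_MIN` is not a valid index; the sentinel stands in for "condition never satisfied" so that the start value is selected only after the reduction.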


2025 Commits
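
The equivalence behind 14bdcefbd8, reduction_add(ext(<n x i1>)) modeled as ext(ctpop(bitcast <n x i1> to int n)), can be checked with a short sketch. It assumes zero extension and a hypothetical lane count `N = 8`; the function names are illustrative and do not come from the SLP vectorizer.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8; // hypothetical lane count

// reduction_add(zext(<N x i1>)): widen every lane, then add them up.
std::uint32_t reduceAddOfExt(const std::array<bool, N> &mask) {
  std::uint32_t sum = 0;
  for (bool bit : mask)
    sum += static_cast<std::uint32_t>(bit);
  return sum;
}

// ext(ctpop(bitcast <N x i1> to iN)): pack the lanes into one integer,
// count the set bits, then widen the count.
std::uint32_t extOfCtpop(const std::array<bool, N> &mask) {
  std::bitset<N> packed;
  for (std::size_t i = 0; i < N; ++i)
    packed[i] = mask[i];
  return static_cast<std::uint32_t>(packed.count());
}
```

Under these assumptions the two forms return the same value for every mask, which is why costing the pattern as a single population count can be more accurate than costing a vector extension followed by an add reduction.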