llvm-project

Author	SHA1	Message	Date
Alexey Bataev	2b00a73f62	[SLP]Buildvector for alternate instructions with non-profitable gather operands. If the operands of the potentially alternate node are going to produce buildvector sequences, which result in more instructions than the original code, then such instructions should not be vectorized as an alternate node; it is better to end up with the buildvector node.
Left column - experimental, Right - reference.

Metric: size..text
Program                                                                       results      results0     diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test              413680.00    416272.00    0.6%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test      12351788.00  12354844.00  0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test          664901.00    664949.00    0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test           664901.00    664949.00    0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test        1171371.00   1171355.00   -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                 1036396.00   1036284.00   -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111280.00    111248.00    -0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test      1392113.00   1391361.00   -0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test     1392113.00   1391361.00   -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 281676.00   281452.00    -0.1%
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test            3025.00      3019.00      -0.2%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test        6351.00      6335.00      -0.3%

Metric: SLP.NumVectorInstructions
Program                                                                       results      results0     diff
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test            15.00        16.00        6.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test           1703.00      1707.00      0.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test          1703.00      1707.00      0.2%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test      26241.00     26239.00     -0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test        11761.00     11754.00     -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 824.00      822.00       -0.2%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test     5668.00      5654.00      -0.2%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test      5668.00      5654.00      -0.2%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test             792.00       790.00       -0.3%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test            792.00       790.00       -0.3%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test               1389.00      1384.00      -0.4%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                 596.00       590.00       -1.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test        6.00         5.00         -16.7%

Metric: exec_time
Program                                                                       results      results0     diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test      99.14        100.00       0.9%
Other changes are not significant (less than 0.1% with exec_time under 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores get vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s, External/SPEC/CINT2017rate/525.x264_r/525.x264_r - x264 has one change in a loop body, in function ssim_end4: some code remains scalar, resulting in smaller code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs become profitable).
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small part remains scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r, External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - many changes caused by the fact that the code of one function becomes smaller (onvertLCHabToRGB) and this function gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remains scalar (2 x vectors).
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts).
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r, External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s - some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/84978	2024-04-10 14:33:56 -04:00
Alexey Bataev	6ca5a410d2	[SLP]Fix PR87358: broken module, Instruction does not dominate all uses. If the first node is a gather node with extractelement instructions, still need to put the vector value after all instructions, not after the very first one.	2024-04-10 08:24:15 -07:00
Alexey Bataev	938a73422e	[SLP][NFC]Walk over entries, not single values. Better to walk over SLP nodes rather than single values. Matching a value to a node is not a 1-to-1 relation, one value may be part of several nodes and compiler may get wrong node, when trying to map it. Currently there are no such issues detected, but they may appear in future.	2024-04-10 06:03:26 -07:00
Florian Hahn	a8ec1eb843	[VPlan] Don't assign slots to VPValues with an underlying value. This makes sure the numbering for VPValues without underlying values is consecutive.	2024-04-09 21:30:51 +01:00
Alexey Bataev	910d2de357	[SLP]Fix PR88103: consider the sign of the compare for non-negative operands. Need to improve detection of number of bits, required for the operand, before doing a reduction. If the instruction is incoming operand of the signed compare, need to consider adding an extra bit for signedness.	2024-04-09 10:47:47 -07:00
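The extra sign bit mentioned in 910d2de357 can be illustrated with plain integer arithmetic. This is a minimal sketch, not the SLP vectorizer's code; the helper names `min_bits_unsigned`, `as_signed` and `min_bits_for_signed_compare` are hypothetical and exist only for this illustration.

```python
def min_bits_unsigned(value: int) -> int:
    """Bits needed to hold a non-negative value as an unsigned integer."""
    return max(value.bit_length(), 1)


def as_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def min_bits_for_signed_compare(value: int) -> int:
    """A non-negative operand of a signed compare needs one more bit,
    so that its top bit is not misread as a sign bit."""
    return min_bits_unsigned(value) + 1


# 200 is non-negative and fits in 8 bits, but a signed 8-bit compare sees -56.
narrow = as_signed(200, min_bits_unsigned(200))
wide = as_signed(200, min_bits_for_signed_compare(200))
print(narrow > 100, wide > 100)
```

With the extra bit the signed comparison `200 > 100` keeps its original result; without it the demoted value compares as negative.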
Alexey Bataev	e8e67957fa	[SLP]Fix PR88123: use vectorized operands consistently. Need to use vectorized operands, not the vecop of the extractelement instructions, to avoid false detection of the extra vector operand in the extractelements shuffling.	2024-04-09 08:42:57 -07:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Florian Hahn	c836983671	[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770) VPBlendRecipe does not use the first mask operand. Removing it allows VPlan-based DCE to remove unused mask computations. This also fixes #87410, where unused Not VPInstructions are considered having only their first lane demanded, but some of their operands providing a vector value due to other users. Fixes https://github.com/llvm/llvm-project/issues/87410 PR: https://github.com/llvm/llvm-project/pull/87770	2024-04-09 11:14:05 +01:00
Florian Hahn	9430a4b9d2	[VPlan] Use getEdgeMask when constructing VPBlendRecipe (NFCI). After 2d0d65b3babe, block-in and edge masks are created up front. Only retrieve the cached edge mask here.	2024-04-09 09:32:40 +01:00
Alexey Bataev	01d9528ef9	[SLP]Improve final minbitwidth analysis attempt. Added part for demanded bits analysis in the IsPotentiallyTruncated to improve minbitwidth analysis final attempts. Metric: size..text Program size..text results results0 diff test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 43069.00 42973.00 -0.2% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 43066.00 42970.00 -0.2% Extra trunc instructions are emitted to operate with <32 x i8> instead of <32 x i16>, will be removed in the next patches. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87786	2024-04-08 15:54:30 -04:00
Alexey Bataev	78c50bbd45	[SLP][NFC]Remove unused variable, NFC.	2024-04-08 09:16:44 -07:00
Alexey Bataev	4a1c53f9fa	[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics. https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86135	2024-04-08 08:32:35 -07:00
Florian Hahn	15d11a4de9	[VPlan] Track IsOrdered in VPReductionRecipe, remove use of ILV (NFCI). Store the information at recipe construction instead of using ILV.useOrderedReductions during ::execute. Another step towards making recipes' ::execute independent of legacy ILV.	2024-04-07 20:33:22 +01:00
Alexey Bataev	a612524197	[SLP]Fix the cost of the reduction result to the final type. Need to fix the way the cost is calculated, otherwise wrong cast opcode can be selected and lead to the over-optimistic vector cost. Plus, need to take into account reduction type size. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87528	2024-04-07 09:51:47 -04:00
David Green	869797daca	[VectorCombine] Add a debug message for foldShuffleOfCastop. NFC. This optimization, much like the existing foldShuffleOfBinops, can cause a lot of regressions. Add a quick debug message to make the costs more obvious.	2024-04-07 07:54:22 +01:00
Martin Storsjö	bd9486b4ec	Revert "[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics." This reverts commit 66b528078e4852412769375e35d2a672bf36a0ec. This commit caused miscompilations, breaking tests in the libyuv testsuite - see https://github.com/llvm/llvm-project/pull/86135#issuecomment-2041049709 for more details.	2024-04-06 23:53:26 +03:00
Alexey Bataev	66b528078e	[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics. https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86135	2024-04-05 14:29:26 -04:00
Florian Hahn	c6e38b928c	Reapply "[LV] Improve AnyOf reduction codegen. (#78304)" This reverts the revert commit 589c7abb03448. This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in 399ff08e29d. -------------------------------- Original commit message: Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-04-05 13:45:13 +01:00
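The AnyOf change described in c6e38b928c can be sketched in scalar form. This is an illustrative model only, assuming a loop that yields `new` if any element satisfies a predicate and `start` otherwise; the function names are hypothetical and this is not the Loop Vectorizer's code.

```python
def any_of_select_in_loop(values, pred, start, new):
    """Earlier shape: the start/new values are selected on every iteration."""
    result = start
    for v in values:
        result = new if pred(v) else result
    return result


def any_of_boolean_reduction(values, pred, start, new):
    """Shape described in the commit: only a boolean is tracked in the loop,
    and start/new are selected once after it (the "middle block")."""
    found = False
    for v in values:
        found = found or pred(v)
    return new if found else start


data = [3, 8, 1, 12, 5]
is_big = lambda v: v > 10
print(any_of_select_in_loop(data, is_big, 0, 42),
      any_of_boolean_reduction(data, is_big, 0, 42))
```

Both forms return the same value; the second uses `start` and `new` exactly once, outside the loop.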
David Green	31fd6b8eec	[SLP] Protect against scalable vector users. We started seeing a crash after 8a0bfe490592de3df28d82c5dd69956e43c20f1d: the user could be scalable, meaning the type size is scalable and an implicit conversion to uint64_t could be performed. Protect against that by making sure the user's type is not scalable.	2024-04-05 11:30:14 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
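The EVL-based tail folding described in 413a66f339 can be modelled as a loop in which each iteration asks how many lanes to process. This is a rough sketch under simplifying assumptions: the Python `get_vector_length` below is a hypothetical stand-in that always returns `min(remaining, vf)`, which is only one behaviour a target could choose, and list slices stand in for VP loads and stores.

```python
def get_vector_length(remaining: int, vf: int) -> int:
    """Hypothetical stand-in for the EVL computation: lanes to process now."""
    return min(remaining, vf)


def evl_vector_add(a, b, vf=4):
    """Add two equal-length lists in EVL-sized steps, with no scalar tail loop."""
    n = len(a)
    out = [0] * n
    evls = []  # EVL used on each iteration, recorded for inspection
    i = 0
    while i < n:
        evl = get_vector_length(n - i, vf)
        # Models a VP load, add and store that touch only the first `evl` lanes.
        out[i:i + evl] = [x + y for x, y in zip(a[i:i + evl], b[i:i + evl])]
        evls.append(evl)
        i += evl
    return out, evls


result, lengths = evl_vector_add(list(range(10)), [1] * 10, vf=4)
print(result, lengths)
```

For 10 elements and a vector factor of 4, the last iteration simply runs with a shorter length instead of a mask or a scalar remainder loop.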
Alexey Bataev	8a0bfe4905	[SLP]Fix PR87630: wrong result for externally used vector value. Need to check that the externally used value can be represented with the BitWidth before applying it, otherwise need to keep wider type.	2024-04-04 12:03:28 -07:00
Simon Pilgrim	d54d476300	[SLP] Fix Wunused-variable warning. NFC.	2024-04-04 12:26:34 +01:00
Florian Hahn	7bd163d0a4	[VPlan] Clean up dead recipes after UF & VF specific simplification. Recursively remove dead recipes after simplifying vector loop exit branch.	2024-04-04 12:05:08 +01:00
Simon Pilgrim	212b2bbcd1	[VectorCombine][X86] foldShuffleOfCastops - fold shuffle(cast(x),cast(y)) -> cast(shuffle(x,y)) iff cost efficient (#87510) Based off the existing foldShuffleOfBinops fold. Fixes #67803	2024-04-04 11:22:37 +01:00
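The equivalence behind the fold in 212b2bbcd1 can be checked on small lists. This is a sketch only, assuming a lane-wise sign extension from i8 as the cast and a shuffle that indexes into the concatenation of its two inputs; `sext_i8` and `shuffle` are hypothetical helpers, and the cost check ("iff cost efficient") is not modelled.

```python
def sext_i8(vec):
    """Lane-wise cast: treat each byte (0..255) as a signed 8-bit value."""
    return [b - 256 if b >= 128 else b for b in vec]


def shuffle(x, y, mask):
    """Pick lanes from the concatenation of x and y, as a two-input shuffle does."""
    concat = x + y
    return [concat[i] for i in mask]


x = [1, 200, 3, 4]
y = [250, 6, 7, 128]
mask = [0, 4, 1, 5]

before = shuffle(sext_i8(x), sext_i8(y), mask)  # two casts, then shuffle
after = sext_i8(shuffle(x, y, mask))            # one shuffle, then one cast
print(before, after)
```

Because the cast acts on each lane independently, moving it after the shuffle gives the same lanes while replacing two casts with one.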
Alexey Bataev	42cbceb0f0	[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions. Compiler can improve analysis for operands of UIToFP/SIToFP instructions and operands of ICmp instruction. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85966	2024-04-03 14:18:45 -07:00
Alexey Bataev	fa2bbea14d	Revert "[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions." This reverts commit 899855d2b11856a44e530fffe854d76be69b9008 to fix the issue reported in https://lab.llvm.org/buildbot/#/builders/165/builds/51659.	2024-04-03 13:10:16 -07:00
Alexey Bataev	899855d2b1	[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions. Compiler can improve analysis for operands of UIToFP/SIToFP instructions and operands of ICmp instruction. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85966	2024-04-03 15:58:58 -04:00
Alexey Bataev	d57884011e	[SLP]Add support for commutative intrinsics. Implemented long-standing TODO to support commutative intrinsics. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86316	2024-04-03 14:28:36 -04:00
Alexey Bataev	cd29126b63	[SLP]Fix PR87133: crash because of different altopcodes for cmps after reordering. If the node has cmp instruction with 3 or more different but swappable predicates, need to keep same kind of main/alternate opcodes to avoid incorrect detection of opcodes after reordering. Reordering changes the order and we may erroneously consider swappable opcodes as non-compatible/alternate, which may lead to a later compiler crash. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87267	2024-04-03 13:47:50 -04:00
Alexey Bataev	07a566793b	[SLP]Fix PR87477: fix alternate node cast cost/codegen. Have to compare actual type size to pick up proper cast operation opcode.	2024-04-03 10:00:03 -07:00
Alexey Bataev	250b467f7c	[SLP][NFC]Simplify common analysis of instructions in BoUpSLP::collectValuesToDemote by outlining common code, NFC.	2024-04-03 06:45:42 -07:00
Florian Hahn	e329b68413	[VPlan] Factor out logic to check if recipe is dead (NFCI). In preparation to use the helper in more places.	2024-04-03 14:22:41 +01:00
Han-Kuan Chen	bf1df25048	[SLP] Use isValidElementType instead of (#87469 ) FixedVectorType::isValidElementType for consistency.	2024-04-03 17:57:46 +08:00
Florian Hahn	e5abd963c7	[VPlan] Remove VPTransformState::addMetadata with ArrayRef arg (NFCI). addMetadata is only ever called with a single element, clean up the variant that takes multiple values.	2024-04-03 09:43:12 +01:00
Florian Hahn	6261c53c6f	[VPlan] Make sure OR VPInstructions are treated as disjoint ops. Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes https://github.com/llvm/llvm-project/issues/87378.	2024-04-02 21:48:51 +01:00
Alexey Bataev	d595080b48	[SLP]Fix PR87384: check for fixed vector type before using. If we have mixed extractelement instructions, fixed and scalable ones, need to check that compiler tries to estimate the cost for fixed vector extractelement, not the scalable one, to avoid compiler crash.	2024-04-02 11:38:26 -07:00
Alexey Bataev	9cb7dffa88	[SLP]Fix PR80027: handle case when ext is not reduced but its operand is. Need to handle the case, where the resize operation itself is not reduced but its operand is. In this case need to take an extra analysis for the operand, not the instruction itself.	2024-04-02 09:32:25 -07:00
Alexey Bataev	6b7b18a1a7	[SLP]Fix PR87329: crash on alternate cast vectorization. Need to fix the analysis for the alternate instructions, based on int extension operations. If the alternate extension node is resized, but not the operand, need to resize the node and do not shuffle final result, we end up only with trunc instruction.	2024-04-02 08:19:29 -07:00
Alexey Bataev	cb9cf331fa	[SLP][NFC]Do not lookup in MinBWs, reuse previously used iterator.	2024-04-02 05:53:34 -07:00
Simon Pilgrim	1d06f41b72	[VectorCombine] foldBitcastShuffle - peek through any residual bitcasts before creating a new bitcast on top (#86119) Encountered while working on #67803, wading through the chains of bitcasts that SSE intrinsics introduce - this patch helps prevent cases where the bitcast chains aren't cleared out and we can't perform further combines until after InstCombine/InstSimplify has run.	2024-04-02 10:58:45 +01:00
Florian Hahn	16da9d5351	[VPlan] Remove redundant set of debug loc in VPInstruction (NFCI). Consistently use setDebugLocFrom and remove redundant setDebugLocFrom.	2024-04-02 10:43:34 +01:00
Alexey Bataev	41afef9066	[SLP]Fix PR87011: Missing sign extension of demoted type before zero extension Need to drop skipping of the first zext/sext nodes, it leads to incorrect and less profitable code.	2024-04-01 06:07:18 -07:00
Florian Hahn	e701c1a653	[VPlan] Use recipe's debug loc for VPWidenMemoryInstructionRecipe (NFCI) Now that VPRecipeBase manages debug locations for recipes, use it in VPWidenMemoryInstructionRecipe.	2024-04-01 12:07:30 +01:00
Florian Hahn	a34834138a	[VPlan] Inline addVPValue into single caller (NFCI). Inline the function into its single caller.	2024-04-01 11:12:35 +01:00
Jakub Kuderski	2b0ab05c4a	[SLP][NFC] Simplify type checks with isa predicates (#87182) For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.	2024-03-31 14:55:11 -04:00
Florian Hahn	8d9cb6b016	[VPlan] Inline getVPValue in only caller (NFCI).	2024-03-30 20:38:40 +00:00
Alexey Bataev	01e02e0b6a	[SLP]Fix PR87011: Do not assume that initial ext/trunc nodes can be represented by bitwidth without analysis. Need to check that initial ext/trunc nodes can be safely represented using calculated bitwidth before applying it.	2024-03-28 18:02:26 -07:00
Florian Hahn	8a614c1d31	[VPlan] Rename getVPValueOrAddLiveIn -> getOrAddLiveIn (NFCI). The helper now only deals with live-ins, clarify the name.	2024-03-28 21:02:15 +00:00
Alexey Bataev	70cf2a09ce	[SLP][NFC]Simplify function/constructors by removing unnecessary params.	2024-03-28 13:34:59 -07:00
Alexey Bataev	d7975c9d93	[SLP]Add better minbitwidth analysis for udiv/urem instructions. Adds improved bitwidth analysis for udiv/urem instructions. The analysis is based on similar version in InstCombiner. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/85928	2024-03-28 10:35:15 -04:00

4391 Commits