llvm-project

Author	SHA1	Message	Date
Alexey Bataev	51aac5b043	[SLP][NFCI]Improve compile time for phis with large number of incoming values. Added a limit of 128 incoming values at max for PHIs nodes to be vectorized plus improved performance by using logarithmic search instead of linear if the number of incoming values is > 4.	2024-04-30 14:42:49 -07:00
Florian Hahn	9c3f5fe88f	[LV] Don't consider the latch block as ScalarPredicatedBB. The conditional branch from the loop latch will be replaced by a single branch controlling the loop, so there is no extra overhead from scalarization. This improves the cost estimates in some cases.	2024-04-29 19:15:46 +01:00
Alexey Bataev	37ae4ad0ee	[SLP]Support minbitwidth analysis for buildvector nodes. Metric: size..text Program size..text exp ref diff test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 42906.00 42986.00 0.2% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 42909.00 42989.00 0.2% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664581.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664581.00 664661.00 0.0% Less is better. Replaces `buildvector <p x in> + trunc <p x in> to <p x im>` sequences with `buildvector <p x im> of { trunc in to im }` scalars, which is free in most cases and results in better code. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88504	2024-04-29 09:57:37 -04:00
Alexey Bataev	040b5a1255	[SLP]Fix PR90211: vectorized node must match completely to be reused. If the gather node matches the vectorized node, it must also match with the scalars completely. Otherwise, need to revectorize the gather node to generate correct code.	2024-04-29 06:51:11 -07:00
Maciej Gabka	bfc0317153	Move several vector intrinsics out of experimental namespace (#88748) This patch moves the following intrinsics out of the experimental namespace: * vector.interleave2/deinterleave2 * vector.reverse * vector.splice. All these intrinsics have existed in LLVM for more than a year and are widely used, so they should not be considered experimental.	2024-04-29 10:16:45 +01:00
Florian Hahn	aafed3408e	[VPlan] Make createScalarIVSteps return VPScalarIVStepsRecipe (NFC). This avoids the need for using getVPSingleValue/getDefiningRecipe at the place the return value is used.	2024-04-28 21:56:55 +01:00
Florian Hahn	b6a8f5486b	[LV] Consider all exit branch conditions uniform. If we vectorize a loop with multiple exits, all exiting branches should be considered uniform, as the resulting loop will be controlled by the canonical IV only. Previously we were overestimating the cost of values contributing to the other exits.	2024-04-28 13:15:55 +01:00
Florian Hahn	9ee8e38cdc	[VPlan] Also propagate versioned strides to users via sext/zext. The versioned value may not be used in the loop directly but through a sext/zext. Add new live-ins in those cases.	2024-04-26 21:29:43 +01:00
Alexey Bataev	79314c64d0	[SLP]Fix PR90224: check that users of gep are all vectorized. Before deleting the extractelement instruction for a vectorized GEP with external users, need to check that all users are vectorized.	2024-04-26 11:49:12 -07:00
Alexey Bataev	d74e42acd2	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-26 06:53:44 -07:00
Troy Butler	468fecfc39	Fix mismatches between function parameter definitions and declarations (#89512 ) Addresses issue #88716. Some function parameter names in the affected header files did not match the parameter names in the definitions, or were listed in a different order. --------- Signed-off-by: Troy-Butler <squintik@outlook.com>	2024-04-26 13:00:31 +02:00
Alexey Bataev	f758bb66e8	[SLP]Fix PR89988: do extra analysis of the icmp args to correctly handle signed/unsigned comparison. If operands of icmp have different signedness, need to consider extending unsigned operands to correctly handle comparison with the signed operands.	2024-04-25 16:10:24 -07:00
Simon Pilgrim	282b56f43d	[VectorCombine] foldShuffleOfBinops - add support for length changing shuffles (#88899 ) Refactor to be closer to foldShuffleOfCastops - sibling patch to #88743 that can be used to address some of the issues identified in #88693	2024-04-24 10:18:49 +01:00
Patrick O'Neill	adb0126ef1	[VPlan] Add scalar inferencing support for Not and Or insns (#89160) Fixes #87394. PR: https://github.com/llvm/llvm-project/pull/89160	2024-04-23 15:48:43 +01:00
Alexey Bataev	b4a0fd40f1	[SLP]Fix PR89635: do not try to vectorize single-gather alternate node. No need to try to vectorize single gather/buildvector with alternate opcode graph, it is not profitable. In other cases, need to use last instruction for inserting the vectorized code.	2024-04-23 06:45:43 -07:00
Florian Hahn	dadf6f2c5a	[VPlan] Ignore incoming values with constant false mask. (#89384 ) Ignore incoming values with constant false masks when trying to simplify VPBlendRecipes. As a follow-on optimization, we should also be able to drop all incoming values with false masks by creating a new VPBlendRecipe with those operands dropped. PR: https://github.com/llvm/llvm-project/pull/89384	2024-04-23 13:59:01 +01:00
Simon Pilgrim	7f4f237cd8	[VectorCombine] foldShuffleOfShuffles - add missing arguments to getShuffleCost calls. Ensure the getShuffleCost arguments/instruction args are populated - minor extension to #88743 to help improve shuffle costs for certain corner cases (e.g. shuffles of loads)	2024-04-23 11:53:08 +01:00
Florian Hahn	17fb3e82f6	[VPlan] Skip extending ICmp results in truncateToMinimalBitwidth. Results of icmp don't need extending after truncating their operands, as the result will always be i1. Skip them during extending. Fixes https://github.com/llvm/llvm-project/issues/79742 Fixes https://github.com/llvm/llvm-project/issues/85185	2024-04-23 11:50:26 +01:00
Alexey Bataev	0ab0c1d982	[SLP]Introduce transformNodes() and transform loads + reverse to strided loads. Introduced transformNodes() function to perform transformation of the nodes (cost-based, instruction count based, etc.). Implemented transformation of consecutive loads + reverse order to strided loads with stride -1, if profitable. Reviewers: RKSimon, preames, topperc Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88530	2024-04-22 12:31:57 -04:00
Alexey Bataev	6bd29d6639	[SLP]Fix PR89614: phis can be reordered, if reuses are not empty. Need to relax assertion and check ReuseShuffleIndices is not empty, if the root phi node has reorder indices.	2024-04-22 08:40:19 -07:00
Alexey Bataev	102a811094	[SLP]Fix a check for multi-users for icmp user. The compiler should not take into account the type of the cmp instruction, otherwise it may treat the size incorrectly and it may lead to incorrect codegen.	2024-04-22 08:23:15 -07:00
Simon Pilgrim	bddfbe748b	[VectorCombine] foldShuffleOfShuffles - fold "shuffle (shuffle x, undef), (shuffle y, undef)" -> "shuffle x, y" (#88743 ) Another step towards cleaning up shuffles that have been split, often across bitcasts between SSE intrinsic. Strip shuffles entirely if we fold to an identity shuffle.	2024-04-22 15:57:59 +01:00
Alexey Bataev	ef1d19b0a5	[SLP]Fix PR89438: check for all tree entries for the resized value. Need to check all possible entries before trying to look for the minbitwidth in the user node. Otherwise we may incorrectly get signedness info.	2024-04-22 06:38:38 -07:00
Florian Hahn	c93f02978c	[VPlan] Remove custom checks for EVL placement in verifier (NFCI). After e2a72fa583d9, def-use chains of EVL are modeled explicitly. So there's no need for a custom check of its placement, as regular def-use verification will catch mis-placements.	2024-04-22 12:49:49 +01:00
Simon Pilgrim	4cc9c6d98d	[VectorCombine] foldShuffleOfBinops - don't fold shuffle(divrem(x,y),divrem(z,w)) if mask contains poison Fixes #89390	2024-04-22 09:00:38 +01:00
David Green	a8105026ff	[LV] Fix warning about Mask being set twice. NFC	2024-04-20 16:40:08 +01:00
Alexey Bataev	cee7d994b9	[SLP]Fix PR89438: Check for same vectorized node in MinBWs, not user. Need to check if the buildvector node has perfect diamond match in the graph and the matched node is resized.	2024-04-19 12:52:19 -07:00
Alexey Bataev	4d7f3d9e0f	[SLP]Fix final analysis for unsigned nodes. Need to check that at least single bit is cleared for unsigned nodes before reducing their size. Otherwise they might be treated as signed in signed nodes.	2024-04-19 03:03:56 -07:00
Mikhail Goncharov	054b1b3b5a	Revert "[SLP]Fix final analysis for unsigned nodes." This reverts commit 74e07ab523122d6a8347b25770062ab331b6bb84. It might be that Mask.getBitWidth() == Mask.countl_zero() (32 in my case) and zero bitwidth2 causes the crash.	2024-04-19 11:32:56 +02:00
Florian Hahn	e2a72fa583	[VPlan] Introduce recipes for VP loads and stores. (#87816 ) Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from https://github.com/llvm/llvm-project/pull/76172 Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle that, first collect all VPValues representing header masks (by looking at users of both the canonical IV and widened inductions that are canonical) and then checking all users (recursively) of those header masks. Depends on https://github.com/llvm/llvm-project/pull/87411. PR: https://github.com/llvm/llvm-project/pull/87816	2024-04-19 09:44:23 +01:00
Alexey Bataev	74e07ab523	[SLP]Fix final analysis for unsigned nodes. Need to check that at least single bit is cleared for unsigned nodes before reducing their size. Otherwise they might be treated as signed in signed nodes.	2024-04-18 10:05:54 -07:00
Ramkumar Ramachandra	73e7f2ff70	LoopVectorize: guard marking iv as scalar; fix bug (#88730 ) When collecting loop scalars, LoopVectorize over-eagerly marks the induction variable and its update as scalars after vectorization, even if the induction variable update is a first-order recurrence. Guard the process with this check, fixing a crash. Fixes #72969.	2024-04-18 14:41:07 +01:00
Alexey Bataev	9462abdff1	[SLP]Fix PR89187: fix assertion check. Need to use proper index variable to fix a crash.	2024-04-18 04:22:25 -07:00
Ramkumar Ramachandra	63d8058ef5	LoopVectorize: guard appending InstsToScalarize; fix bug (#88720 ) In the process of collecting instructions to scalarize, LoopVectorize uses faulty reasoning whereby it also adds instructions that will be scalar after vectorization. If an instruction satisfies isScalarAfterVectorization() for the given VF, it should not be appended to InstsToScalarize. Add this extra guard, fixing a crash. Fixes #55096.	2024-04-18 10:03:07 +01:00
Nikita Popov	1baa385065	[IR][PatternMatch] Only accept poison in getSplatValue() (#89159 ) In #88217 a large set of matchers was changed to only accept poison values in splats, but not undef values. This is because we now use poison for non-demanded vector elements, and allowing undef can cause correctness issues. This patch covers the remaining matchers by changing the AllowUndef parameter of getSplatValue() to AllowPoison instead. We also carry out corresponding renames in matchers. As a followup, we may want to change the default for things like m_APInt to m_APIntAllowPoison (as this is much less risky when only allowing poison), but this change doesn't do that. There is one caveat here: We have a single place (X86FixupVectorConstants) which does require handling of vector splats with undefs. This is because this works on backend constant pool entries, which currently still use undef instead of poison for non-demanded elements (because SDAG as a whole does not have an explicit poison representation). As it's just the single use, I've open-coded a getSplatValueAllowUndef() helper there, to discourage use in any other places.	2024-04-18 15:44:12 +09:00
Nikita Popov	888836930b	Revert "[SLP]Attempt to vectorize long stores, if short one failed." This reverts commit 6f7160eedb2db02f37d4ffd52fff7b0cf88b3fdc. This still causes large compile-time regressions in some cases.	2024-04-18 10:15:45 +09:00
Alexey Bataev	6f7160eedb	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-17 10:24:35 -07:00
Florian Hahn	5d314353fb	[VPlan] Check for VPWidenLoadRecipe directly in truncateToMinBW. (NFCI). After a separate recipe has been introduced for wide loads in a9bafe91dd0, we can directly check for load recipes in the early bail-out and remove the redundant bail out for stores.	2024-04-17 15:53:32 +01:00
Florian Hahn	41b7341d6b	[VPlan] Factor out helper to recursively collect all users (NFCI). Factor out logic to collect all users recursively to be re-used in https://github.com/llvm/llvm-project/pull/87816.	2024-04-17 14:56:47 +01:00
Florian Hahn	a9bafe91dd	[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411 ) This patch introduces a new VPWidenMemoryRecipe base class and distinct sub-classes to model loads and stores. This is a first step in an effort to simplify and modularize code generation for widened loads and stores and enable adding further more specialized memory recipes. PR: https://github.com/llvm/llvm-project/pull/87411	2024-04-17 11:00:58 +01:00
Mel Chen	cbe148b730	[LV][NFC] Remove the declaration of function `fixReduction`. (#88491 )	2024-04-17 17:59:52 +08:00
Nikita Popov	efd60556f7	Revert "[SLP]Attempt to vectorize long stores, if short one failed." This reverts commit 7d4e8c1f3bbfe976f4871c9cf953f76d771b0eda. Contrary to the commit description, this does cause large compile-time regressions (up to 10% on individual files).	2024-04-17 09:25:05 +09:00
Arthur Eubanks	c6e01627ac	Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304 )"" This reverts commit c6e38b928c56f562aea68a8e90f02dbdf0eada85. Causes miscompiles, see comments on #78304.	2024-04-16 20:40:21 +00:00
Florian Hahn	34777c238b	[VPlan] Don't mark VPBlendRecipe as phi-like. VPBlendRecipes don't get lowered to phis and usually do not appear at the beginning of blocks, due to their masks appearing before them. This effectively relaxes an over-eager verifier message. Fixes https://github.com/llvm/llvm-project/issues/88297. Fixes https://github.com/llvm/llvm-project/issues/88804.	2024-04-16 21:24:25 +01:00
Alexey Bataev	7d4e8c1f3b	[SLP]Attempt to vectorize long stores, if short one failed. We can try to vectorize long store sequences, if short ones were unsuccessful because of the non-profitable vectorization. It should not increase compile time significantly (stores are sorted already, complexity is n x log n), but vectorize extra code. Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1088012.00 1088236.00 0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480396.00 480476.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664613.00 664661.00 0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2041105.00 2040961.00 -0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 836563.00 836387.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1035100.00 1032140.00 -0.3% In all benchmarks extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88563	2024-04-16 14:55:41 -04:00
Alexey Bataev	c7657cf7d1	[SLP]Keep externally used GEPs as GEPs, if possible instead of extractelement. If the vectorized GEP instruction can be still kept as a scalar GEP, better to keep it as scalar instead of extractelement. In many cases it is more profitable. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 18911.00 19695.00 4.1% test-suite :: SingleSource/Benchmarks/Misc-C++-EH/spirit.test 59987.00 60707.00 1.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1392209.00 1392753.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1392209.00 1392753.00 0.0% test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test 1087996.00 1088236.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309310.00 309342.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664661.00 664693.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664661.00 664693.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12354636.00 12354908.00 0.0% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1152748.00 1152716.00 -0.0% test-suite :: MultiSource/Applications/oggenc/oggenc.test 191787.00 191771.00 -0.0% test-suite :: SingleSource/UnitTests/matrix-types-spec.test 480796.00 480476.00 -0.1% Misc/oourafft - Extra code gets vectorized Misc-C++-EH/spirit - same CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - same, extra code gets vectorized CINT2006/400.perlbench - some extra 4 x ptr stores vectorized Bullet/bullet - extra 4 x ptr store vectorized CINT2017rate/525.x264_r CINT2017speed/625.x264_s - same CFP2017rate/526.blender_r - extra 8 x float stores (several), some extra 4 x ptr stores CFP2006/453.povray - 2 x double loads/stores replaced by 4 x double loads/stores Applications/oggenc - extra code is vectorized UnitTests/matrix-types-spec - extra code gets vectorized Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/88877	2024-04-16 14:54:06 -04:00
Harald van Dijk	60de56c743	[ValueTracking] Restore isKnownNonZero parameter order. (#88873 ) Prior to #85863, the required parameters of llvm::isKnownNonZero were Value and DataLayout. After, they are Value, Depth, and SimplifyQuery, where SimplifyQuery is implicitly constructible from DataLayout. The change to move Depth before SimplifyQuery needed callers to be updated unnecessarily, and as commented in #85863, we actually want Depth to be after SimplifyQuery anyway so that it can be defaulted and the caller does not need to specify it.	2024-04-16 15:21:09 +01:00
Alexey Bataev	e84b2fb48d	[LV][NFCI]Use integer for cost/trip count calculations instead of double, fix possible UB. Using an fp type in the compiler is not the best idea; here it is used in a comparison for equality with 0 and may cause undefined behavior in some cases. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/87241	2024-04-16 09:48:13 -04:00
Alexey Bataev	26ebe16d78	[SLP]Fix PR88834: check if unsigned arg can be trunced, being used in smax/smin intrinsics. Need to check that unsigned argument can be safely used in smax/smin intrinsics by checking if at least single sign bit is cleared, otherwise its value may be treated as negative instead of positive.	2024-04-16 06:42:15 -07:00
Florian Hahn	b73476c784	[SLP] Make sure MinVF is a power-of-2 by using PowerOf2Ceil. This should ensure we explore the same VFs as before 6d66db3890a18e39. Fixes https://github.com/llvm/llvm-project/issues/88640.	2024-04-16 13:29:35 +01:00
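
The last entry above (b73476c784) rounds MinVF up to a power of two with PowerOf2Ceil. As a rough illustration of what that rounding does, here is a standalone sketch. `powerOf2Ceil` below is a hypothetical helper written for this note, not LLVM's implementation, and returning 0 for an input of 0 or on overflow is an assumption of the sketch.

```cpp
#include <cstdint>

// Hypothetical standalone helper (not LLVM's code): returns the smallest
// power of two that is >= value. Assumption of this sketch: returns 0 when
// value is 0 or when the result would not fit in 64 bits.
constexpr std::uint64_t powerOf2Ceil(std::uint64_t value) {
  if (value == 0)
    return 0;
  std::uint64_t result = 1;
  while (result < value) {
    // Doubling past 2^63 would overflow, so report failure instead.
    if (result > (UINT64_MAX >> 1))
      return 0;
    result <<= 1;
  }
  return result;
}
```

With this helper, a candidate minimum VF of 3 or 4 rounds to 4 and 5 rounds to 8, so every value explored is a power of two.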

4459 Commits