llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	5921295dca	Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962 ) Reverts llvm/llvm-project#124129 as it's currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost	2025-01-29 22:17:53 +00:00
Alexey Bataev	4a1a697427	[SLP][NFC]Unify ScalarToTreeEntries and MultiNodeScalars, NFC Currently, SLP has 2 distinct storages to manage mapping between vectorized instructions and their corresponding vectorized TreeEntry nodes. It leads to inefficient lookup for the matching TreeEntries and makes it harder to correctly track instructions, associated with multiple nodes. There is a plan to extend this support for instructions, that require scheduling, to allow support for copyable elements. Merging ScalarToTreeEntry and MultiNodeScalars will allow reduce maintenance of the feature Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124914	2025-01-29 09:05:54 -05:00
Alexey Bataev	947d8ebbf3	[SLP]Unify getNumberOfParts use Adds getNumberOfParts and uses it instead of similar code across code base, fixes analysis of non-vectorizable types in computeMinimumValueSizes. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124774	2025-01-28 12:16:44 -05:00
Alexey Bataev	a1ab5b4c87	[SLP]Check the MainOp matches the requirements for the instructions Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.	2025-01-28 06:00:52 -08:00
Alexey Bataev	1d5fbe83c3	[SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash	2025-01-28 05:45:13 -08:00
Han-Kuan Chen	08d14e10ca	[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244 ) We have two types of mask in SLP: a scalar mask and a vector mask. When vectorizing four i32 additions into <4 x i32>, SLP creates a mask of length 4. When vectorizing four <2 x i32> additions into <8 x i32>, SLP also creates a mask of length 4. We refer to the first case as a scalar mask (because the mask element represents a scalar, i32), and the second case as a vector mask (because the mask element represents a vector, <2 x i32>). At some point, we must convert the scalar mask into a vector mask (otherwise, calling TTI cost functions or IRBuilderBase functions may yield incorrect results). Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify the CommonMask, we have decided to perform the mask transformation only within createShuffle. However, we do not store the transformed result, as createShuffle may be called multiple times.	2025-01-28 12:02:37 +08:00
Alexey Bataev	f1d5e70a00	[SLP][NFC]Do not check poison values for corresponding vectorized entries No need to check poison values if they have been vectorized and/or mark them as vectorized, it should work only for instructions.	2025-01-27 06:38:23 -08:00
Simon Pilgrim	a12d7e4b61	[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254 ) getVectorCallCosts determines the cost of a vector intrinsic, based off an existing scalar intrinsic call - but we were including the scalar argument data to the IntrinsicCostAttributes, which meant that not only was the cost calculation not type-only based, it was making incorrect assumptions about constant values etc. This also exposed an issue that x86 relied on fallback calculations for funnel shift costs - this is great when we have the argument data as that improves the accuracy of uniform shift amounts etc., but meant that type-only costs would default to Cost=2 for all custom lowered funnel shifts, which was far too cheap. This is the reverse of #124129 where we weren't including argument data when we could. Fixes #63980	2025-01-24 15:13:13 +00:00
Jeremy Morse	8e70273509	[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).	2025-01-24 10:53:11 +00:00
Alexey Bataev	c7e6ca76cb	[SLP][NFC]Add dump() method for ScheduleData struct type for better debugging	2025-01-23 09:49:37 -08:00
Simon Pilgrim	d8cd8d56ea	[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129 ) We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.) Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can. Noticed while having yet another attempt at #63980	2025-01-23 16:57:13 +00:00
Alexey Bataev	fa299294c0	[SLP][NFC]Modernize code base in several places	2025-01-23 08:43:07 -08:00
Han-Kuan Chen	d3aea77f50	[SLP] Move transformMaskAfterShuffle into BaseShuffleAnalysis and use it as much as possible. (#123896 )	2025-01-23 09:47:38 +08:00
Alexey Bataev	5deb4ef9ab	[SLP]Initial non-power-of-2 (but still whole register) for remaining nodes Added non-power-of-2 (but still whole registers) vectorization support for nodes other than stores and reductions. Reviewers: preames, RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/113356	2025-01-21 10:33:03 -05:00
Alexey Bataev	7d01a8f2b9	[SLP]Fix vector factor for repeated node for bv When adding a node vector, when it is used already in the shuffle for buildvector, need to calculate vector factor from all vector, not only this single vector, to avoid incorrect result. Also, need to increase stability of the reused entries detection to avoid mismatch in cost estimation/codegen. Fixes #123639	2025-01-20 14:22:20 -08:00
Alexey Bataev	2b1e037adb	[SLP]Fix createInsertVector mask emission	2025-01-18 11:48:53 -08:00
Han-Kuan Chen	07d496538f	[SLP] Replace MainOp and AltOp in TreeEntry with InstructionsState. (#122443 ) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.	2025-01-18 10:23:20 +08:00
George Chaltas	b1bf95c081	ReduxWidth check for 0 (#123257 ) Added assert to check for underflow of ReduxWidth modified: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp Source code analysis flagged the operation (ReduxWidth - 1) as potential underflow, since ReduxWidth is unsigned. Realize that this should never happen if everything is working right, but added an assert to check for it just in case.	2025-01-17 15:56:58 -05:00
Alexey Bataev	fec503d1a3	[SLP][NFC]Add safe createExtractVector and use instead Builder.CreateExtractVector	2025-01-17 11:46:46 -08:00
Ramkumar Ramachandra	0fe8469e08	SLPVectorizer: strip bad FIXME (NFC) (#122888 ) Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid of the FIXME it introduced in SLPVectorizer: the FIXME is bad, and we'd get no testable impact by using CmpPredicate::getMatching here.	2025-01-14 11:27:55 +00:00
Alexey Bataev	066b88879a	[SLP]Correctly set vector operand for extracts with poisons When extracts are vectorized and it has some poison values instead of instructions, need to correctly set the vectorized operand not as poison, but as a main vector operand of the main extract instruction. Fixes #122583	2025-01-13 10:57:07 -08:00
Alexey Bataev	092d628383	[SLP]Check for div/rem instructions before extending with poisons Need to check if the instructions can be safely extended with poison before actually doing this to avoid incorrect transformations. Fixes #122691	2025-01-13 09:28:27 -08:00
Alexey Bataev	af524de1fa	[SLP]Do not include subvectors for fully matched buildvectors If the buildvector node fully matched another node, need to exclude subvectors, when building final shuffle, just a shuffle of the original node must be emitted. Fixes #122584	2025-01-13 07:24:16 -08:00
Mel Chen	56a37a3c76	[SLPVectorizer] Refactor HorizontalReduction::createOp (NFC) (#121549 ) This patch simplifies select-based integer min/max reductions by utilizing `llvm::getMinMaxReductionPredicate`, and generates intrinsic-based min/max reductions by utilizing `llvm::getMinMaxReductionIntrinsicOp`.	2025-01-13 16:11:31 +08:00
Han-Kuan Chen	35e76b6a4f	Revert "[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 )" This reverts commit f3d6cdc5aebafac3961d4fccbd2ca0e302c6082c.	2025-01-10 10:09:54 -08:00
Alexey Bataev	681c83a2f9	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 09:32:35 -08:00
Alex MacLean	986f2ac48f	[SLPVectorizer] minor tweaks around lambdas for compatibility with older compilers (#122348 ) Older version of msvc do not have great lambda support and are not able to handle uses of class data or lambdas with implicit return types in some cases. These minor changes improve the sources compatibility with older msvc and don't hurt readability either.	2025-01-10 09:18:28 -08:00
Alexey Bataev	3c9c94a24f	Revert "[SLP]Fix mask generation after cost estimation" This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix buildbots reported in https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492	2025-01-10 08:46:42 -08:00
Alexey Bataev	547ba9730b	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 08:17:56 -08:00
Mel Chen	e0f14e11c7	[SLPVectorizer] Refine the scope of RdxOpcode in HorizontalReduction::createOp (NFC) (#122239 ) This patch is one part of unifying IAnyOf and FAnyOf reduction. #118393 The related patch is #118777.	2025-01-10 16:01:36 +08:00
Han-Kuan Chen	f3d6cdc5ae	[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 ) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.	2025-01-09 23:41:52 -08:00
Han-Kuan Chen	5454ac28b3	Revert "[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 )" This reverts commit 760f550de25792db83cd39c88ef57ab6d80a41a0.	2025-01-09 18:41:47 -08:00
Han-Kuan Chen	36b423e0f8	[SLP] NFC. Refactor getSameOpcode and reduce for loop iterations. (#122241 ) Replace Cnt and AltIndex with MainOp and AltOp. Reduce the number of iterations in the for loop.	2025-01-10 09:06:07 +08:00
Han-Kuan Chen	760f550de2	[SLP] NFC. Replace MainOp and AltOp in TreeEntry with InstructionsState. (#120198 ) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.	2025-01-10 09:05:39 +08:00
Alexey Bataev	5ff36748cf	[SLP]Fix mask processing for reused gathered scalars Need to sync the mask between cost and actual emission to avoid bugs in mask calculation Fixes #122324	2025-01-09 11:24:48 -08:00
Alexey Bataev	5b76a2e51b	[SLP]Correctly calculate mask for the inserted vector	2025-01-08 15:18:06 -08:00
Alexey Bataev	0d921f96d4	[SLP][NFC]Introduce and use createInsertVector helper function, NFC	2025-01-08 14:26:13 -08:00
Alexey Bataev	1160994602	[SLP]Fix a crash for very long GEP chains Need to check if the GEP bases are equal and return false early. Also, need to return false if the lookup is too deep, considering bases equal too. Fixes a crash in the assertion.	2025-01-08 06:47:41 -08:00
Han-Kuan Chen	c50370c67a	[SLP] NFC. Use InstructionsState::valid if users just want to know whether VL has same opcode. (#120217 ) Add assert for InstructionsState::getOpcode. Use InstructionsState::getOpcode only when necessary.	2025-01-04 00:44:57 +08:00
Fangrui Song	edc42b2dc1	[SLP] Migrate away from PointerUnion::get	2024-12-27 21:01:09 -08:00
Alexey Bataev	07ba457525	[SLP][NFC]Add dump of combined entries, where applicable	2024-12-27 07:56:10 -08:00
Alexey Bataev	889215a30e	[SLP]Followup fix for the poisonous logical op in reductions If the VectorizedTree still may generate poisonous value, but it is not the original operand of the reduction op, need to check if Res still the operand, to generate correct code. Fixes #114905	2024-12-26 05:11:26 -08:00
Alexey Bataev	07d284d4eb	[SLP]Add cost estimation for gather node reshuffling Adds cost estimation for the variants of the permutations of the scalar values, used in gather nodes. Currently, SLP just unconditionally emits shuffles for the reused buildvectors, but in some cases better to leave them as buildvectors rather than shuffles, if the cost of such buildvectors is better. X86, AVX512, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 912998.00 913238.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 203070.00 203102.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1396320.00 1396448.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1396320.00 1396448.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309790.00 309678.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1% CINT2006/445.gobmk - extra code vectorized MiBench/consumer-lame - small variations CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - extra vectorized code Benchmarks/Bullet - extra code vectorized CFP2017rate/526.blender_r - extra vector code RISC-V, sifive-p670, -O3+LTO CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173 CFP2006/453.povray - extra vectorized code CFP2017rate/508.namd_r - better vector code CFP2017rate/510.parest_r - extra vectorized code SPEC/CFP2017rate - extra/better vector code CFP2017rate/526.blender_r - extra vectorized code CFP2017rate/538.imagick_r - extra vectorized code CINT2006/403.gcc - extra vectorized code CINT2006/445.gobmk - extra vectorized code CINT2006/464.h264ref - extra vectorized code CINT2006/483.xalancbmk - small variations CINT2017rate/525.x264_r - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115201	2024-12-24 15:35:29 -05:00
Alexey Bataev	852feea820	[SLP]Propagate AssumptionCache where possible	2024-12-24 09:20:26 -08:00
Alexey Bataev	0d6cb0ae9d	[SLP]Fix strict weak ordering criterion in comparators Fixes #121019	2024-12-24 08:13:57 -08:00
Alexey Bataev	f0f8dab712	[SLP]Check if the first reduced value requires freeze/swap, if it may be too poisonous If several reduced values are combined and the first reduced value is just the original reduced value of the bool logical op, need to freeze it to prevent the propagation of the poison value. Fixes #114905	2024-12-24 07:40:35 -08:00
Alexey Bataev	030829a7e5	[SLP]Drop samesign flag if the vector node has reduced bitwidth If the operands of the icmp instructions have reduced bitwidth after MinBitwidth analysis, need to drop samesign flag to preserve correctness of the transformation. Fixes #120823	2024-12-23 16:55:11 -08:00
Han-Kuan Chen	11676da808	[SLP] Normalize debug messages for newTreeEntry. (#119514 ) A debug message should follow after newTreeEntry. Make ExtractValueInst and ExtractElementInst use setOperand directly.	2024-12-23 21:42:02 +08:00
Finn Plummer	45c01e8a33	[NFC][TargetTransformInfo][VectorUtils] Consolidate `isVectorIntrinsic...` api (#117635 ) - update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for all uses, to allow specification of target specific intrinsics - add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api - update TTI api to provide `isTargetIntrinsicWith...` functions and consistently name them - move `isTriviallyScalarizable` to VectorUtils - update all uses of the api and provide the TTI parameter Resolves #117030	2024-12-19 11:54:26 -08:00
DianQK	e7a4d78ad3	[SLP] Check if instructions exist after vectorization (#120434 ) Fixes #120433.	2024-12-19 06:21:57 +08:00

2081 Commits