llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	5921295dca	Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962) Reverts llvm/llvm-project#124129 as it is currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost	2025-01-29 22:17:53 +00:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Simon Pilgrim	89ca3e72ca	[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789) These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worst expanded shuffles we see in the vector-shuffle-128-v* codegen tests.	2025-01-29 10:23:09 +00:00
Alexey Bataev	947d8ebbf3	[SLP]Unify getNumberOfParts use Adds getNumberOfParts and uses it instead of similar code across code base, fixes analysis of non-vectorizable types in computeMinimumValueSizes. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/124774	2025-01-28 12:16:44 -05:00
Alexey Bataev	a1ab5b4c87	[SLP]Check the MainOp matches the requirements for the instructions Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.	2025-01-28 06:00:52 -08:00
Alexey Bataev	1d5fbe83c3	[SLP]Adjust NumberOfParts value for adjusted number of buildvector scalars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash	2025-01-28 05:45:13 -08:00
Han-Kuan Chen	08d14e10ca	[SLP] Fix CommonMask will be transformed into an incorrect mask if createShuffle is called multiple times. (#124244) We have two types of mask in SLP: a scalar mask and a vector mask. When vectorizing four i32 additions into <4 x i32>, SLP creates a mask of length 4. When vectorizing four <2 x i32> additions into <8 x i32>, SLP also creates a mask of length 4. We refer to the first case as a scalar mask (because the mask element represents a scalar, i32), and the second case as a vector mask (because the mask element represents a vector, <2 x i32>). At some point, we must convert the scalar mask into a vector mask (otherwise, calling TTI cost functions or IRBuilderBase functions may yield incorrect results). Since both ShuffleCostEstimator and ShuffleInstructionBuilder can modify the CommonMask, we have decided to perform the mask transformation only within createShuffle. However, we do not store the transformed result, as createShuffle may be called multiple times.	2025-01-28 12:02:37 +08:00
Simon Pilgrim	dec47b76f4	[CostModel][X86] Update baseline CTTZ/CTLZ costs for x86_64 (#124312 ) Followup to #123623 - now that the CMOV has been removed, the throughput has improved, reducing the benefit of vectorization on pre-x86-64-v3 CPUs	2025-01-26 14:43:51 +00:00
Alexey Bataev	5e65f43041	[SLP][NFC]Add a test, producing a series of extractelements, building non-extendable tree	2025-01-25 11:50:14 -08:00
Simon Pilgrim	a12d7e4b61	[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254 ) getVectorCallCosts determines the cost of a vector intrinsic, based off an existing scalar intrinsic call - but we were including the scalar argument data to the IntrinsicCostAttributes, which meant that not only was the cost calculation not type-only based, it was making incorrect assumptions about constant values etc. This also exposed an issue that x86 relied on fallback calculations for funnel shift costs - this is great when we have the argument data as that improves the accuracy of uniform shift amounts etc., but meant that type-only costs would default to Cost=2 for all custom lowered funnel shifts, which was far too cheap. This is the reverse of #124129 where we weren't including argument data when we could. Fixes #63980	2025-01-24 15:13:13 +00:00
Simon Pilgrim	625e0a40f1	[SLP][X86] Add missing SSE2/SSE4 checks from vector rotate tests	2025-01-24 10:12:19 +00:00
Simon Pilgrim	7746596713	[SLP][X86] Add VBMI2 coverage for funnel shift tests VBMI2 CPUs actually have vector funnel shift instruction support	2025-01-24 09:47:40 +00:00
Simon Pilgrim	d8cd8d56ea	[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis. (#124129 ) We were only constructing the IntrinsicCostAttributes with the arg type info, and not the args themselves, preventing more detailed cost analysis (constant / uniform args etc.) Just pass the whole IntrinsicInst to the constructor and let it resolve everything it can. Noticed while having yet another attempt at #63980	2025-01-23 16:57:13 +00:00
Alexey Bataev	ccd77953d0	[SLP][NFC]Add a test with potential alternate node, marked for minbitwidth size	2025-01-22 06:48:34 -08:00
Sushant Gokhale	c6c647588f	[SLP][NFC] Update test for PR #118055 (#122696 ) This patch updates the motivating test for the above PR so that it does not conflict with urem PR #122236	2025-01-22 03:28:34 -08:00
Alexey Bataev	184c056e35	[SLP][NFC]Update the test by replacing undefs with constant values, NFC	2025-01-21 08:43:33 -08:00
Alexey Bataev	5deb4ef9ab	[SLP]Initial non-power-of-2 (but still whole register) for remaining nodes Added non-power-of-2 (but still whole registers) vectorization support for nodes other than stores and reductions. Reviewers: preames, RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/113356	2025-01-21 10:33:03 -05:00
Alexey Bataev	7d01a8f2b9	[SLP]Fix vector factor for repeated node for bv When adding a node vector, when it is used already in the shuffle for buildvector, need to calculate vector factor from all vector, not only this single vector, to avoid incorrect result. Also, need to increase stability of the reused entries detection to avoid mismatch in cost estimation/codegen. Fixes #123639	2025-01-20 14:22:20 -08:00
Alexey Bataev	5e4c34a9b6	[SLP][NFC]Add a test with incorrect length and cost for repeated matching node	2025-01-20 14:17:15 -08:00
Alexey Bataev	2b1e037adb	[SLP]Fix createInsertVector mask emission	2025-01-18 11:48:53 -08:00
Alexey Bataev	92a6eff62b	[SLP][NFC]Fix the test to use poison and update to show the error	2025-01-18 11:46:04 -08:00
Alexey Bataev	55f7491dde	[SLP][NFC]Add a test with incomplete insertion mask, NFC	2025-01-18 08:13:54 -08:00
Han-Kuan Chen	07d496538f	[SLP] Replace MainOp and AltOp in TreeEntry with InstructionsState. (#122443 ) Add TreeEntry::hasState. Add assert for getTreeEntry. Remove the OpValue parameter from the canReuseExtract function. Remove the Opcode parameter from the ComputeMaxBitWidth lambda function.	2025-01-18 10:23:20 +08:00
Alexey Bataev	ebfdd38228	[SLP][NFC]Replace undef with constant zero in tests, NFC	2025-01-17 09:48:03 -08:00
Alexey Bataev	98e2328451	[SLP][NFC]Add a test with non-power-of-2 gathered consecutive loads, NFC	2025-01-13 12:50:11 -08:00
Alexey Bataev	066b88879a	[SLP]Correctly set vector operand for extracts with poisons When extracts are vectorized and it has some poison values instead of instructions, need to correctly set the vectorized operand not as poison, but as a main vector operand of the main extract instruction. Fixes #122583	2025-01-13 10:57:07 -08:00
Alexey Bataev	ae54617523	[SLP][NFC]Add a test with incorrect extractelement parameter after extending with poison	2025-01-13 10:32:11 -08:00
Alexey Bataev	092d628383	[SLP]Check for div/rem instructions before extending with poisons Need to check if the instructions can be safely extended with poison before actually doing this to avoid incorrect transformations. Fixes #122691	2025-01-13 09:28:27 -08:00
Alexey Bataev	af524de1fa	[SLP]Do not include subvectors for fully matched buildvectors If the buildvector node fully matched another node, need to exclude subvectors, when building final shuffle, just a shuffle of the original node must be emitted. Fixes #122584	2025-01-13 07:24:16 -08:00
Alexey Bataev	681c83a2f9	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 09:32:35 -08:00
Alexey Bataev	3c9c94a24f	Revert "[SLP]Fix mask generation after cost estimation" This reverts commit 547ba9730bf05df3383150f730a689f2c8336206 to fix buildbots reported in https://lab.llvm.org/buildbot/#/builders/123/builds/11370, https://lab.llvm.org/buildbot/#/builders/133/builds/9492	2025-01-10 08:46:42 -08:00
Alexey Bataev	547ba9730b	[SLP]Fix mask generation after cost estimation When estimating the cost of entries shuffles for buildvectors, need to rebuild original mask, not a generated submask, used for subregisters analysis. Fixes #122430	2025-01-10 08:17:56 -08:00
Alexey Bataev	920c58916a	[SLP][NFC]Add a test with the mask translate after buildvector shuffle cost estimation	2025-01-10 08:12:03 -08:00
Alexey Bataev	5ff36748cf	[SLP]Fix mask processing for reused gathered scalars Need to sync the mask between cost and actual emission to avoid bugs in mask calculation Fixes #122324	2025-01-09 11:24:48 -08:00
Alexey Bataev	1160994602	[SLP]Fix a crash for very long GEP chains Need to check if the GEP bases are equal and return false early. Also, need to return false if the lookup is too deep, considering bases equal too. Fixes a crash in the assertion.	2025-01-08 06:47:41 -08:00
David Green	a8dab1aa03	[AArch64] Add a subvector extract cost. (#121472) These can generally be emitted using an ext instruction or mov from the high half. The high half extracts can be free depending on the users, but that is not handled here, just the basic costs. It originally included all subvector extracts, but that was toned down to just half-vector extracts to try and help the mid end not break up high/low extracts without having the SLP vectorizer create a mess using other shuffles.	2025-01-08 08:13:07 +00:00
Alexey Bataev	889215a30e	[SLP]Followup fix for the poisonous logical op in reductions If the VectorizedTree still may generate poisonous value, but it is not the original operand of the reduction op, need to check if Res still the operand, to generate correct code. Fixes #114905	2024-12-26 05:11:26 -08:00
Alexey Bataev	07d284d4eb	[SLP]Add cost estimation for gather node reshuffling Adds cost estimation for the variants of the permutations of the scalar values, used in gather nodes. Currently, SLP just unconditionally emits shuffles for the reused buildvectors, but in some cases better to leave them as buildvectors rather than shuffles, if the cost of such buildvectors is better. X86, AVX512, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 912998.00 913238.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 203070.00 203102.00 0.0% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1396320.00 1396448.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1396320.00 1396448.00 0.0% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 309790.00 309678.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12477607.00 12470807.00 -0.1% CINT2006/445.gobmk - extra code vectorized MiBench/consumer-lame - small variations CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - extra vectorized code Benchmarks/Bullet - extra code vectorized CFP2017rate/526.blender_r - extra vector code RISC-V, sifive-p670, -O3+LTO CFP2006/433.milc - regressions, should be fixed by https://github.com/llvm/llvm-project/pull/115173 CFP2006/453.povray - extra vectorized code CFP2017rate/508.namd_r - better vector code CFP2017rate/510.parest_r - extra vectorized code SPEC/CFP2017rate - extra/better vector code CFP2017rate/526.blender_r - extra vectorized code CFP2017rate/538.imagick_r - extra vectorized code CINT2006/403.gcc - extra vectorized code CINT2006/445.gobmk - extra vectorized code CINT2006/464.h264ref - extra vectorized code CINT2006/483.xalancbmk - small variations CINT2017rate/525.x264_r - better vectorization Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/115201	2024-12-24 15:35:29 -05:00
Alexey Bataev	f0f8dab712	[SLP]Check if the first reduced value requires freeze/swap, if it may be too poisonous If several reduced values are combined and the first reduced value is just the original reduced value of the bool logical op, need to freeze it to prevent the propagation of the poison value. Fixes #114905	2024-12-24 07:40:35 -08:00
Alexey Bataev	6bafbc99b0	[SLP][NFC]Add a test with incorrect (more poisonous) reduction chain	2024-12-24 07:33:40 -08:00
Alexey Bataev	030829a7e5	[SLP]Drop samesign flag if the vector node has reduced bitwidth If the operands of the icmp instructions have reduced bitwidth after MinBitwidth analysis, need to drop samesign flag to preserve correctness of the transformation. Fixes #120823	2024-12-23 16:55:11 -08:00
Simon Pilgrim	611401c115	[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599 ) processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.	2024-12-20 10:39:45 +00:00
Simon Pilgrim	091448e3c1	Revert "[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types" (#120707 ) Reverts llvm/llvm-project#120599 - some recent tests are currently failing	2024-12-20 10:06:03 +00:00
Simon Pilgrim	81e63f9e0c	[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599 ) processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.	2024-12-20 09:55:11 +00:00
DianQK	e7a4d78ad3	[SLP] Check if instructions exist after vectorization (#120434 ) Fixes #120433.	2024-12-19 06:21:57 +08:00
Alexander Kornienko	23a239267e	Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460 ) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822 https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255	2024-12-18 19:06:34 +01:00
Alexey Bataev	0e11e19416	[SLP][NFC]Remove undef and update tests	2024-12-17 11:45:20 -08:00
Alexey Bataev	d1a7225076	[SLP]Check if the node must keep its original bitwidth Need to check if during previous analysis the node has requested to keep its original bitwidth to avoid incorrect codegen. Fixes #120076	2024-12-16 08:01:22 -08:00
Alexey Bataev	c53901405a	[SLP][NFC]Add a test with incorrect bitwidth for the node, previously identified as non-shrinkable	2024-12-16 07:50:49 -08:00
Han-Kuan Chen	3133acf1fb	Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181 )" This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.	2024-12-12 20:38:31 -08:00
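
Note on commit 08d14e10ca: the mask conversion described in that message can be illustrated with a small sketch. This is not the SLP vectorizer's code; the helper name `expand_scalar_mask` and the `POISON` sentinel are hypothetical and exist only for this illustration. It assumes a mask that holds one index per element (for example one per `<2 x i32>` operand) and expands it to one index per lane of the widened vector.

```python
# Illustration only: names below are hypothetical, not LLVM APIs.
POISON = -1  # assumed sentinel for an undefined mask lane


def expand_scalar_mask(mask, lanes_per_element):
    """Expand a mask with one index per element into one index per lane.

    Element index i covers lanes [i * n, i * n + n), where n is
    lanes_per_element. Poison entries stay poison in every lane.
    """
    expanded = []
    for index in mask:
        if index == POISON:
            expanded.extend([POISON] * lanes_per_element)
        else:
            base = index * lanes_per_element
            expanded.extend(range(base, base + lanes_per_element))
    return expanded
```

For four `<2 x i32>` operands, `expand_scalar_mask([0, 1, 2, 3], 2)` gives `[0, 1, 2, 3, 4, 5, 6, 7]`. Expanding an already expanded mask a second time yields a longer, wrong mask (`[1, 0]` becomes `[2, 3, 0, 1]` and then `[4, 5, 6, 7, 0, 1, 2, 3]`), which is consistent with the commit's choice not to store the transformed result when createShuffle may run more than once.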

2086 Commits