llvm-project

Author	SHA1	Message	Date
Alexey Bataev	ebcb5d59fc	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.	2023-09-29 15:03:46 -07:00
Alexey Bataev	9f5960e004	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-09-29 13:16:03 -07:00
Alexey Bataev	019aee8327	[SLP]Improve costs in computeExtractCost() to avoid crash after D158449. Need to consider the length of the original vector for extractelements, not the length, matched number of the scalars. It fixes 2 issues: 1) improves cost estimation; 2) Fixes crashes after D158449.	2023-09-29 07:48:02 -07:00
Hans Wennborg	06f3b0ed43	Revert "[SLP]Improve costs in computeExtractCost() to avoid crash after D158449." This caused asserts: Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).", file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096 see comment on `59a67ea35d` > Need to consider the length of the original vector for extractelements, > not the length, matched number of the scalars. It fixes 2 issues: 1) > improves cost estimation; 2) Fixes crashes after D158449. This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.	2023-09-29 10:42:19 +02:00
Alexey Bataev	3204f88a8b	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.	2023-09-28 11:57:32 -07:00
Alexey Bataev	c88c281cf1	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in ShuffleVectorInstruction class for better mask analysis. Mask.size() not always matches the sizes of the permuted vector(s). Allows to better estimate the cost in SLP and fix uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-09-28 11:03:21 -07:00
Alexey Bataev	59a67ea35d	[SLP]Improve costs in computeExtractCost() to avoid crash after D158449. Need to consider the length of the original vector for extractelements, not the length, matched number of the scalars. It fixes 2 issues: 1) improves cost estimation; 2) Fixes crashes after D158449.	2023-09-28 09:36:08 -07:00
Nikita Popov	3b82397965	[VectorCombine] Check for non-byte-sized element type We should check whether the element type is non-byte-sized, not the vector type. For types like <32 x i1> the whole type is byte-sized, but the individual elements (that we scalarize to) are not. Fixes https://github.com/llvm/llvm-project/issues/67060.	2023-09-28 14:18:30 +02:00
Mikael Holmen	9cecee97a0	[VPlan] Silence gcc Wparentheses warning [NFC] Without the fix gcc warns about ../lib/Transforms/Vectorize/VPlanTransforms.cpp:968:42: warning: suggest parentheses around '&&' within '||' [-Wparentheses] 968 | UseActiveLaneMaskForControlFlow && | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ 969 | "DataAndControlFlowWithoutRuntimeCheck implies " | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 970 | "UseActiveLaneMaskForControlFlow"); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	2023-09-28 12:04:26 +02:00
Alexey Bataev	9eeb0293e2	[SLP]Cleanup MultiNodeScalars when tree deleted. Need to clear MultiNodeScalars map to avoid compiler crash when tree is deleted.	2023-09-27 07:48:53 -07:00
Alexey Bataev	ea7f43ec14	[SLP]Do not gather node, if the instruction, that does not require scheduling, is previously vectorized. If the main node was vectorized already, but does not require scheduling, we still can try to vectorize it in this new node instead of gathering.	2023-09-26 11:57:35 -07:00
Ben Shi	ea0ee55c02	[VectorCombine] Enable transform 'scalarizeLoadExtract' for non constant indexes (#65445) Enable the transform if a non constant index is guaranteed to be safe via a UREM/AND.	2023-09-26 09:41:53 +08:00
alexfh	5d86176f48	Revert "[SLP]Do not gather node, if the instruction, that does not require" (#67386) This reverts commit 77053421228edd12a3ba73d4eebd970fcdd3b2c0, which introduces a clang crash (test case: https://gcc.godbolt.org/z/zn5n4KWPY).	2023-09-26 02:45:11 +02:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduce active-lane-mask as later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating introducing EVL as similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Florian Hahn	1a9e45080f	[VPBuilder] Add setInsertPoint version taking a recipe directly (NFC). This helps to slightly simplify code when a recipe can be obtained easily. Suggested in D158779.	2023-09-25 12:17:53 +01:00
Florian Hahn	541e88dbc2	[VPlan] Simplify HCFG construction of region blocks (NFC). Update the logic to update the successors and predecessors of region blocks directly. This adds special handling for header and latch blocks in place, and removes the separate loop to fix up the region blocks. Helps to simplify D158333. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159136	2023-09-24 21:53:35 +01:00
Kazu Hirata	e7497570d8	[Vectorize] Use range-based for loops (NFC)	2023-09-22 17:43:06 -07:00
Youngsuk Kim	e5026f0179	[llvm] Remove uses of Type::getPointerTo() (NFC) Partial progress towards removing in-tree uses of `getPointerTo()`, by employing the following options: * Drop the call entirely if the sole purpose of it is to support a no-op bitcast (remove the no-op bitcast as well). * Replace with `PointerType::get()`/`PointerType::getUnqual()` This is a NFC cleanup effort. Reviewed By: barannikov88 Differential Revision: https://reviews.llvm.org/D155232	2023-09-22 19:44:38 -04:00
Florian Hahn	d9f83169d1	[VPlan] Ensure start value of phis is the first op at construction (NFC) Header phi recipes have the start value (incoming from outside the loop) as first operand. This wasn't the case for VPWidenPHIRecipes. Instead the start value was picked during execute() by doing extra work. To be in line with other recipes, ensure the operand order is as expected during construction.	2023-09-22 21:24:15 +01:00
Alexey Bataev	7ff83ed6cd	[SLP]Do not try to reorder possible strided nodes. Reordering of possible strided nodes in bottom-to-top order requires top-to-bottom reordering of the operands of such nodes, which is not supported. Need to disable reordering of strided operands to avoid compiler crashes.	2023-09-22 07:55:43 -07:00
David Spickett	8f548610a6	Revert "[SLP]Use source vector type as the original vector type instead of" This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb. Due to test suite failures on all our SVE buildbots e.g.: https://lab.llvm.org/buildbot/#/builders/184/builds/7375 clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565: InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind, VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *, ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.	2023-09-22 07:52:16 +00:00
Alexey Bataev	9a99944df0	[SLP]Use source vector type as the original vector type instead of artificial for better cost estimation. Need to use original source vector type, not the one artificially constructed, based on the number of vectorized scalars. It affect the cost significantly.	2023-09-21 11:34:02 -07:00
Alexey Bataev	3dc28e6c6a	[SLp]Fix a crash because of wrong deps between vectorized nodes. Need to change the order of the nodes vectorization to avoid too early insertion of the first node.	2023-09-21 10:19:11 -07:00
Alexey Bataev	12fda304cc	[SLP][NFC]Unify add() member function in CostEstimator, NFC. Make add() function smart enough to understand that the shuffle of a single entry is requested, if it sees that the second node is the same as the first.	2023-09-21 07:59:37 -07:00
Alexey Bataev	c601928cb9	[SLP][NFC]Improve compile time by storing all nodes for the given scalar. No need to scan the whole graph when trying to find matching node for the scalar, vectorized in several nodes, better to store corresponding nodes along and scan just this small list.	2023-09-21 07:24:31 -07:00
Florian Hahn	f23246a0bb	[LV] Directly add fast-math flags to select recipe (NFC). Now that VPInstruction can manage fast math flags via VPRecipeWithIRFlags, use them directly to model the fast-math flags of the select created for the final reduction value instead of adding them late.	2023-09-21 11:05:55 +01:00
Florian Hahn	1a9358c090	[LV] Relax over-strict assertion for reduction exit value selects. After f108c6c, (mul x, 1) is simplified to x, which can cause the select for the final reduction value when tail-folding to use the reduction value for both options. Relax the assertion to make sure this case is allowed. Note that the reduction is now redundant itself and could be further simplified. Fixes #66895.	2023-09-21 10:12:29 +01:00
Michael Maitland	e0aaa1956d	[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats (#65706) of the scalar operation VP Intrinsics whose vector operands are both splat values may be simplified into the scalar version of the operation and the result is splatted. This issue is the intrinsic dual of #65072.	2023-09-20 18:27:51 -04:00
Alexey Bataev	7705342122	[SLP]Do not gather node, if the instruction, that does not require scheduling, is previously vectorized. If the main node was vectorized already, but does not require scheduling, we still can try to vectorize it in this new node instead of gathering.	2023-09-20 12:52:37 -07:00
Alexey Bataev	ebed4692f8	[SLP]Fix a crash when trying to find operand with re-vectorized main instruction. Need to check if the operand scalars are vectorized in the a different vector node, if the main instruction is already gets vectorized in other vector node.	2023-09-20 09:54:15 -07:00
Alexey Bataev	7db87a66b0	[SLP]Fix PR66795: Check correct deps for vectorized inst with multiple vectorized node uses. If the instruction is vectorized in many different vector nodes, it may break the dependency analysis for gathered nodes with matched scalars. Need to properly check the dependency between such gather nodes to avoid cycle dependency.	2023-09-19 12:11:33 -07:00
Florian Hahn	f108c6cdc1	[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform. Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A. Among other things, this enables additional simplifications after applying versioned strides, as follow up to D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159200	2023-09-18 21:45:34 +01:00
Ben Shi	87143ff9f2	[VectorCombine] Fix a spot in commit 068357d9b09cd635b1c2f126d119ce9afecb28f7 My previous commit leads to a crash in "Builders/sanitizer-x86_64-linux-fast" as https://lab.llvm.org/buildbot/#/builders/5/builds/36746. And this patch fixes it.	2023-09-18 15:01:47 +08:00
Ben Shi	068357d9b0	[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443) The transform 'scalarizeLoadExtract' can be applied to scalable vector types if the index is less than the minimum number of elements. The check whether the index is less than the minimum number of elements locates at line 1175~1180. 'scalarizeLoadExtract' will call 'canScalarizeAccess' and check the returned result if this transform is safe. At the beginning of the function 'canScalarizeAccess', the index will be checked 1. If it is less than the number of elements of a fixed vector type. 2. If it is less than the minimum number of elements of a scalable vector type. Otherwise 'canScalarizeAccess' will return unsafe and this transform will be prevented.	2023-09-18 10:49:18 +08:00
Florian Hahn	1d1cba44ea	[VPlan] Remove stray indent when printing scalar steps recipe. VPScalarIVStepsRecipe will now be printed as vp<%6> = SCALAR-STEPS vp<%3>, ir<1> instead of vp<%6> = SCALAR-STEPS vp<%3>, ir<1>	2023-09-17 10:15:52 +01:00
Alexey Bataev	434aa2fe56	[SLP]Improve canreuseExtracts for reordering analysis. Improve the analysis in canReuseExtracts for the reodering to better reorder extracts for ExtractSubvector pattern.	2023-09-15 12:09:45 -07:00
Alexey Bataev	b9ad72ba05	[SLP]Fix PR66176: SLP incorrectly reorders select operands. On the very first iteration for the reductions, when trying to build reduction for boolean logic operations, no need to compare LHS/RHS with the Reduction(VectorizedTree), need to compare with actual parameters of the reduction operations.	2023-09-15 03:57:36 -07:00
Alexey Bataev	c15c1e5dd5	[SLP]Do not account non-instructions for external use. If the non-instruction gets vectorized, no need to account its extract cost, it won't be removed and replaced by extractelement instruction.	2023-09-14 12:40:33 -07:00
Jeremy Morse	e54277fa10	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468	2023-09-11 20:01:19 +01:00
Alexey Bataev	9a90457a76	[SLP][NFC]Use ArrayRef for operands directly instead of entry/operand number, NFC.	2023-09-11 11:16:13 -07:00
Jeremy Morse	6942c64e81	[NFC][RemoveDIs] Prefer iterator-insertion over instructions Continuing the patch series to get rid of debug intrinsics [0], instruction insertion needs to be done with iterators rather than instruction pointers, so that we can communicate information in the iterator class. This patch adds an iterator-taking insertBefore method and converts various call sites to take iterators. These are all sites where such debug-info needs to be preserved so that a stage2 clang can be built identically; it's likely that many more will need to be changed in the future. At this stage, this is just changing the spelling of a few operations, which will eventually become signifiant once the debug-info bearing iterator is used. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152537	2023-09-11 11:48:45 +01:00
Alexey Bataev	5bab59de44	[SLP]Try to vectorize scalars, being vectorized already, but does not need to be scheduled. If the scalar does not need to be scheduled and it was vectorized already in one of the vector nodes, we still can try to vectorize it in another node. Just does not need account its cost in the scalar total cost, as it will be handled in the main vectorized node. Differential Revision: https://reviews.llvm.org/D159205	2023-09-08 13:34:12 -07:00
Florian Hahn	08de6508ab	[LV] Return debug loc directly from getDebugLocFromInstrOrOps (NFCI) The return value of the function is only used to get the debug location. Directly return the debug location, as this avoids an extra null check in the caller.	2023-09-08 16:29:09 +01:00
Florian Hahn	3e2d564c3d	[VPlan] Use VPRecipeWithFlags for VPScalarIVStepsRecipe (NFC). This directly models the flags as part of the recipe, which allows dropping them using the VPlan infrastructure when required. It also allows removing the full reference to InductionDescriptor and limit it to only the opcode.	2023-09-08 15:46:12 +01:00
Alexey Bataev	30edf1c449	[SLP]Do not early exit if the number of unique elements is non-power-of-2. (#65476) We still can try to vectorize the bundle of the instructions, even if the repeated number of instruction is non-power-of-2. In this case need to adjust the cost (calculate the cost only for unique scalar instructions) and cost of the extracts. Also, when scheduling the bundle need to schedule only unique scalars to avoid compiler crash because of the multiple dependencies. Can be safely applied only if all scalars's users are also vectorized and do not require memory accesses (this one is a temporarily requirement, can be relaxed later). --------- Co-authored-by: Alexey Bataev <a.bataev@outlook.com>	2023-09-08 10:00:46 -04:00
Alexey Bataev	8d933ea5ac	[SLP][NFC]Use SmallDensetSet for lookup instead of ArrayRef, NFC.	2023-09-06 13:17:30 -07:00
Florian Hahn	785e7063b9	[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI). VPWidenRecipe only needs the opcode to widen, all other information (flags, debug loc and operands) is already modeled directly via the recipe. This removes the remaining uses of the underlying instruction from VPWidenRecipe::execute.	2023-09-06 16:27:09 +01:00
Alexey Bataev	09b8bbd6e0	[SLP][NFC]Reorder indeces instead of real values, NFC. May save some memory/compile time.	2023-09-05 08:48:52 -07:00
Florian Hahn	165e24aa2a	[VPlan] Move DebugLoc to VPRecipeBase (NFCI). Add a dedicated debug location to VPRecipeBase to remove another unneeded use of the underlying LLVM IR instruction and also consolidate various DL fields in sub classes. Each recipe can have debug location and it shouldn't rely on reference to the underlying LLVM IR instructions to retain it. See various recipes that had separate DL fields already.	2023-09-05 15:45:16 +01:00
Florian Hahn	168e23c741	[VPlan] Remove reference to Instr when setting debug loc. (NFCI) This allows untangling references to underlying IR for various recipes.	2023-09-05 10:59:13 +01:00
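
Note on ea0ee55c02 and 068357d9b0: both messages describe an index-safety rule for 'scalarizeLoadExtract': a constant index must be below the element count of a fixed vector (or the minimum element count of a scalable one), and a non-constant index is accepted only when a UREM/AND bounds it. The sketch below illustrates that rule in plain C++ without any LLVM headers. It is not the code of 'canScalarizeAccess'; the names IndexKind, IndexExpr and isIndexProvablyInBounds are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of how an extract index was computed.
enum class IndexKind { Constant, URem, And, Unknown };

// Hypothetical index description (not an LLVM type). Operand holds:
//   Constant -> the index value itself
//   URem     -> the divisor d in (x urem d)
//   And      -> the mask m in (x & m)
struct IndexExpr {
  IndexKind Kind;
  uint64_t Operand;
};

// MinNumElts is the exact element count of a fixed vector, or the guaranteed
// minimum element count of a scalable vector (the "4" in <vscale x 4 x i32>).
// Returns true only when the index is provably below MinNumElts.
constexpr bool isIndexProvablyInBounds(uint64_t MinNumElts, IndexExpr Idx) {
  switch (Idx.Kind) {
  case IndexKind::Constant:
    return Idx.Operand < MinNumElts;
  case IndexKind::URem:
    // x urem d lies in [0, d - 1]; d == 0 gives no usable bound.
    return Idx.Operand != 0 && Idx.Operand <= MinNumElts;
  case IndexKind::And:
    // x & m lies in [0, m].
    return Idx.Operand < MinNumElts;
  case IndexKind::Unknown:
    return false;
  }
  return false;
}
```

With a minimum of 4 elements, an index of the form (i & 3) or (i urem 4) is accepted under this model, while (i urem 8) or a constant 4 is rejected, which corresponds to the "return unsafe" outcome mentioned in 068357d9b0.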

3993 Commits