llvm-project

Author	SHA1	Message	Date
Ben Shi	87143ff9f2	[VectorCombine] Fix a spot in commit 068357d9b09cd635b1c2f126d119ce9afecb28f7 My previous commit leads to a crash in "Builders/sanitizer-x86_64-linux-fast" as https://lab.llvm.org/buildbot/#/builders/5/builds/36746. And this patch fixes it.	2023-09-18 15:01:47 +08:00
Ben Shi	068357d9b0	[VectorCombine] Enable transform 'scalarizeLoadExtract' for scalable vector types (#65443) The transform 'scalarizeLoadExtract' can be applied to scalable vector types if the index is less than the minimum number of elements. The check whether the index is less than the minimum number of elements is located at lines 1175~1180. 'scalarizeLoadExtract' calls 'canScalarizeAccess' and uses the returned result to decide whether this transform is safe. At the beginning of the function 'canScalarizeAccess', the index is checked for being 1. less than the number of elements of a fixed vector type, or 2. less than the minimum number of elements of a scalable vector type. Otherwise 'canScalarizeAccess' will return unsafe and this transform will be prevented.	2023-09-18 10:49:18 +08:00
Florian Hahn	1d1cba44ea	[VPlan] Remove stray indent when printing scalar steps recipe. VPScalarIVStepsRecipe will now be printed as vp<%6> = SCALAR-STEPS vp<%3>, ir<1> instead of vp<%6> = SCALAR-STEPS vp<%3>, ir<1>	2023-09-17 10:15:52 +01:00
Alexey Bataev	434aa2fe56	[SLP]Improve canReuseExtracts for reordering analysis. Improve the analysis in canReuseExtracts for the reordering to better reorder extracts for ExtractSubvector pattern.	2023-09-15 12:09:45 -07:00
Alexey Bataev	b9ad72ba05	[SLP]Fix PR66176: SLP incorrectly reorders select operands. On the very first iteration for the reductions, when trying to build reduction for boolean logic operations, no need to compare LHS/RHS with the Reduction(VectorizedTree), need to compare with actual parameters of the reduction operations.	2023-09-15 03:57:36 -07:00
Alexey Bataev	c15c1e5dd5	[SLP]Do not account non-instructions for external use. If the non-instruction gets vectorized, no need to account its extract cost, it won't be removed and replaced by extractelement instruction.	2023-09-14 12:40:33 -07:00
Jeremy Morse	e54277fa10	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468	2023-09-11 20:01:19 +01:00
Alexey Bataev	9a90457a76	[SLP][NFC]Use ArrayRef for operands directly instead of entry/operand number, NFC.	2023-09-11 11:16:13 -07:00
Jeremy Morse	6942c64e81	[NFC][RemoveDIs] Prefer iterator-insertion over instructions Continuing the patch series to get rid of debug intrinsics [0], instruction insertion needs to be done with iterators rather than instruction pointers, so that we can communicate information in the iterator class. This patch adds an iterator-taking insertBefore method and converts various call sites to take iterators. These are all sites where such debug-info needs to be preserved so that a stage2 clang can be built identically; it's likely that many more will need to be changed in the future. At this stage, this is just changing the spelling of a few operations, which will eventually become significant once the debug-info bearing iterator is used. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152537	2023-09-11 11:48:45 +01:00
Alexey Bataev	5bab59de44	[SLP]Try to vectorize scalars, being vectorized already, but does not need to be scheduled. If the scalar does not need to be scheduled and it was vectorized already in one of the vector nodes, we still can try to vectorize it in another node. Its cost just does not need to be accounted for in the scalar total cost, as it will be handled in the main vectorized node. Differential Revision: https://reviews.llvm.org/D159205	2023-09-08 13:34:12 -07:00
Florian Hahn	08de6508ab	[LV] Return debug loc directly from getDebugLocFromInstrOrOps (NFCI) The return value of the function is only used to get the debug location. Directly return the debug location, as this avoids an extra null check in the caller.	2023-09-08 16:29:09 +01:00
Florian Hahn	3e2d564c3d	[VPlan] Use VPRecipeWithFlags for VPScalarIVStepsRecipe (NFC). This directly models the flags as part of the recipe, which allows dropping them using the VPlan infrastructure when required. It also allows removing the full reference to InductionDescriptor and limit it to only the opcode.	2023-09-08 15:46:12 +01:00
Alexey Bataev	30edf1c449	[SLP]Do not early exit if the number of unique elements is non-power-of-2. (#65476) We still can try to vectorize the bundle of the instructions, even if the repeated number of instruction is non-power-of-2. In this case need to adjust the cost (calculate the cost only for unique scalar instructions) and cost of the extracts. Also, when scheduling the bundle need to schedule only unique scalars to avoid compiler crash because of the multiple dependencies. Can be safely applied only if all scalars' users are also vectorized and do not require memory accesses (this one is a temporary requirement, can be relaxed later). --------- Co-authored-by: Alexey Bataev <a.bataev@outlook.com>	2023-09-08 10:00:46 -04:00
Alexey Bataev	8d933ea5ac	[SLP][NFC]Use SmallDenseSet for lookup instead of ArrayRef, NFC.	2023-09-06 13:17:30 -07:00
Florian Hahn	785e7063b9	[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI). VPWidenRecipe only needs the opcode to widen, all other information (flags, debug loc and operands) is already modeled directly via the recipe. This removes the remaining uses of the underlying instruction from VPWidenRecipe::execute.	2023-09-06 16:27:09 +01:00
Alexey Bataev	09b8bbd6e0	[SLP][NFC]Reorder indices instead of real values, NFC. May save some memory/compile time.	2023-09-05 08:48:52 -07:00
Florian Hahn	165e24aa2a	[VPlan] Move DebugLoc to VPRecipeBase (NFCI). Add a dedicated debug location to VPRecipeBase to remove another unneeded use of the underlying LLVM IR instruction and also consolidate various DL fields in sub classes. Each recipe can have debug location and it shouldn't rely on reference to the underlying LLVM IR instructions to retain it. See various recipes that had separate DL fields already.	2023-09-05 15:45:16 +01:00
Florian Hahn	168e23c741	[VPlan] Remove reference to Instr when setting debug loc. (NFCI) This allows untangling references to underlying IR for various recipes.	2023-09-05 10:59:13 +01:00
Mel Chen	26aed5b9a8	[VPlan][LoopUtils] Remove unused parameter TTI This patch removes the member TTI from VPReductionRecipe, as the generation of reduction operations no longer requires TTI. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158148	2023-09-04 05:30:37 -07:00
Florian Hahn	3fa1b254b7	[VPlan] Print blend recipe as operand directly, instead of IR PHI. Update VPBlendRecipe::print() to print the result directly, instead of relying on the stored Phi pointer. This brings the recipe in line with how other recipes are printed.	2023-09-04 12:35:58 +01:00
Florian Hahn	19d286bca0	[VPlan] Assert that inst isn't a debug or pseudo inst (NFCI). Debug and pseudo instructions aren't modeled in VPlan. Turn a check into an assertion. This will help removing the direct use of Inst here in the future.	2023-09-03 21:31:31 +01:00
Nuno Lopes	5a3fd5f3f5	[LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment	2023-09-03 16:35:05 +01:00
Florian Hahn	fd66195777	[VPlan] Manage compare predicates in VPRecipeWithIRFlags. Extend VPRecipeWithIRFlags to also manage predicates for compares. This allows removing the custom ICmpULE opcode from VPInstruction which was a workaround for missing proper predicate handling. This simplifies the code a bit while also allowing compares with any predicates. It also fixes a case where the compare predicate wasn't printed properly for VPReplicateRecipes. Discussed/split off from D150398. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158992	2023-09-02 21:45:24 +01:00
Kazu Hirata	6da470d7f8	[llvm] Use range-based for loops (NFC)	2023-09-02 09:32:45 -07:00
Fangrui Song	111fcb0df0	[llvm] Fix duplicate word typos. NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 18:25:16 -07:00
Igor Kirillov	ac65fb8699	[LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions. When a loop has multiple reductions, each with an intermediate invariant store, the order in which those reductions are processed is not considered. This can result in the invariant stores outside the loop not preserving the original order. This patch sorts VPReductionPHIRecipes by the order in which they have stores in the original loop before running `InnerLoopVectorizer::fixReduction` function, and it helps to maintain the correct order of stores. Fixes https://github.com/llvm/llvm-project/issues/64047 Differential Revision: https://reviews.llvm.org/D157631	2023-08-31 16:21:44 +00:00
Philip Reames	aada8f2e54	[slp] Tweak debug costing output to include VL This makes it much easier to understand which vector length is being considered when the same set of nodes are evaluated at multiple vector lengths.	2023-08-30 09:13:19 -07:00
Florian Hahn	e544d9cc36	[VPlan] Remove unused VPBuilder::insert member (NFC).	2023-08-30 16:35:55 +01:00
Florian Hahn	cd9563ae17	[VPlan] Remove unused VPInstruction::clone member (NFC).	2023-08-30 15:53:39 +01:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
Florian Hahn	32cb8f519e	[VPlan] Generalize variable names for ICmpULE operands (NFC) ICmp codegen for VPInstruction will be extended for other predicates, and the operands could be any values (not just IV and TC as implied by the names). Suggested cleanup from D150398.	2023-08-28 15:47:04 +01:00
David Sherwood	c02184f286	[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop Suppose we have a nested loop like this: void foo(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { dst[(i * n) + j] += src[(i * n) + j]; } } } We currently generate runtime memory checks as a precondition for entering the vectorised version of the inner loop. However, if the runtime-determined trip count for the inner loop is quite small then the cost of these checks becomes quite expensive. This patch attempts to mitigate these costs by adding a new option to expand the memory ranges being checked to include the outer loop as well. This leads to runtime checks that can then be hoisted above the outer loop. For example, rather than looking for a conflict between the memory ranges: 1. &dst[(i * n)] -> &dst[(i * n) + n] 2. &src[(i * n)] -> &src[(i * n) + n] we can instead look at the expanded ranges: 1. &dst[0] -> &dst[((m - 1) * n) + n] 2. &src[0] -> &src[((m - 1) * n) + n] which are outer-loop-invariant. As with many optimisations there is a trade-off here, because there is a danger that using the expanded ranges we may never enter the vectorised inner loop, whereas with the smaller ranges we might enter at least once. I have added a HoistRuntimeChecks option that is turned off by default, but can be enabled for workloads where we know this is guaranteed to be of real benefit. In future, we can also use PGO to determine if this is worthwhile by using the inner loop trip count information. When enabling this option for SPEC2017 on neoverse-v1 with the flags "-Ofast -mcpu=native -flto" I see an overall geomean improvement of ~0.5%: SPEC2017 results (+ is an improvement, - is a regression): 520.omnetpp: +2% 525.x264: +2% 557.xz: +1.2% ... GEOMEAN: +0.5% I didn't investigate all the differences to see if they are genuine or noise, but I know the x264 improvement is real because it has some hot nested loops with low trip counts where I can see this hoisting is beneficial. Tests have been added here: Transforms/LoopVectorize/runtime-checks-hoist.ll Differential Revision: https://reviews.llvm.org/D152366	2023-08-24 12:14:02 +00:00
Alexey Bataev	66c623bfc6	[SLP][NFC]Use TreeEntry::getOperand instead of trying to rebuild it in getOperandInfo(), NFC.	2023-08-23 13:37:36 -07:00
Florian Hahn	26bb2da28b	[VPlan] Proactively create mask for tail-folding up-front (NFCI). Split off mask creation for tail folding and proactively create the mask for the header block. This simplifies createBlockInMask. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157037	2023-08-23 21:36:16 +01:00
Ben Shi	ad35d916cd	[VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types The transform 'foldSingleElementStore' can be applied to scalable vector types if the index is less than the minimum number of elements. Reviewed By: dmgreen, nikic Differential Revision: https://reviews.llvm.org/D157676	2023-08-23 17:12:36 +08:00
Florian Hahn	34d25924c4	[VPlan] Mark some VPInstruction opcodes as not having side effects. Mark some VPInstruction opcodes as not having side effects, preparation for D157037.	2023-08-22 20:05:57 +01:00
Alexey Bataev	a9e6295548	[SLP][NFC]Use all_of/any_of instead of loops, NFC.	2023-08-22 08:21:36 -07:00
Alexey Bataev	b51195dece	[SLP]Fix PR63854: Add proper sorting of pointers for masked stores. If the masked gathers can be reordered, it may produce strided access pattern and the reordering does not affect common reordering, better to try to reorder masked gathers for better performance. Differential Revision: https://reviews.llvm.org/D157009	2023-08-22 06:14:01 -07:00
Kolya Panchenko	acbe886880	[LV] Vectorization remark for outerloop Reviewed By: fhahn, ABataev Differential Revision: https://reviews.llvm.org/D150696	2023-08-21 13:05:06 -04:00
Florian Hahn	57a6f6579c	[LV] Simplify condition for induction recipe insertion (NFCI). Split off independent suggestion from D157037. This simplifies the condition to decide if a recipe needs to be inserted to the header phi section or simply appended. The assertion has been updated to allow cases where the first non-phi recipe is the end of the block, in which case inserting before this point is equivalent to appending.	2023-08-21 15:58:07 +01:00
Florian Hahn	c34d049706	[LV] Re-use existing NewInsertionPoint variable for insertion (NFCI). Split off independent suggestion from D157037.	2023-08-21 15:21:29 +01:00
Florian Hahn	56f5738d85	[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC). All dependencies on code from LoopVectorize.cpp have been removed/refactored. Move the ::execute implementations to other recipe definitions in VPlanRecipes.cpp	2023-08-20 21:00:05 +01:00
Craig Topper	46eded75cd	[LoopVectorize] Replace dyn_cast with isa to suppress an unused variable warning. NFC	2023-08-19 14:41:00 -07:00
Florian Hahn	622b611f23	[VPlan] Inline buildScalarSteps in single user (NFC). Other users have been refactored, remove the unneeded function.	2023-08-19 17:02:31 +01:00
Florian Hahn	ada2a455fc	[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC) Retrieve block via VPlan infrastructure as suggested as independent cleanup in D150398.	2023-08-17 18:17:45 +01:00
Florian Hahn	9ee4a740e3	[LV] Remove unused MiddleVPBB argument from addUsersInExitBlock (NFC). The argument is no longer used, remove it.	2023-08-17 10:36:12 +01:00
Mel Chen	463e7cb892	[LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc This commit refactors the implementation of VPReductionRecipe to use reference instead of pointer for member RdxDesc. Because the member RdxDesc in VPReductionRecipe should not be a nullptr, using a reference will provide clearer semantics. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158058	2023-08-16 19:37:49 -07:00
Alexey Bataev	ca2eabdb52	[SLP][NFC]Improve code to meet coding standards, NFC.	2023-08-15 11:08:25 -07:00
Alexey Bataev	63c7815faf	[SLP]Fix comparator for PHI nodes comparison. Fixed comparator for PHI nodes sorting to meet the criteria for strict weak ordering.	2023-08-14 14:05:39 -07:00
Florian Hahn	00bc500830	[VPlan] Store FPBinOp directly in VPDerivedIVRecipe (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Store operator only for floating point inductions (i.e. the binary op is a FPMathOperator).	2023-08-14 21:45:19 +01:00
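
Note on c02184f286: the hoisting idea in that commit message can be sketched at source level as below. This is only an illustration of the expanded-range check, not the code the vectorizer emits. The names `ranges_disjoint`, `foo_scalar` and `foo_hoisted` are hypothetical, and the `bool` return value is added here purely so the chosen path can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [a, a+len) and [b, b+len) do not overlap. */
static bool ranges_disjoint(const int32_t *a, const int32_t *b, size_t len) {
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)len * sizeof(int32_t);
    return ua + bytes <= ub || ub + bytes <= ua;
}

/* Fallback path: the loop nest exactly as written in the commit message. */
static void foo_scalar(int32_t *dst, const int32_t *src, int m, int n) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            dst[(i * n) + j] += src[(i * n) + j];
}

/* One check over the expanded ranges &dst[0] -> &dst[m * n] and
 * &src[0] -> &src[m * n], done once above the outer loop instead of
 * once per outer iteration. Returns true if the checked path ran. */
bool foo_hoisted(int32_t *dst, const int32_t *src, int m, int n) {
    assert(dst != NULL && src != NULL);
    if (m <= 0 || n <= 0)
        return false;
    size_t total = (size_t)m * (size_t)n;
    if (!ranges_disjoint(dst, src, total)) {
        /* The trade-off named in the commit: the wider check can fail
         * even when individual rows would not have conflicted. */
        foo_scalar(dst, src, m, n);
        return false;
    }
    /* No aliasing anywhere in the nest, so every inner loop is free to
     * be vectorised without a further runtime check. */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            dst[(i * n) + j] += src[(i * n) + j];
    return true;
}
```

Calling `foo_hoisted(dst, src, 2, 3)` on two separate arrays takes the checked path, while passing overlapping pointers such as `buf + 1` and `buf` falls back to the scalar loop nest.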


3961 Commits