llvm-project

Author	SHA1	Message	Date
Alexey Bataev	8d933ea5ac	[SLP][NFC]Use SmallDenseSet for lookup instead of ArrayRef, NFC.	2023-09-06 13:17:30 -07:00
Florian Hahn	785e7063b9	[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI). VPWidenRecipe only needs the opcode to widen, all other information (flags, debug loc and operands) is already modeled directly via the recipe. This removes the remaining uses of the underlying instruction from VPWidenRecipe::execute.	2023-09-06 16:27:09 +01:00
Alexey Bataev	09b8bbd6e0	[SLP][NFC]Reorder indices instead of real values, NFC. May save some memory/compile time.	2023-09-05 08:48:52 -07:00
Florian Hahn	165e24aa2a	[VPlan] Move DebugLoc to VPRecipeBase (NFCI). Add a dedicated debug location to VPRecipeBase to remove another unneeded use of the underlying LLVM IR instruction and also consolidate various DL fields in sub classes. Each recipe can have debug location and it shouldn't rely on reference to the underlying LLVM IR instructions to retain it. See various recipes that had separate DL fields already.	2023-09-05 15:45:16 +01:00
Florian Hahn	168e23c741	[VPlan] Remove reference to Instr when setting debug loc. (NFCI) This allows untangling references to underlying IR for various recipes.	2023-09-05 10:59:13 +01:00
Mel Chen	26aed5b9a8	[VPlan][LoopUtils] Remove unused parameter TTI This patch removes the member TTI from VPReductionRecipe, as the generation of reduction operations no longer requires TTI. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158148	2023-09-04 05:30:37 -07:00
Florian Hahn	3fa1b254b7	[VPlan] Print blend recipe as operand directly, instead of IR PHI. Update VPBlendRecipe::print() to print the result directly, instead of relying on the stored Phi pointer. This brings the recipe in line with how other recipes are printed.	2023-09-04 12:35:58 +01:00
Florian Hahn	19d286bca0	[VPlan] Assert that inst isn't a debug or pseudo inst (NFCI). Debug and pseudo instructions aren't modeled in VPlan. Turn a check into an assertion. This will help removing the direct use of Inst here in the future.	2023-09-03 21:31:31 +01:00
Nuno Lopes	5a3fd5f3f5	[LoopVectorizer] Fix PR #65212 : vectorization of reduction loop wasn't respecting original store alignment	2023-09-03 16:35:05 +01:00
Florian Hahn	fd66195777	[VPlan] Manage compare predicates in VPRecipeWithIRFlags. Extend VPRecipeWithIRFlags to also manage predicates for compares. This allows removing the custom ICmpULE opcode from VPInstruction which was a workaround for missing proper predicate handling. This simplifies the code a bit while also allowing compares with any predicates. It also fixes a case where the compare predicate wasn't printed properly for VPReplicateRecipes. Discussed/split off from D150398. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158992	2023-09-02 21:45:24 +01:00
Kazu Hirata	6da470d7f8	[llvm] Use range-based for loops (NFC)	2023-09-02 09:32:45 -07:00
Fangrui Song	111fcb0df0	[llvm] Fix duplicate word typos. NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 18:25:16 -07:00
Igor Kirillov	ac65fb8699	[LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions. When a loop has multiple reductions, each with an intermediate invariant store, the order in which those reductions are processed is not considered. This can result in the invariant stores outside the loop not preserving the original order. This patch sorts VPReductionPHIRecipes by the order in which they have stores in the original loop before running `InnerLoopVectorizer::fixReduction` function, and it helps to maintain the correct order of stores. Fixes https://github.com/llvm/llvm-project/issues/64047 Differential Revision: https://reviews.llvm.org/D157631	2023-08-31 16:21:44 +00:00
Philip Reames	aada8f2e54	[slp] Tweak debug costing output to include VL This makes it much easier to understand which vector length is being considered when the same set of nodes are evaluated at multiple vector lengths.	2023-08-30 09:13:19 -07:00
Florian Hahn	e544d9cc36	[VPlan] Remove unused VPBuilder::insert member (NFC).	2023-08-30 16:35:55 +01:00
Florian Hahn	cd9563ae17	[VPlan] Remove unused VPInstruction::clone member (NFC).	2023-08-30 15:53:39 +01:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
Florian Hahn	32cb8f519e	[VPlan] Generalize variable names for ICmpULE operands (NFC) ICmp codegen for VPInstruction will be extended for other predicates, and the operands could be any values (not just IV and TC as implied by the names). Suggested cleanup from D150398.	2023-08-28 15:47:04 +01:00
David Sherwood	c02184f286	[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop Suppose we have a nested loop like this: void foo(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { dst[(i * n) + j] += src[(i * n) + j]; } } } We currently generate runtime memory checks as a precondition for entering the vectorised version of the inner loop. However, if the runtime-determined trip count for the inner loop is quite small then the cost of these checks becomes quite expensive. This patch attempts to mitigate these costs by adding a new option to expand the memory ranges being checked to include the outer loop as well. This leads to runtime checks that can then be hoisted above the outer loop. For example, rather than looking for a conflict between the memory ranges: 1. &dst[(i * n)] -> &dst[(i * n) + n] 2. &src[(i * n)] -> &src[(i * n) + n] we can instead look at the expanded ranges: 1. &dst[0] -> &dst[((m - 1) * n) + n] 2. &src[0] -> &src[((m - 1) * n) + n] which are outer-loop-invariant. As with many optimisations there is a trade-off here, because there is a danger that using the expanded ranges we may never enter the vectorised inner loop, whereas with the smaller ranges we might enter at least once. I have added a HoistRuntimeChecks option that is turned off by default, but can be enabled for workloads where we know this is guaranteed to be of real benefit. In future, we can also use PGO to determine if this is worthwhile by using the inner loop trip count information. When enabling this option for SPEC2017 on neoverse-v1 with the flags "-Ofast -mcpu=native -flto" I see an overall geomean improvement of ~0.5%: SPEC2017 results (+ is an improvement, - is a regression): 520.omnetpp: +2% 525.x264: +2% 557.xz: +1.2% ... GEOMEAN: +0.5% I didn't investigate all the differences to see if they are genuine or noise, but I know the x264 improvement is real because it has some hot nested loops with low trip counts where I can see this hoisting is beneficial. Tests have been added here: Transforms/LoopVectorize/runtime-checks-hoist.ll Differential Revision: https://reviews.llvm.org/D152366	2023-08-24 12:14:02 +00:00
Alexey Bataev	66c623bfc6	[SLP][NFC]Use TreeEntry::getOperand instead of trying to rebuild it in getOperandInfo(), NFC.	2023-08-23 13:37:36 -07:00
Florian Hahn	26bb2da28b	[VPlan] Proactively create mask for tail-folding up-front (NFCI). Split off mask creation for tail folding and proactively create the mask for the header block. This simplifies createBlockInMask. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157037	2023-08-23 21:36:16 +01:00
Ben Shi	ad35d916cd	[VectorCombine] Enable transform 'foldSingleElementStore' for scalable vector types The transform 'foldSingleElementStore' can be applied to scalable vector types if the index is less than the minimum number of elements. Reviewed By: dmgreen, nikic Differential Revision: https://reviews.llvm.org/D157676	2023-08-23 17:12:36 +08:00
Florian Hahn	34d25924c4	[VPlan] Mark some VPInstruction opcodes as not having side effects. Mark some VPInstruction opcodes as not having side effects, preparation for D157037.	2023-08-22 20:05:57 +01:00
Alexey Bataev	a9e6295548	[SLP][NFC]Use all_of/any_of instead of loops, NFC.	2023-08-22 08:21:36 -07:00
Alexey Bataev	b51195dece	[SLP]Fix PR63854: Add proper sorting of pointers for masked stores. If the masked gathers can be reordered, it may produce strided access pattern and the reordering does not affect common reordering, better to try to reorder masked gathers for better performance. Differential Revision: https://reviews.llvm.org/D157009	2023-08-22 06:14:01 -07:00
Kolya Panchenko	acbe886880	[LV] Vectorization remark for outerloop Reviewed By: fhahn, ABataev Differential Revision: https://reviews.llvm.org/D150696	2023-08-21 13:05:06 -04:00
Florian Hahn	57a6f6579c	[LV] Simplify condition for induction recipe insertion (NFCI). Split off independent suggestion from D157037. This simplifies the condition to decide if a recipe needs to be inserted to the header phi section or simply appended. The assertion has been updated to allow cases where the first non-phi recipe is the end of the block, in which case inserting before this point is equivalent to appending.	2023-08-21 15:58:07 +01:00
Florian Hahn	c34d049706	[LV] Re-use existing NewInsertionPoint variable for insertion (NFCI). Split off independent suggestion from D157037.	2023-08-21 15:21:29 +01:00
Florian Hahn	56f5738d85	[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC). All dependencies on code from LoopVectorize.cpp have been removed/refactored. Move the ::execute implementations to other recipe definitions in VPlanRecipes.cpp	2023-08-20 21:00:05 +01:00
Craig Topper	46eded75cd	[LoopVectorize] Replace dyn_cast with isa to suppress an unused variable warning. NFC	2023-08-19 14:41:00 -07:00
Florian Hahn	622b611f23	[VPlan] Inline buildScalarSteps in single user (NFC). Other users have been refactored, remove the unneeded function.	2023-08-19 17:02:31 +01:00
Florian Hahn	ada2a455fc	[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC) Retrieve block via VPlan infrastructure as suggested as independent cleanup in D150398.	2023-08-17 18:17:45 +01:00
Florian Hahn	9ee4a740e3	[LV] Remove unused MiddleVPBB argument from addUsersInExitBlock (NFC). The argument is no longer used, remove it.	2023-08-17 10:36:12 +01:00
Mel Chen	463e7cb892	[LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc This commit refactors the implementation of VPReductionRecipe to use reference instead of pointer for member RdxDesc. Because the member RdxDesc in VPReductionRecipe should not be a nullptr, using a reference will provide clearer semantics. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158058	2023-08-16 19:37:49 -07:00
Alexey Bataev	ca2eabdb52	[SLP][NFC]Improve code to meet coding standards, NFC.	2023-08-15 11:08:25 -07:00
Alexey Bataev	63c7815faf	[SLP]Fix comparator for PHI nodes comparison. Fixed comparator for PHI nodes sorting to meet the criteria for strict weak ordering.	2023-08-14 14:05:39 -07:00
Florian Hahn	00bc500830	[VPlan] Store FPBinOp directly in VPDerivedIVRecipe (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Store operator only for floating point inductions (i.e. the binary op is a FPMathOperator).	2023-08-14 21:45:19 +01:00
Alexey Bataev	4f0bd8f7ac	[SLP]Fix strict weak ordering for Cmp instruction comparator. Sorting algorithms require strict weak ordering for comparators, final fix for cmp instructions comparator.	2023-08-14 09:37:46 -07:00
Florian Hahn	aacaf3d580	[VPlan] Simplify VPDerivedIV truncation handling (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Replace IsTruncated by conditionally setting TruncResultTy only if truncation is required.	2023-08-14 17:33:10 +01:00
Florian Hahn	d32e68ae53	[docs] Graduate VectorizationPlan.rst from proposal. VPlan has become an integral part of the inner loop vectorizer pipeline that has been actively developed over the previous years. Let's move VectorizationPlan.rst from the proposal stage to bring the docs in line and to avoid confusion when reading the docs. Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D157593	2023-08-10 17:15:43 +01:00
Alexey Bataev	2216507171	[SLP]Fix PR64568: Crash during horizontal reduction. If a reduced value is constant-foldable and was folded to a constant during previous transformations, it needs to be excluded from the list of reduced values/instructions as non-matchable.	2023-08-10 07:33:16 -07:00
Alexey Bataev	42b3925d42	[SLP][NFC]Fix formatting/warnings in tryToReduce(), NFC.	2023-08-10 06:42:50 -07:00
Bjorn Pettersson	e53b28c833	[llvm] Drop some bitcasts and references related to typed pointers Differential Revision: https://reviews.llvm.org/D157551	2023-08-10 15:07:07 +02:00
Florian Hahn	8a56179bcd	[VPlan] Store induction kind & binop directly in VPDerivedIVRecipe (NFC) Limit the information stored in VPDerivedIVRecipe to the ingredients really needed.	2023-08-10 10:57:32 +01:00
Florian Hahn	e6d5dcf84c	[LV] Pass kind and induction binop to emitTransformedIndex (NFC). Explicitly pass InductionKind and InductionBinOp to emitTransformedIndex. Only those values are needed from the induction descriptor. This makes explicit what is needed for the function and allows future use cases where a full induction descriptor object is not available.	2023-08-10 10:35:42 +01:00
Valery N Dmitriev	f522be63bc	[SLP][NFC] Make buildShuffleEntryMask routine a TreeEntry method. The routine uses data stored at TreeEntry node for building a mask so it is natural to make it a method for the type. That will simplify its interface and reduces data transfer. The method is added as buildAltOpShuffleMask. Differential Revision: https://reviews.llvm.org/D157545	2023-08-09 13:43:03 -07:00
Alexey Bataev	c619222ea4	[SLP]Use common logic for cost estimation of the alternate vector nodes. We can use buildShuffleEntryMask() to build the shuffle mask correctly not only for the alternate nodes with reuses, but also for the nodes without reused scalars. This allows the cost of the node to be estimated more accurately and better code to be emitted. Differential Revision: https://reviews.llvm.org/D157413	2023-08-09 11:50:39 -07:00
Florian Hahn	b223229e2c	[VPlan] Re-use existing step again after 34accad1feae. This fixes a failing RISCV test case that was missed originally.	2023-08-08 21:42:56 +01:00
Florian Hahn	34accad1fe	[VPlan] Move logic to create VPScalarIVStepsRecipe to helper (NFC). This allows for easier re-use in follow-on patches.	2023-08-08 21:25:06 +01:00
Florian Hahn	698ae66092	[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC). Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157144	2023-08-08 20:13:11 +01:00


3948 Commits