llvm-project

Author	SHA1	Message	Date
David Sherwood	c02184f286	[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop Suppose we have a nested loop like this: void foo(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { dst[(i * n) + j] += src[(i * n) + j]; } } } We currently generate runtime memory checks as a precondition for entering the vectorised version of the inner loop. However, if the runtime-determined trip count for the inner loop is quite small then these checks become relatively expensive. This patch attempts to mitigate these costs by adding a new option to expand the memory ranges being checked to include the outer loop as well. This leads to runtime checks that can then be hoisted above the outer loop. For example, rather than looking for a conflict between the memory ranges: 1. &dst[(i * n)] -> &dst[(i * n) + n] 2. &src[(i * n)] -> &src[(i * n) + n] we can instead look at the expanded ranges: 1. &dst[0] -> &dst[((m - 1) * n) + n] 2. &src[0] -> &src[((m - 1) * n) + n] which are outer-loop-invariant. As with many optimisations there is a trade-off here, because there is a danger that using the expanded ranges we may never enter the vectorised inner loop, whereas with the smaller ranges we might enter at least once. I have added a HoistRuntimeChecks option that is turned off by default, but can be enabled for workloads where we know this is guaranteed to be of real benefit. In future, we can also use PGO to determine if this is worthwhile by using the inner loop trip count information. When enabling this option for SPEC2017 on neoverse-v1 with the flags "-Ofast -mcpu=native -flto" I see an overall geomean improvement of ~0.5%: SPEC2017 results (+ is an improvement, - is a regression): 520.omnetpp: +2% 525.x264: +2% 557.xz: +1.2% ... GEOMEAN: +0.5% I didn't investigate all the differences to see if they are genuine or noise, but I know the x264 improvement is real because it has some hot nested loops with low trip counts where I can see this hoisting is beneficial. Tests have been added here: Transforms/LoopVectorize/runtime-checks-hoist.ll Differential Revision: https://reviews.llvm.org/D152366	2023-08-24 12:14:02 +00:00
Florian Hahn	26bb2da28b	[VPlan] Proactively create mask for tail-folding up-front (NFCI). Split off mask creation for tail folding and proactively create the mask for the header block. This simplifies createBlockInMask. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157037	2023-08-23 21:36:16 +01:00
Kolya Panchenko	acbe886880	[LV] Vectorization remark for outerloop Reviewed By: fhahn, ABataev Differential Revision: https://reviews.llvm.org/D150696	2023-08-21 13:05:06 -04:00
Florian Hahn	57a6f6579c	[LV] Simplify condition for induction recipe insertion (NFCI). Split off independent suggestion from D157037. This simplifies the condition to decide if a recipe needs to be inserted to the header phi section or simply appended. The assertion has been updated to allow cases where the first non-phi recipe is the end of the block, in which case inserting before this point is equivalent to appending.	2023-08-21 15:58:07 +01:00
Florian Hahn	c34d049706	[LV] Re-use existing NewInsertionPoint variable for insertion (NFCI). Split off independent suggestion from D157037.	2023-08-21 15:21:29 +01:00
Florian Hahn	56f5738d85	[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC). All dependencies on code from LoopVectorize.cpp have been removed/refactored. Move the ::execute implementations to other recipe definitions in VPlanRecipes.cpp	2023-08-20 21:00:05 +01:00
Craig Topper	46eded75cd	[LoopVectorize] Replace dyn_cast with isa to suppress an unused variable warning. NFC	2023-08-19 14:41:00 -07:00
Florian Hahn	622b611f23	[VPlan] Inline buildScalarSteps in single user (NFC). Other users have been refactored, remove the unneeded function.	2023-08-19 17:02:31 +01:00
Florian Hahn	9ee4a740e3	[LV] Remove unused MiddleVPBB argument from addUsersInExitBlock (NFC). The argument is no longer used, remove it.	2023-08-17 10:36:12 +01:00
Mel Chen	463e7cb892	[LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc This commit refactors the implementation of VPReductionRecipe to use reference instead of pointer for member RdxDesc. Because the member RdxDesc in VPReductionRecipe should not be a nullptr, using a reference will provide clearer semantics. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158058	2023-08-16 19:37:49 -07:00
Florian Hahn	00bc500830	[VPlan] Store FPBinOp directly in VPDerivedIVRecipe (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Store operator only for floating point inductions (i.e. the binary op is a FPMathOperator).	2023-08-14 21:45:19 +01:00
Florian Hahn	aacaf3d580	[VPlan] Simplify VPDerivedIV truncation handling (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Replace IsTruncated by conditionally setting TruncResultTy only if truncation is required.	2023-08-14 17:33:10 +01:00
Florian Hahn	d32e68ae53	[docs] Graduate VectorizationPlan.rst from proposal. VPlan has become an integral part of the inner loop vectorizer pipeline that has been actively developed over the previous years. Let's move VectorizationPlan.rst from the proposal stage to bring the docs in line and to avoid confusion when reading the docs. Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D157593	2023-08-10 17:15:43 +01:00
Bjorn Pettersson	e53b28c833	[llvm] Drop some bitcasts and references related to typed pointers Differential Revision: https://reviews.llvm.org/D157551	2023-08-10 15:07:07 +02:00
Florian Hahn	8a56179bcd	[VPlan] Store induction kind & binop directly in VPDerivedIVRecipe (NFC) Limit the information stored in VPDerivedIVRecipe to the ingredients really needed.	2023-08-10 10:57:32 +01:00
Florian Hahn	e6d5dcf84c	[LV] Pass kind and induction binop to emitTransformedIndex (NFC). Explicitly pass InductionKind and InductionBinOp to emitTransformedIndex. Only those values are needed from the induction descriptor. This makes explicit what is needed for the function and allows future use cases where a full induction descriptor object is not available.	2023-08-10 10:35:42 +01:00
Florian Hahn	698ae66092	[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC). Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157144	2023-08-08 20:13:11 +01:00
Florian Hahn	af635a5547	[VPlan] Model wrap flags directly, remove NUW opcodes (NFC) Model wrap flags directly using VPRecipeWithIRFlags and clean up the duplicated NUW opcodes. D157144 will build on this and also model FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157194	2023-08-08 12:12:30 +01:00
Florian Hahn	aac8acb115	[VPlan] Model masked assumes as replicate recipes, drop them (NFCI). Replace ConditionalAssume set by treating conditional assumes like other predicated instructions (i.e. create a VPReplicateRecipe with a mask) and later remove any assume recipes with masks during VPlan cleanup. This reduces coupling of VPlan construction and Legal by removing a shared set between the 2 and results in a cleaner code structure overall. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157034	2023-08-06 20:56:24 +01:00
Florian Hahn	a6d6730709	[LV] Split off code to optimize initial VPlan (NFC). Split up tryToBuildVPlanWithVPRecipes into initial plan creation and optimizations, by introducing a VPlanTransforms::optimize helper. Depends on D154640. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154644	2023-08-04 13:21:20 +01:00
Florian Hahn	39cf210450	[LV] Remove unnecessary std::move from tryToBuildVPlanWith.. (NFC). Split off D154644.	2023-08-04 11:56:05 +01:00
Florian Hahn	c30099ef0b	[LV] Return null VPlanPtr instead of std::optional for tryToBuild (NFC) Cleanup in preparation for D154644. This was suggested earlier and helps to simplify the code with D154644.	2023-08-04 11:48:24 +01:00
Florian Hahn	deec9e7674	[VPlan] Move VPTransformState::get() to VPlan.cpp (NFC). The last dependency on code defined in LoopVectorize.cpp was removed a while ago. Move VPTransformState::get() to VPlan.cpp where other members are also defined.	2023-08-03 21:49:58 +01:00
Mel Chen	425e9e81a0	[LV] Rename the Select[I\|F]Cmp reduction pattern to [I\|F]AnyOf. (NFC) Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D155786	2023-08-03 00:37:19 -07:00
Mel Chen	97cccdd9f3	[LV][NFC] Remove the redundant braces.	2023-08-02 20:45:04 -07:00
Florian Hahn	8ea274b46b	[VPlan] Fix in-loop reduction chains using VPlan def-use chains (NFCI) Update adjustRecipesForReductions to directly use the VPlan def-use chains for in-loop reductions to collect the reduction operations that need adjusting. This allows the removal of ReductionChainMap and of the recording of recipes for instructions in the reduction chain. It also removes late uses of getVPValue and the need for removeVPValueFor. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D155845	2023-08-02 17:04:29 +01:00
Florian Hahn	d1d0e135a1	[LV] Move packScalarIntoVectorValue to VPTransformState (NFC). This moves packScalarIntoVectorValue from ILV to the more appropriate VPTransformState.	2023-08-02 12:36:48 +01:00
Bjorn Pettersson	408cc94445	[LV][LSV][SLP] Drop some typed pointer bitcasts Differential Revision: https://reviews.llvm.org/D156736	2023-08-02 12:08:37 +02:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Florian Hahn	822c749aec	[LV] Shrink operands before creating new instr to force eval order. Shrink operands before creating the new instruction to make sure the same evaluation order is used on all platforms. This fixes buildbot failures due to different argument evaluation order on different systems.	2023-07-30 17:16:37 +01:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b *c; short d } e; void f() { int g; char h; e *i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00
Florian Hahn	25d34215bb	[LV] Replace use of getMaxSafeDepDist with isSafeForAnyVector (NFC) Replace the use of getMaxSafeDepDistBytes with the more direct isSafeForAnyVector. This removes the need to define getMaxSafeDepDistBytes.	2023-07-21 22:05:50 +02:00
Florian Hahn	68746a8cea	[LV] Move all VPlan transforms after initial VPlan construction. Reorder VPlan transforms slightly so they are all grouped together, after disabling Value -> VPValue lookup. In terms of codegen impact, this should be NFC modulo a small number of instruction reorderings. Preparation to split up tryToBuildVPlanWithVPRecipes in a follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154640	2023-07-18 10:53:30 +01:00
Nikita Popov	94abecca6b	[IVDescriptors] Remove typed pointer support (NFC) This also removes the element type from the descriptor, as it is always i8. The meaning of the step is now the same between integers and pointers.	2023-07-12 15:48:29 +02:00
Florian Hahn	9259f41e62	[VPlan] Clear reduction flags directly as VPlanTransform. After D150027, all relevant recipes should model their IR flags directly. Instead of removing the flags after codegen as part of fixReductions, drop poison generating flags directly from the recipes. Depends on D150027. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D150028	2023-07-09 21:11:51 +01:00
Florian Hahn	14ec3f4b06	[LV] Skip VFs > # iterations remaining for epilogue vectorization. If a candidate VF for epilogue vectorization is greater than the number of remaining iterations, the epilogue loop would be dead. Skip such factors. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154264	2023-07-07 21:43:51 +01:00
Florian Hahn	aee851fd0e	Revert "[LV] Skip VFs < iterations remaining for epilogue vectorization." This reverts commit 7cc0be01a0068946ea3613dc2cb45c81b0f45860. The title of the commit is incorrect, revert to fix the commit message.	2023-07-07 21:41:24 +01:00
Florian Hahn	7cc0be01a0	[LV] Skip VFs < iterations remaining for epilogue vectorization. If a candidate VF for epilogue vectorization is less than the number of remaining iterations, the epilogue loop would be dead. Skip such factors. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154264	2023-07-07 20:33:42 +01:00
Florian Hahn	a0fcf84a8c	[LV] Consider if scalar epilogue is required in getMaximizedVFForTarget. When a scalar epilogue is required, at least one iteration of the scalar loop has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF that results in a dead vector loop. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154261	2023-07-06 13:35:35 +01:00
Florian Hahn	1746ac42ca	[LV] Forget SCEVs for exit phis after vectorization. After vectorization, the exit blocks of the original loop will have additional predecessors. Invalidate SCEVs for the exit phis in case SE looked through single-entry phis. Fixes https://github.com/llvm/llvm-project/issues/63368 Fixes https://github.com/llvm/llvm-project/issues/63669	2023-07-04 21:28:03 +01:00
Florian Hahn	39385c521d	[LV] Move getBroadcastInstrs to VPTransformState::get (NFCI). getBroadcastInstrs is only used in VPTransformState::get. Move it closer to its use to reduce unnecessary interaction with the ILV object.	2023-07-04 11:24:11 +01:00
Florian Hahn	b4efc0f070	[LV] Break up condition in selectEpilogueVectorizationFactor loop (NFCI) Restructure the loop as suggested in D154264 to increase readability and make it easier to extend.	2023-07-03 22:39:40 +01:00
Florian Hahn	55e7f1f786	[LV] Pass bool to requiresScalarEpilogue (NFC). requiresScalarEpilogue only checks if the selected VF is vectorizing (and not scalar). Update it to just take a boolean, to make it clearer what information is used and to allow callers without a VF (used in a follow-up patch).	2023-06-30 22:08:27 +01:00
Igor Kirillov	17bde328d6	[LV] Add mask support for vectorizing interleaved groups This patch extends LoopVectorize to handle the vectorization of interleaved memory accesses with scalable vectors when a mask is required and/or predicated tail folding is enabled. Differential Revision: https://reviews.llvm.org/D152258	2023-06-29 17:50:56 +00:00
Florian Hahn	ea6ca9cb2b	[LV] Fix crash when stride isn't a constant. In some cases, the stride may not be a constant. Just skip those cases for now. This should only happen for cases where LV interleaves only; if it is vectorized, the stride needs to be versioned to a constant.	2023-06-14 16:53:34 +01:00
Florian Hahn	d209084720	[VPlan] Replace versioned stride with constant during VPlan opts. After constructing the initial VPlan, replace VPValues for versioned strides with their constant counterparts. Differential Revision: https://reviews.llvm.org/D147783	2023-06-13 08:26:55 +01:00
Graham Hunter	95bfb1902d	[LV][AArch64] Allow (limited) interleaving for scalable vectors This patch uses the (de)interleaving intrinsics introduced in D141924 to handle vectorization of interleaving groups with a factor of 2 for scalable vectors. Reviewed By: fhahn, reames Differential Revision: https://reviews.llvm.org/D145163	2023-06-09 11:42:10 +01:00
Nikita Popov	143ed21b26	Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)" This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7. In preparation for reverting a dependent revision.	2023-06-05 16:45:38 +02:00
Florian Hahn	e19297471a	[LV] Check if value was already not uniform for previous VF. If the value was already known to not be uniform for the previous (smaller) VF, it cannot be uniform for the larger VF. This slightly reduces compile-time, which matters once uniformity checks become a bit more expensive due to using SCEV rewriting (D148841). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D151658	2023-06-04 20:31:01 +01:00
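
The range expansion described in c02184f286 can be illustrated with a small hand-written C sketch. This is not the code the vectoriser generates: the helper names row_check_conflicts and hoisted_check_conflicts are hypothetical, and the scalar loops only stand in for the vectorised and fallback versions of the inner loop. It assumes m > 0 and n > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the half-open address ranges [a_lo, a_hi) and [b_lo, b_hi) overlap. */
static bool ranges_overlap(uintptr_t a_lo, uintptr_t a_hi,
                           uintptr_t b_lo, uintptr_t b_hi) {
  return a_lo < b_hi && b_lo < a_hi;
}

/* Hypothetical per-row check: only the ranges touched by outer iteration i,
   i.e. &p[(i * n)] -> &p[(i * n) + n] for both arrays. */
bool row_check_conflicts(const int32_t *dst, const int32_t *src, int i, int n) {
  size_t row_bytes = (size_t)n * sizeof(int32_t);
  uintptr_t d = (uintptr_t)(dst + (size_t)i * (size_t)n);
  uintptr_t s = (uintptr_t)(src + (size_t)i * (size_t)n);
  return ranges_overlap(d, d + row_bytes, s, s + row_bytes);
}

/* Hypothetical hoisted check: the expanded, outer-loop-invariant ranges
   &p[0] -> &p[((m - 1) * n) + n] for both arrays. */
bool hoisted_check_conflicts(const int32_t *dst, const int32_t *src,
                             int m, int n) {
  assert(m > 0 && n > 0);
  size_t elems = (size_t)(m - 1) * (size_t)n + (size_t)n;
  size_t bytes = elems * sizeof(int32_t);
  uintptr_t d = (uintptr_t)dst;
  uintptr_t s = (uintptr_t)src;
  return ranges_overlap(d, d + bytes, s, s + bytes);
}

void foo(int32_t *dst, const int32_t *src, int m, int n) {
  if (m <= 0 || n <= 0)
    return;
  /* One check above the outer loop instead of one check per row. */
  bool conflict = hoisted_check_conflicts(dst, src, m, n);
  for (int i = 0; i < m; i++) {
    if (!conflict) {
      /* Stands in for the vectorised inner loop. */
      for (int j = 0; j < n; j++)
        dst[(i * n) + j] += src[(i * n) + j];
    } else {
      /* Stands in for the scalar fallback loop. */
      for (int j = 0; j < n; j++)
        dst[(i * n) + j] += src[(i * n) + j];
    }
  }
}
```

The trade-off mentioned in the commit message shows up when src points three elements past dst with m = 2 and n = 2: neither row conflicts on its own, but the expanded ranges overlap, so the hoisted check would reject the vectorised path for every row.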

1944 Commits