llvm-project

Author	SHA1	Message	Date
Florian Hahn	f7a8a78cb7	[VPlan] Also print operands of canonical IV (NFC). Also print the operands of VPCanonicalIVPHIRecipe. That was missed earlier.	2023-10-16 20:28:23 +01:00
Florian Hahn	d9f83169d1	[VPlan] Ensure start value of phis is the first op at construction (NFC) Header phi recipes have the start value (incoming from outside the loop) as first operand. This wasn't the case for VPWidenPHIRecipes. Instead the start value was picked during execute() by doing extra work. To be in line with other recipes, ensure the operand order is as expected during construction.	2023-09-22 21:24:15 +01:00
Florian Hahn	f23246a0bb	[LV] Directly add fast-math flags to select recipe (NFC). Now that VPInstruction can manage fast math flags via VPRecipeWithIRFlags, use them directly to model the fast-math flags of the select created for the final reduction value instead of adding them late.	2023-09-21 11:05:55 +01:00
Florian Hahn	1d1cba44ea	[VPlan] Remove stray indent when printing scalar steps recipe. VPScalarIVStepsRecipe will now be printed as vp<%6> = SCALAR-STEPS vp<%3>, ir<1> instead of vp<%6> = SCALAR-STEPS vp<%3>, ir<1>	2023-09-17 10:15:52 +01:00
Jeremy Morse	6942c64e81	[NFC][RemoveDIs] Prefer iterator-insertion over instructions Continuing the patch series to get rid of debug intrinsics [0], instruction insertion needs to be done with iterators rather than instruction pointers, so that we can communicate information in the iterator class. This patch adds an iterator-taking insertBefore method and converts various call sites to take iterators. These are all sites where such debug-info needs to be preserved so that a stage2 clang can be built identically; it's likely that many more will need to be changed in the future. At this stage, this is just changing the spelling of a few operations, which will eventually become significant once the debug-info bearing iterator is used. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152537	2023-09-11 11:48:45 +01:00
Florian Hahn	3e2d564c3d	[VPlan] Use VPRecipeWithFlags for VPScalarIVStepsRecipe (NFC). This directly models the flags as part of the recipe, which allows dropping them using the VPlan infrastructure when required. It also allows removing the full reference to InductionDescriptor and limit it to only the opcode.	2023-09-08 15:46:12 +01:00
Florian Hahn	785e7063b9	[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI). VPWidenRecipe only needs the opcode to widen, all other information (flags, debug loc and operands) is already modeled directly via the recipe. This removes the remaining uses of the underlying instruction from VPWidenRecipe::execute.	2023-09-06 16:27:09 +01:00
Florian Hahn	165e24aa2a	[VPlan] Move DebugLoc to VPRecipeBase (NFCI). Add a dedicated debug location to VPRecipeBase to remove another unneeded use of the underlying LLVM IR instruction and also consolidate various DL fields in sub classes. Each recipe can have a debug location, and it shouldn't rely on a reference to the underlying LLVM IR instruction to retain it. See various recipes that had separate DL fields already.	2023-09-05 15:45:16 +01:00
Florian Hahn	168e23c741	[VPlan] Remove reference to Instr when setting debug loc. (NFCI) This allows untangling references to underlying IR for various recipes.	2023-09-05 10:59:13 +01:00
Florian Hahn	3fa1b254b7	[VPlan] Print blend recipe as operand directly, instead of IR PHI. Update VPBlendRecipe::print() to print the result directly, instead of relying on the stored Phi pointer. This brings the recipe in line with how other recipes are printed.	2023-09-04 12:35:58 +01:00
Florian Hahn	fd66195777	[VPlan] Manage compare predicates in VPRecipeWithIRFlags. Extend VPRecipeWithIRFlags to also manage predicates for compares. This allows removing the custom ICmpULE opcode from VPInstruction which was a workaround for missing proper predicate handling. This simplifies the code a bit while also allowing compares with any predicates. It also fixes a case where the compare predicate wasn't printed properly for VPReplicateRecipes. Discussed/split off from D150398. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158992	2023-09-02 21:45:24 +01:00
Florian Hahn	32cb8f519e	[VPlan] Generalize variable names for ICmpULE operands (NFC) ICmp codegen for VPInstruction will be extended for other predicates, and the operands could be any values (not just IV and TC as implied by the names). Suggested cleanup from D150398.	2023-08-28 15:47:04 +01:00
Florian Hahn	34d25924c4	[VPlan] Mark some VPInstruction opcodes as not having side effects. Mark some VPInstruction opcodes as not having side effects, preparation for D157037.	2023-08-22 20:05:57 +01:00
Florian Hahn	56f5738d85	[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC). All dependencies on code from LoopVectorize.cpp have been removed/refactored. Move the ::execute implementations to other recipe definitions in VPlanRecipes.cpp	2023-08-20 21:00:05 +01:00
Florian Hahn	ada2a455fc	[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC) Retrieve block via VPlan infrastructure as suggested as independent cleanup in D150398.	2023-08-17 18:17:45 +01:00
Mel Chen	463e7cb892	[LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc This commit refactors the implementation of VPReductionRecipe to use reference instead of pointer for member RdxDesc. Because the member RdxDesc in VPReductionRecipe should not be a nullptr, using a reference will provide clearer semantics. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158058	2023-08-16 19:37:49 -07:00
Florian Hahn	aacaf3d580	[VPlan] Simplify VPDerivedIV truncation handling (NFCI). Address post-commit simplification suggestion for 8a56179bcd8c: Replace IsTruncated by conditionally setting TruncResultTy only if truncation is required.	2023-08-14 17:33:10 +01:00
Florian Hahn	8a56179bcd	[VPlan] Store induction kind & binop directly in VPDerivedIVRecipe (NFC) Limit the information stored in VPDerivedIVRecipe to the ingredients really needed.	2023-08-10 10:57:32 +01:00
Florian Hahn	698ae66092	[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC). Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157144	2023-08-08 20:13:11 +01:00
Florian Hahn	b6d994de0f	[VPlan] Address post-commit suggestions for af635a554 (NFC).	2023-08-08 12:59:34 +01:00
Florian Hahn	af635a5547	[VPlan] Model wrap flags directly, remove NUW opcodes (NFC) Model wrap flags directly using VPRecipeWithIRFlags and clean up the duplicated NUW opcodes. D157144 will build on this and also model FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157194	2023-08-08 12:12:30 +01:00
Florian Hahn	93c5bae00e	[VPlan] Use printOperands for VPInstruction. Use the printOperands for printing VPInstruction's operands to be more in line with other recipes and ensure consistent printing after D15719. Also removes some stray spaces in print output.	2023-08-08 11:31:21 +01:00
Florian Hahn	0b17e9d285	[VPlan] Move VPRecipeWithIRFlags::getFastMathFlags. (NFCI) Split off suggested refactoring from D157144. Also adds an assert to make sure this is only used when OpType is FPMathOp.	2023-08-07 12:35:53 +01:00
Mel Chen	425e9e81a0	[LV] Rename the Select[I\|F]Cmp reduction pattern to [I\|F]AnyOf. (NFC) Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D155786	2023-08-03 00:37:19 -07:00
Florian Hahn	2265bb064b	[LV] Update generateInstruction to return produced value (NFC). Update generateInstruction to return the produced value instead of setting it for each opcode. This reduces the amount of duplicated code and is a preparation for D153696. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D154240	2023-07-05 19:53:59 +01:00
Florian Hahn	0a246a0c72	[LV] Use VPValues when creating GEP with all invariant indices. Update VPWidenGEPRecipe::execute to use the VPValue operands of the recipe when creating the GEP instruction. Fixes #63340.	2023-06-16 16:14:01 +01:00
Florian Hahn	8f781b96e2	Revert "[VPlan] Mark recurrence recipes as not having side-effects." This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511. At the moment, live-outs used only for the resume values in the scalar loop are not modeled in VPlan yet. This means first-order recurrence recipes could be removed, when a scalar epilogue is required and the only use of a FOR is outside the loop. Keep treating recurrence recipes as having side-effects for now, to avoid them being removed. Fixes #62954.	2023-06-06 11:35:26 +02:00
Florian Hahn	299f0ff60e	[VPlan] Print IR flags for VPRecipeWithIRFlags. Now that IR flags are modeled as part of VPRecipeWithIRFlags, include the flags when printing recipes. Depends on D150027. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D150029	2023-05-23 20:36:16 +01:00
Florian Hahn	8eaf7a75fe	[VPlan] Add missing ifdef after 96686796f606. Fixes build with debug printing disabled.	2023-05-22 10:44:17 +01:00
Florian Hahn	96686796f6	[VPlan] Move live-out printing to VPLiveOut::print (NFC). Preparation for D150398. This brings live-out printing in line with how printing for recipes is handled.	2023-05-22 09:53:53 +01:00
Florian Hahn	236a0e82df	[LV] Use VPValue to get expanded value for SCEV step expressions. Update skeleton creation logic to use SCEV expansion results from expanding the pre-header. This avoids another set of SCEV expansions that may happen after the CFG has been modified. Fixes #58811. Depends on D147964. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147965	2023-05-11 16:49:19 +01:00
Florian Hahn	c096e91735	[VPlan] Address missed suggestions from D149082. This addresses 2 comments missed from D149082. It sets inbounds directly when creating the GEP and fixes the order in the enum.	2023-05-09 15:17:20 +01:00
Florian Hahn	5f3343985b	[VPlan] Use VPRecipeWithIRFlags for VPWidenGEPRecipe (NFCI). Extend VPRecipeWithIRFlags to also include InBounds and use for VPWidenGEPRecipe. The last remaining recipe that needs updating for MayGeneratePoisonRecipes is VPReplicateRecipe. Depends on D149081. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149082	2023-05-09 12:33:28 +01:00
Florian Hahn	127b00b25c	[VPlan] Record IR flags on VPWidenRecipe directly (NFC). This patch introduces a VPRecipeWithIRFlags class to record various IR flags for a recipe. This allows de-coupling of IR flags from the underlying instructions. The main benefit is that it allows dropping of IR flags from recipes directly, without the need to go through State::MayGeneratePoisonRecipes. The plan is to remove MayGeneratePoisonRecipes once all relevant recipes are transitioned. It also allows dropping IR flags during VPlan-to-VPlan transforms, which will be used in a follow-up patch to implement truncateToMinimalBitwidths as VPlan-to-VPlan transform. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149079	2023-05-08 17:28:50 +01:00
Florian Hahn	01fa764c9a	[VPlan] Assert instead of check if VF is vector when widening GEPs(NFC) VPWidenGEPRecipe should not be generated for scalar VFs. Replace check with an assert.	2023-05-06 09:25:56 +01:00
Florian Hahn	8bd02e5aef	[VPlan] Assert instead of checking if VF is vec when widening calls (NFC) VPWidenCallRecipe should not be generated for scalar VFs. Replace check with an assert.	2023-05-05 18:21:57 +01:00
Florian Hahn	e3afe0b89d	[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI). To generate cast instructions, the result type is needed. To allow creating widened casts without underlying instruction, introduce a new VPWidenCastRecipe that also holds the result type. This functionality will be used in a follow-up patch to implement truncateToMinimalBitwidths as VPlan-to-VPlan transform. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149081	2023-05-05 13:20:16 +01:00
Florian Hahn	29712ccda6	[VPlan] Assert instead of check if VF is vector when widening casts. VPWidenRecipes should not be generated for scalar VFs. Replace check with an assert. Suggested in preparation for D149081.	2023-05-05 09:02:33 +01:00
Florian Hahn	1b05e74982	[VPlan] Reorder cases in switch (NFC). Reorder cases to make sure they are ordered properly in preparation for D149081.	2023-05-04 21:40:22 +01:00
Florian Hahn	b85a402dd8	[VPlan] Introduce new entry block to VPlan for early SCEV expansion. This patch adds a new preheader block to the VPlan to place SCEV expansions like the trip count. This preheader block is disconnected at the moment, as the bypass blocks of the skeleton are not yet modeled in VPlan. The preheader block is executed before skeleton creation, so the SCEV expansion results can be used during skeleton creation. At the moment, the trip count expression and induction steps are expanded in the new preheader. The remainder of SCEV expansions will be moved gradually in the future. D147965 will update skeleton creation to use the steps expanded in the pre-header to fix #58811. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147964	2023-05-04 14:00:13 +01:00
Jay Foad	593e25ffae	[Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass llvm.is.fpclass is different from other vectorizable intrinsics in that it is overloaded on an argument type, not on the return type. Differential Revision: https://reviews.llvm.org/D148905	2023-04-24 13:42:08 +01:00
Florian Hahn	02369b75fd	[VPlan] Mark recurrence recipes as not having side-effects. Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI recipes to mayHaveSideEffects. They both don't have side-effects.	2023-04-17 12:30:52 +01:00
Florian Hahn	2db031528e	[VPlan] Check VPValue step in isCanonical (NFCI). Update the isCanonical() implementations to check the VPValue step operand instead of the step in the induction descriptor. At the moment this is NFC, but it enables further optimizations if the step is replaced by a constant in D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D147891	2023-04-16 14:48:03 +01:00
Florian Hahn	9be8d90e62	[VPlan] Add VPWidenSelectRecipe::getCond() (NFC). Add helper to access condition, as suggested in D144489.	2023-03-10 17:49:23 +01:00
Florian Hahn	54558fd8f3	[VPlan] Replace InvariantCond field from VPWidenSelectRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144489	2023-03-10 15:28:43 +01:00
Florian Hahn	a8adb38a96	[VPlan] Replace invariance fields from VPWidenGEPRecipe. There is no need to store information about invariance in the recipe. Replace the fields with checks of the operands using isDefinedOutsideVectorRegions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D144487	2023-03-09 17:52:22 +01:00
Nikita Popov	ffe8f47d72	[IR] Add operator<< overload for CmpInst::Predicate (NFC) I regularly try and fail to use this while debugging.	2023-03-07 15:10:56 +01:00
Florian Hahn	be968dbeee	[VPlan] VPWidenCallRecipe has side-effects if the call has. Handle VPWidenCallRecipe in VPRecipeBase::mayHaveSideEffects by delegating to the underlying call.	2023-03-05 12:08:56 +01:00
Sander de Smalen	fe1b51ffee	[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding. When using tail-folding and using the predicate for both data and control-flow (the next vector iteration's predicate is generated with the llvm.active.lane.mask intrinsic and then tested for the backedge), the LoopVectorizer still inserts a runtime check to see if the 'i + VF' may at any point overflow for the given trip-count. When it does, it falls back to a scalar epilogue loop. We can get rid of that runtime check in the pre-header and therefore also remove the scalar epilogue loop. This reduces code-size and avoids a runtime check. Consider the following loop: void foo(char * __restrict__ dst, char *src, unsigned long N) { for (unsigned long i=0; i<N; ++i) dst[i] = src[i] + 42; } If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter will overflow when calculating the predicate for the next vector iteration at some point, because LLVM does: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %index.next = add i64 %index, 16 ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed. %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N) %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 The solution: What we can do instead is calculate the predicate before incrementing the loop iteration counter, such that the llvm.active.lane.mask is calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e. 
vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) %N_minus_VF = select %N > 16 ? %N - 16 : 0 vector.body: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ] ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF) %index.next = add i64 %index, %4 ; The add above may still overflow, but this time the active.lane.mask is not affected %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 For N = 20, we'd then get: vector.ph: %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N) ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> %N_minus_VF = select 20 > 16 ? 20 - 16 : 0 ; %N_minus_VF = 4 vector.body: (1st iteration) ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop ... %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4) ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 0, 16 ; %index.next = 16 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 1 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %vector.body vector.body: (2nd iteration) ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop ... 
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4) ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> %index.next = add i64 16, 16 ; %index.next = 32 %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0 ; %8 = 0 br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7 ; branch to %for.cond.cleanup Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D142109	2023-03-01 09:01:19 +00:00
Florian Hahn	c21ccebe6f	[VPlan] Use usesScalars in shouldPack. Suggested by @Ayal as follow-up improvement in D143864. I was unable to find a case where this actually changes generated code, but it unifies the code to use common infrastructure.	2023-02-20 14:11:40 +00:00
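
The tail-folding scheme described in commit fe1b51ffee (computing the next lane mask from the current index against 'tripcount > VF ? tripcount - VF : 0', before the index is incremented) can be checked with a small simulation. This is only an illustrative sketch in Python, not code from the patch: the function names and the `bits` parameter are made up for the example, and the mask is modeled as "lane i is active iff base + i < n, compared without wraparound", which is how the commit's worked example for N = 20 behaves.

```python
def active_lane_mask(base: int, n: int, vf: int) -> list[int]:
    """Model of the lane mask: lane i is active iff base + i < n.

    The sum base + i is deliberately NOT wrapped, so the comparison
    behaves as if done in arbitrary precision (assumption of this sketch).
    """
    return [1 if base + i < n else 0 for i in range(vf)]


def tail_folded_masks(n: int, vf: int, bits: int = 64) -> list[list[int]]:
    """Return the predicate used by each vector iteration.

    `bits` is a hypothetical knob for the width of the index register,
    so that overflow of the increment can be provoked with small numbers.
    """
    wrap = 1 << bits
    # %N_minus_VF = N > VF ? N - VF : 0
    n_minus_vf = n - vf if n > vf else 0
    mask = active_lane_mask(0, n, vf)  # entry mask from vector.ph
    index = 0
    masks = []
    while True:
        masks.append(mask)  # predicate used in this iteration
        # Next mask is computed from the current index, before the add.
        mask = active_lane_mask(index, n_minus_vf, vf)
        index = (index + vf) % wrap  # may wrap, but no longer feeds the mask
        if not mask[0]:  # extractelement lane 0, then branch
            break
    return masks


for m in tail_folded_masks(20, 16):
    print(m)
```

For N = 20 and VF = 16 this yields the two predicates from the commit message (all lanes active, then only the first four). With `bits=8`, N = 255 and VF = 16, the index wraps to 0 on the last increment, yet the masks still cover exactly 255 elements, which is the property that lets the runtime overflow check be dropped.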


79 Commits