llvm-project

Author	SHA1	Message	Date
Florian Hahn	6dda74cc51	[VPlan] Use createSelect in adjustRecipesForReductions (NFCI). Simplify the code and rename Result->NewExitingVPV as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/70253.	2024-01-03 20:54:10 +00:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Florian Hahn	516cc98aff	[LV] Fix typo in comment (NFC).	2023-12-28 21:20:10 +00:00
Florian Hahn	a5891fa4d2	[VPlan] Initial modeling of VF * UF as VPValue. (#74761 ) This patch starts initial modeling of VF * UF in VPlan. Initially, introduce a dedicated VFxUF VPValue, which is then populated during VPlan::prepareToExecute. Initially, the VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop) This allows explicitly parameterizing recipes that rely on the VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322)	2023-12-08 18:30:30 +00:00
Graham Hunter	d0d5ef8133	[LV] Add support for linear arguments for vector function variants (#73941 ) If we have vectorized variants of a function which take linear parameters, we should be able to vectorize assuming the strides match.	2023-12-08 10:24:05 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112 ) Compiler crashes when the assertion triggered for zext nneg instruction, that checks that the instruction cannot produce poison. Changed the base class for widencast recipe to handle dropping nneg flag to avoid compiler crash.	2023-12-05 09:17:23 -05:00
Florian Hahn	70535f5e60	[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. This patch replaces the IR based truncateToMinimalBitwidths with a VPlan version. This has 3 benefits: 1) the VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one. 2) Removes a dependency on the cost-model after VPlan execution and 3) Removes a use of getVPValue that uses underlying values after VPlan execution (See removed FIXME). Depends on D149081. Depends on D149079. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D149903	2023-12-02 16:12:38 +00:00
Florian Hahn	cbf7b52a65	[VPlan] Properly update reduction live-out after placing select. After inserting a select for the final value, update the VPlan def-use chains. At the moment, the incorrect live-out doesn't cause a mis-compile, as computing the final reduction value is not yet modeled in VPlan.	2023-12-02 15:22:09 +00:00
Graham Hunter	104b7c624e	[LV] Add support for uniform parameters on vectorized function variants (#72891 ) Parameters marked as uniform take a scalar value, assuming the value is invariant in the scalar loop.	2023-11-28 15:01:32 +00:00
Florian Hahn	906f598263	[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC).	2023-11-23 16:59:50 +00:00
Matthias Braun	2404477219	LoopVectorize: Add better heuristic for vectorized epilogue skip test (#72589 ) This is a follow-up to PR #72450 correcting the branch_weights used for the test whether the vectorized epilogue loop should be skipped.	2023-11-20 11:02:27 -08:00
Graham Hunter	4d64a2bcd3	[LV] Refactor vector function variant selection to prepare for uniform args (#68879 ) Parameters marked as uniform take a scalar value, assuming the value is invariant in the scalar loop. In order to support this, we need to stop asking for a vector function variant with a default shape assuming that all arguments will become vector arguments, and instead consider all available variants and their parameter types.	2023-11-20 13:30:03 +00:00
Florian Hahn	7fd021a092	[LV] Don't crash on vector masks during scalar VPReductionRecipe::exec. VPReductionRecipe may be executed for scalar VFs. Make sure to access part 0 of the condition, as it could be an active-lane-mask, which is a vector <1 x i1> Fixes https://github.com/llvm/llvm-project/issues/72720.	2023-11-18 21:52:22 +00:00
Florian Hahn	2a9aed1730	[LV] Retain mask-reversal comment as suggested after e5e71af. Address post-commit comment to retain comment.	2023-11-18 20:53:23 +00:00
Florian Hahn	e5e71affb7	[LV] Reverse mask up front, not when creating vector pointer. (#72163 ) Reverse mask early on when populating BlockInMask. This will enable separating mask management and address computation from the memory recipes in the future and is also needed to enable explicit unrolling in VPlan.	2023-11-17 13:59:35 +00:00
Nikita Popov	de176d8c54	[SCEV][LV] Invalidate LCSSA exit phis more thoroughly (#69909 ) This an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Fixes #69097. Fixes #66616. Fixes #63970.	2023-11-17 09:34:24 +01:00
Matthias Braun	a9cc6fc280	LoopVectorize: Set branch_weight for conditional branches (#72450 ) Consistently add `branch_weights` metadata in any condition branch created by `LoopVectorize.cpp`: - Will only add metadata if the original loop-latch branch had metadata assigned. - Most checks should rarely trigger so I am using a 127:1 ratio. - For the middle block we assume an equal distribution of modulo results.	2023-11-16 11:33:46 -08:00
Graham Hunter	b070629c10	[LV] Increase max VF if vectorized function variants exist (#66639 ) If there are function calls in the candidate loop and we have vectorized variants available, try some wider VFs in case the conservative initial maximum based on the widest types in the loop won't actually allow us to make use of those function variants.	2023-11-13 10:27:10 +00:00
Florian Hahn	34c2dcd5ac	[VPlan] Move initial skeleton construction to createInitialVPlan. (NFC) This patch moves creating the middle VPBBs and an initial empty vector loop region for the top-level loop to createInitialVPlan. This consolidates code to create the initial VPlan skeleton and enables adding other bits outside the main region during initial VPlan construction. In particular, D150398 will add the exit check & branch to the middle block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158333	2023-11-12 13:00:44 +00:00
Florian Hahn	ed6f4994d8	[VPlan] Handle conditional ordered reductions with scalar VFs. VPReductionRecipe::execute was not handling predicates for ordered reduction with scalar VFs, which was causing a crash. Thsi patch adds dedicated handling for scalar VFs when dealing with the condition. The other operands are already handled in a similar fashion below. Fixes #70988.	2023-11-11 12:55:40 +00:00
Simon Pilgrim	3ca4fe80d4	[Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-06 16:50:18 +00:00
Florian Hahn	fd82b5b287	[LV] Support recieps without underlying instr in collectPoisonGenRec. Support recipes without underlying instruction in collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null the underlying value. Fixes https://github.com/llvm/llvm-project/issues/70590.	2023-11-03 10:21:14 +00:00
Igor Kirillov	70904226e1	[LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588 ) * Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known to be predicate tail-folded, delegating to `areRuntimeChecksProfitable` to decide on the profitability of vectorizing loops with runtime checks. * Update the `areRuntimeChecksProfitable` function to consider the `ScalarEpilogueLowering` setting when assessing vectorization of a loop. With this patch, we can make more informed decisions for loops with low trip counts, especially when leveraging Profile-Guided Optimization (PGO) data.	2023-10-30 13:43:26 +00:00
Florian Hahn	b0b88643a1	[VPlan] Add initial anlysis to infer scalar type of VPValues. (#69013 ) This patch adds initial type inferrence for VPValues. It infers the scalar type of a VPValue, by bottom-up traversing through defining recipes until root nodes with known types are reached (e.g. live-ins or load recipes). The types are then propagated top down through operations. This is intended as building block for a VPlan-based cost model, which will need access to type information for VPValues/recipes. Initial testing is done by asserting the inferred type matches the type of the result value generated for a widen and replicate recipes.	2023-10-27 14:38:28 +01:00
Florian Hahn	4f56d47d05	[VPlan] Make ExpandedSCEVs argument const (NFC). The argument is only used in const contexts. Simplifies a follow-up diff.	2023-10-22 12:31:55 +01:00
Florian Hahn	ca01f2af78	[LV] Enforce order of reductions with intermediate stores in VPlan (NFC) Reductions with intermediate stores currently need to be fixed in order of their intermediate stores. Instead of doing this at fixup time after code has been generated, sort the reductions in adjustRecipesForReductions. This makes the order explicit in VPlan and will enable removing fixReductions with modeling computing the final reduction result in VPlan, followed by also modeling the intermediate stores explicitly.	2023-10-21 21:26:52 +01:00
Lou Knauer	852bac4439	[VPlan] Support scalable vectors in outer-loop vectorization This patch enables scalable vectors in the VPlan-native path. If a vectorization factor is specified via loop vectorization hints, that factor is used. If no vectorization factor is specified, but the target preferes scalable vectorization, a scalable vectorization factor is selected. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D157484	2023-10-20 23:17:35 +01:00
Florian Hahn	2ec7bba77b	Recommit "[VPlan] Insert Trunc/Exts for reductions directly in VPlan." This reverts commit e4ea0997486000b460c4875a00301b73b3c0d6a7. The recommit fixes a reported crash by adding a missing check to make sure the cast recipes are only introduced when vectorizing. Test coverage added in 3cac608fbd0811b2f5c59c6e13148162ccd8543e. Original commit message: Update the code to create Trunc/Ext recipes directly in adjustRecipesForReductions instead of fixing it up later in fixReductions. This explicitly models the required conversions and also makes sure they are generated at the right place (instead of after the exit condition), hence the changes in a few tests.	2023-10-20 14:30:04 +01:00
Fangrui Song	e4ea099748	Revert "[VPlan] Insert Trunc/Exts for reductions directly in VPlan." This reverts commit fd311126349b8fe1684d62154a9fa5a7bbb0b713. There are two different crash reports on `fd31112634`	2023-10-18 23:25:31 -07:00
Florian Hahn	fd31112634	[VPlan] Insert Trunc/Exts for reductions directly in VPlan. Update the code to create Trunc/Ext recipes directly in adjustRecipesForReductions instead of fixing it up later in fixReductions. This explicitly models the required conversions and also makes sure they are generated at the right place (instead of after the exit condition), hence the changes in a few tests.	2023-10-17 19:17:40 +01:00
Yingwei Zheng	4718b4011f	[LV] Invalidate disposition of SCEV values after loop vectorization (#69230 ) This PR fixes the assertion failure of `SE.verify()` after loop vectorization.	2023-10-17 03:49:39 +08:00
Florian Hahn	b1115f8cce	[LV] Use LatchVPBB directly instead of going through region (NFC). Split off from D158333.	2023-10-13 20:08:31 +01:00
Rin	df8e0d057d	[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697 ) This patch is based off of https://github.com/llvm/llvm-project/pull/67543. We are currently using the exact trip count to make decisions regarding the maximum VF. We can instead use the upper bound TC, which will be the same as the constant trip count when that is known.	2023-10-09 16:26:19 +01:00
Graham Hunter	3273ea40e5	[LV] Cache call vectorization decisions (#66521 ) LoopVectorize currently queries VFDatabase repeatedly for each CI, and each query to VFDatabase rescans all vector variants. This patch instead makes a decision for each call once per VF based on the cost of scalarization vs. function call to a vector variant of the function vs. a vector intrinsic, then caches the decision along with relevant info for use in planning and plan execution.	2023-10-09 11:23:19 +01:00
Florian Hahn	dae91f5dbc	[VPlan] Avoid VPTransformState::reset in fixReduction (NFCI). There's no need to repeatedly query and reset the state for LoopExitInstDef. This removes one of the last uses of VPTransformState::reset, by use a vector to store and update the results. No other code should try to retrieve the result from State outside the fixReductionCall.	2023-10-07 23:24:24 +01:00
Rin	d3e4702c0f	[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543 ) Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.	2023-10-05 10:24:30 +01:00
Florian Hahn	07e715953b	[VPlan] Check users of LoopExitInstDef in VPlan directly. (NFCI) Instead of walking the IR def use chains of the generated code, adjust the generated VPInstruction if needed and check its users in VPlan.	2023-10-03 20:42:15 +01:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduce active-lane-mask as later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating introducing EVL as similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Florian Hahn	1a9e45080f	[VPBuilder] Add setInsertPoint version taking a recipe directly (NFC). This helps to slightly simplify code when a recipe can be obtained easily. Suggested in D158779.	2023-09-25 12:17:53 +01:00
Youngsuk Kim	e5026f0179	[llvm] Remove uses of Type::getPointerTo() (NFC) Partial progress towards removing in-tree uses of `getPointerTo()`, by employing the following options: * Drop the call entirely if the sole purpose of it is to support a no-op bitcast (remove the no-op bitcast as well). * Replace with `PointerType::get()`/`PointerType::getUnqual()` This is a NFC cleanup effort. Reviewed By: barannikov88 Differential Revision: https://reviews.llvm.org/D155232	2023-09-22 19:44:38 -04:00
Florian Hahn	f23246a0bb	[LV] Directly add fast-math flags to select recipe (NFC). Now that VPInstruction can manage fast math flags via VPRecipeWithIRFlags, use them directly to model the fast-math flags of the select created for the final reduction value instead of adding them late.	2023-09-21 11:05:55 +01:00
Florian Hahn	1a9358c090	[LV] Relax over-strict assertion for reduction exit value selects. After f108c6c, (mul x, 1) is simplified to x, which can cause the select for the final reduction value when tail-folding to use the reduction value for both options. Relax the assertion to make sure this case is allowed. Note that the reduction is now redundant itself and could be further simplified. Fixes #66895.	2023-09-21 10:12:29 +01:00
Jeremy Morse	e54277fa10	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468	2023-09-11 20:01:19 +01:00
Jeremy Morse	6942c64e81	[NFC][RemoveDIs] Prefer iterator-insertion over instructions Continuing the patch series to get rid of debug intrinsics [0], instruction insertion needs to be done with iterators rather than instruction pointers, so that we can communicate information in the iterator class. This patch adds an iterator-taking insertBefore method and converts various call sites to take iterators. These are all sites where such debug-info needs to be preserved so that a stage2 clang can be built identically; it's likely that many more will need to be changed in the future. At this stage, this is just changing the spelling of a few operations, which will eventually become signifiant once the debug-info bearing iterator is used. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152537	2023-09-11 11:48:45 +01:00
Florian Hahn	08de6508ab	[LV] Return debug loc directly from getDebugLocFromInstrOrOps (NFCI) The return value of the function is only used to get the debug location. Directly return the debug location, as this avoids an extra null check in the caller.	2023-09-08 16:29:09 +01:00
Florian Hahn	168e23c741	[VPlan] Remove reference to Instr when setting debug loc. (NFCI) This allows untangling references to underlying IR for various recipes.	2023-09-05 10:59:13 +01:00
Mel Chen	26aed5b9a8	[VPlan][LoopUtils] Remove unused parameter TTI This patch removes the member TTI from VPReductionRecipe, as the generation of reduction operations no longer requires TTI. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D158148	2023-09-04 05:30:37 -07:00
Nuno Lopes	5a3fd5f3f5	[LoopVectorizer] Fix PR #65212 : vectorization of reduction loop wasn't respecting original store alignment	2023-09-03 16:35:05 +01:00
Florian Hahn	fd66195777	[VPlan] Manage compare predicates in VPRecipeWithIRFlags. Extend VPRecipeWithIRFlags to also manage predicates for compares. This allows removing the custom ICmpULE opcode from VPInstruction which was a workaround for missing proper predicate handling. This simplifies the code a bit while also allowing compares with any predicates. It also fixes a case where the compare predixcate wasn't printed properly for VPReplicateRecipes. Discussed/split off from D150398. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158992	2023-09-02 21:45:24 +01:00
Fangrui Song	111fcb0df0	[llvm] Fix duplicate word typos. NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 18:25:16 -07:00

1 2 3 4 5 ...

1996 Commits