llvm-project

Author	SHA1	Message	Date
Andrew Rogers	4e2efa55c6	[llvm] export private symbols needed by unittests (#145767) ## Purpose Export a small number of private LLVM symbols so that unit tests can still build/run when LLVM is built as a Windows DLL or a shared library with default hidden symbol visibility. ## Background The effort to build LLVM as a Windows DLL is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307). Some LLVM unit tests use internal/private symbols that are not part of LLVM's public interface. When building LLVM as a DLL or shared library with default hidden symbol visibility, the symbols are not available when the unit test links against the DLL or shared library. This problem can be solved in one of two ways: 1. Export the private symbols from the DLL. 2. Link the unit tests against the intermediate static libraries instead of the final LLVM DLL. This PR applies option 1. Based on the discussion of option 2 in #145448, this option is preferable. ## Overview * Adds a new `LLVM_ABI_FOR_TEST` export macro, which is currently just an alias for `LLVM_ABI`. * Annotates the subset of symbols under `llvm/lib` that are required to get unit tests building using the new macro.	2025-07-10 08:20:09 -07:00
Florian Hahn	6b3d2b629c	[VPlan] Add VPExpressionRecipe, replacing extended reduction recipes. (#144281) This patch adds a new recipe to combine multiple recipes into an 'expression' recipe, which should be considered as a single entity for cost-modeling and transforms. The recipe needs to be 'decomposed', i.e. replaced by its individual recipes before execute. This subsumes VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe and should make it easier to extend to include more types of bundled patterns, e.g. extends folded into loads or various arithmetic instructions, if supported by the target. It allows avoiding re-creating the original recipes when converting to concrete recipes, together with removing the need to record various information. The current version of the patch still retains the original printing matching VPExtendedReductionRecipe and VPMulAccumulateReductionRecipe, but this specialized print could be replaced with printing the bundled recipes directly. PR: https://github.com/llvm/llvm-project/pull/144281	2025-07-01 20:44:50 +01:00
Florian Hahn	01b9828a66	[VPlan] Remove unneeded friend classes from VPValue (NFC). None of the removed classes makes use of the friendship relationship.	2025-06-05 21:40:21 +01:00
Florian Hahn	0f00a96fed	[VPlan] Simplify branch on False in VPlan transform (NFC). (#140409) Simplify branch on false, starting with the branch from the middle block to the scalar preheader. Initially this helps simplify the initial VPlan construction. Depends on https://github.com/llvm/llvm-project/pull/140405. PR: https://github.com/llvm/llvm-project/pull/140409	2025-05-31 20:32:45 +01:00
Florian Hahn	c3cce7caf8	[VPlan] Remove unused VPUser constructors (NFC). Now all users construct VPUsers using VPUser(ArrayRef<VPValue *>). Remove the other unused constructors.	2025-05-31 12:20:32 +01:00
Florian Hahn	dcef154b5c	[VPlan] Replace VPRegionBlock with explicit CFG before execute (NFCI). (#117506 ) Building on top of https://github.com/llvm/llvm-project/pull/114305, replace VPRegionBlocks with explicit CFG before executing. This brings the final VPlan closer to the IR that is generated and helps to simplify codegen. It will also enable further simplifications of phi handling during execution and transformations that do not have to preserve the canonical IV required by loop regions. This for example could include replacing the canonical IV with an EVL based phi while completely removing the original canonical IV. PR: https://github.com/llvm/llvm-project/pull/117506	2025-05-24 19:17:16 +01:00
Elvis Wang	664c937b43	[VPlan] Implement VPExtendedReduction, VPMulAccumulateReductionRecipe and corresponding vplan transformations. (#137746) This patch introduces two new recipes. * VPExtendedReductionRecipe - cast + reduction. * VPMulAccumulateReductionRecipe - (cast) + mul + reduction. This patch also implements the transformation that matches the following patterns via vplan and converts them to abstract recipes for better cost estimation. * VPExtendedReduction - reduce(cast(...)) * VPMulAccumulateReductionRecipe - reduce.add(mul(...)) - reduce.add(mul(ext(...), ext(...)) - reduce.add(ext(mul(ext(...), ext(...)))) The converted abstract recipes will be lowered to the concrete recipes (widen-cast + widen-mul + reduction) just before recipe execution. Note that this patch still relies on the legacy cost model to calculate the cost for these patterns. Will enable vplan-based cost decision in #113903. Split from #113903.	2025-05-16 10:25:38 +08:00
Florian Hahn	bc03d6cce2	[VPlan] Introduce all loop regions as VPlan transform. (NFC) (#129402 ) Further simplify VPlan CFG builder by moving introduction of inner regions to a VPlan transform, building on https://github.com/llvm/llvm-project/pull/128419. The HCFG builder now only constructs plain CFGs. I will move it to VPlanConstruction as follow-up. Depends on https://github.com/llvm/llvm-project/pull/128419. PR: https://github.com/llvm/llvm-project/pull/129402	2025-04-16 13:30:45 +02:00
Sam Tebbs	b658a2e74a	[LV] Reduce register usage for scaled reductions (#133090 ) This PR accounts for scaled reductions in `calculateRegisterUsage` to reflect the fact that the number of lanes in their output is smaller than the VF. Depends on https://github.com/llvm/llvm-project/pull/126437	2025-04-11 14:31:08 +01:00
Florian Hahn	6a9e8fc50c	[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) (#129706) There are some opcodes that currently require specialized recipes, due to their result type not being implied by their operands, including casts. This leads to duplication from defining multiple full recipes. This patch introduces a new VPInstructionWithType subclass that also stores the result type. The general idea is to have opcodes needing to specify a result type to use this general recipe. The current patch replaces VPScalarCastRecipe with VPInstructionWithType, a similar patch for VPWidenCastRecipe will follow soon. There are a few proposed opcodes that should also benefit, without the need of workarounds: * https://github.com/llvm/llvm-project/pull/129508 * https://github.com/llvm/llvm-project/pull/119284 PR: https://github.com/llvm/llvm-project/pull/129706	2025-04-10 22:30:40 +01:00
Florian Hahn	ad9f15ab53	[VPlan] Introduce and use VPValue::replaceUsesOfWith (NFC). Adds an API matching LLVM's IR Value, which simplifies some code a bit.	2025-04-07 22:07:52 +01:00
Florian Hahn	5b38fb59df	[VPlan] Remove remaining references to VPScalarPHIRecipe (NFC). VPScalarPHIRecipe has been replaced by VPInstructions with PHI opcodes. Strip remaining dead references to VPScalarPHIRecipe.	2025-03-24 19:37:00 +00:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086 ) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
Florian Hahn	c0bf4b2c57	[VPlan] Remove unneeded VPValue::getLiveInIRValue() const (NFC). The accessor is not needed/used.	2025-02-28 17:01:19 +00:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However we now have the RISCVVLOptimizer pass which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	5008277322	[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104 ) Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (https://github.com/llvm/llvm-project/pull/117138). PR: https://github.com/llvm/llvm-project/pull/124104	2025-02-02 13:44:07 +00:00
Florian Hahn	65cd9e4c2f	[VPlan] Make VPValue constructors protected. (NFC) Tighten access to constructors similar to ef1260acc0. VPValues should either be constructed by constructors of recipes defining them or should be live-ins created by VPlan (via getOrAddLiveIn).	2025-01-17 22:17:12 +00:00
Sam Tebbs	795e35a653	Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744) This relands the reverted #120721 with a fix for cases where neither reduction operand is the reduction phi. Only 63114239cc8d26225a0ef9920baacfc7cc00fc58 and 63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted PR. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2025-01-13 11:20:35 +00:00
Zequan Wu	4d8f9594b2	Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)" This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it causes an optimization crash on -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057	2024-12-27 11:51:54 -08:00
Sam Tebbs	c858bf620c	Reland "[LoopVectorizer] Add support for partial reductions" (#120721) This re-lands the reverted #92418. When the VF is small enough so that dividing the VF by the scaling factor results in 1, the reduction phi execution thinks the VF is scalar and sets the reduction's output as a scalar value, tripping assertions expecting a vector value. The latest commit in this PR fixes that by using `State.VF` in the scalar check, rather than the divided VF. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2024-12-24 12:08:17 +00:00
Florian Hahn	5f096fd221	Revert "[LoopVectorizer] Add support for partial reductions (#92418)" This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701. It looks like this is triggering an assertion when building llvm-test-suite on ARM64 macOS. Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32" target triple = "arm64-apple-macosx15.0.0" define void @test(i64 %idx.neg, i8 %0) #0 { entry: br label %while.body while.body: ; preds = %while.body, %entry %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ] %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ] %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ] %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1 %conv = sext i8 %0 to i64 %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1 %1 = load i8, ptr null, align 1 %conv97 = sext i8 %1 to i64 %mul = mul i64 %conv97, %conv %add99 = add i64 %mul, %sum.1129 %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0 %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1 %2 = and i1 %cmp94, %cmp95 br i1 %2, label %while.body, label %while.end.loopexit while.end.loopexit: ; preds = %while.body %add99.lcssa = phi i64 [ %add99, %while.body ] ret void } attributes #0 = { "target-cpu"="apple-m1" } > opt -p loop-vectorize Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.	2024-12-19 21:46:51 +00:00
Nicholas Guy	060d62b48a	[LoopVectorizer] Add support for partial reductions (#92418 ) Following on from https://github.com/llvm/llvm-project/pull/94499, this patch adds support to the Loop Vectorizer to emit the partial reduction intrinsics where they may be beneficial for the target. --------- Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>	2024-12-19 11:42:40 +00:00
Florian Hahn	a7fda0e1e4	[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305) Introduce a general recipe to generate a scalar phi. Lower VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarPHIRecipe before plan execution, avoiding the need for duplicated ::execute implementations. There are other cases that could benefit, including in-loop reduction phis and pointer induction phis. Builds on a similar idea as https://github.com/llvm/llvm-project/pull/82270. PR: https://github.com/llvm/llvm-project/pull/114305	2024-12-03 14:53:51 +00:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Elvis Wang	b3edc764f7	[VPlan] Implement VPWidenCastRecipe::computeCost(). (NFCI) (#111339) This patch implements `VPWidenCastRecipe::computeCost()` and skips cast recipes in the in-loop reduction.	2024-10-22 12:23:49 +08:00
Florian Hahn	6fbbe152fa	[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486) This patch splits off intrinsic handling to a new VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type). It does not need access to an underlying IR call instruction or function. This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.	2024-10-08 22:37:20 +01:00
Graham Hunter	6f1a8c2da2	[LV] Vectorize histogram operations (#99851 ) This patch implements autovectorization support for the 'all-in-one' histogram intrinsic, which seems to have more support than the 'standalone' intrinsic. See https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/ for an overview of the work and my notes on the tradeoffs between the two approaches.	2024-09-27 13:08:55 +01:00
Florian Hahn	4eb9838409	[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions. Update isDefinedOutsideLoopRegions to check if a recipe is defined outside any region. Split off already approved https://github.com/llvm/llvm-project/pull/95842 now that this can be tested separately after landing VPlan-based LICM https://github.com/llvm/llvm-project/issues/107501	2024-09-20 15:34:00 +01:00
Florian Hahn	256100489d	[VPlan] Rename isDefinedOutside[Vector]Regions -> [Loop] (NFC) Clarify name of helper, split off from https://github.com/llvm/llvm-project/pull/95842/files#r1765556732.	2024-09-19 11:20:31 +01:00
Florian Hahn	f0c5caa814	[VPlan] Add VPIRInstruction, use for exit block live-outs. (#100735) Add a new VPIRInstruction recipe to wrap existing IR instructions not to be modified during execution, except for PHIs. For PHIs, a single VPValue operand is allowed, and it is used to add a new incoming value for the single predecessor VPBB. Except for PHIs, VPIRInstructions cannot have any operands. Depends on https://github.com/llvm/llvm-project/pull/100658. PR: https://github.com/llvm/llvm-project/pull/100735	2024-09-14 21:21:55 +01:00
Kolya Panchenko	00e40c9b5b	[LV] Support binary and unary operations with EVL-vectorization (#93854) The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL argument. The new recipe replaces `VPWidenRecipe` in `tryAddExplicitVectorLength` for each binary and unary operation. Follow-up patches will extend support for remaining cases, like `FCmp` and `ICmp`.	2024-09-06 11:41:36 -04:00
Mel Chen	4eb30cfb34	[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184 ) Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172.	2024-07-16 16:15:24 +08:00
Graham Hunter	22a7f6dcc4	Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493 ) Reverts llvm/llvm-project#91458 to deal with post-commit reviewer requests.	2024-07-11 16:39:30 +01:00
Graham Hunter	1860fd049e	[LV] Autovectorization for the all-in-one histogram intrinsic (#91458 ) This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.	2024-07-11 15:33:30 +01:00
Florian Hahn	b841e2eca3	Recommit "[VPlan] First step towards VPlan cost modeling. (#92555)" This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92. A number of crashes have been fixed by separate fixes, including https://github.com/llvm/llvm-project/pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555	2024-07-10 14:22:21 +01:00
Florian Hahn	f1f3c34b47	Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555 )"" This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot failure and there have been a number of crashes reported at https://github.com/llvm/llvm-project/pull/92555	2024-06-21 19:54:21 +01:00
Florian Hahn	242cc200cc	Recommit "[VPlan] First step towards VPlan cost modeling. (#92555 )" This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92. Extra tests for crashes discovered when building Chromium have been added in fb86cb7ec157689e, 3be7312f81ad2. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555	2024-06-20 17:32:52 +01:00
Arthur Eubanks	6f538f6a2d	Revert "Recommit "[VPlan] First step towards VPlan cost modeling. (#92555 )"" This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264. This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44. Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.	2024-06-14 17:47:08 +00:00
Florian Hahn	90fd99c079	Recommit "[VPlan] First step towards VPlan cost modeling. (#92555 )" This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c. Extra tests have been added in 52d29eb287. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555	2024-06-14 12:33:48 +01:00
Arthur Eubanks	46080abe9b	Revert "[VPlan] First step towards VPlan cost modeling. (#92555 )" This reverts commit 00798354c553d48d27006a2b06a904bd6013e31b. Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.	2024-06-13 16:37:21 +00:00
Florian Hahn	00798354c5	[VPlan] First step towards VPlan cost modeling. (#92555 ) This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555	2024-06-13 14:26:18 +01:00
Florian Hahn	577785c5ca	[VPlan] Remove unused removeLastOperand (NFC). The last use of the function has been removed a while ago. Remove the unused function.	2024-05-18 18:43:20 +01:00
Florian Hahn	e2a72fa583	[VPlan] Introduce recipes for VP loads and stores. (#87816 ) Introduce new subclasses of VPWidenMemoryRecipe for VP (vector-predicated) loads and stores to address multiple TODOs from https://github.com/llvm/llvm-project/pull/76172 Note that the introduction of the new recipes also improves code-gen for VP gather/scatters by removing the redundant header mask. With the new approach, it is not sufficient to look at users of the widened canonical IV to find all uses of the header mask. In some cases, a widened IV is used instead of separately widening the canonical IV. To handle that, first collect all VPValues representing header masks (by looking at users of both the canonical IV and widened inductions that are canonical) and then checking all users (recursively) of those header masks. Depends on https://github.com/llvm/llvm-project/pull/87411. PR: https://github.com/llvm/llvm-project/pull/87816	2024-04-19 09:44:23 +01:00
Florian Hahn	a9bafe91dd	[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411 ) This patch introduces a new VPWidenMemoryRecipe base class and distinct sub-classes to model loads and stores. This is a first step in an effort to simplify and modularize code generation for widened loads and stores and enable adding further more specialized memory recipes. PR: https://github.com/llvm/llvm-project/pull/87411	2024-04-17 11:00:58 +01:00
Florian Hahn	34777c238b	[VPlan] Don't mark VPBlendRecipe as phi-like. VPBlendRecipes don't get lowered to phis and usually do not appear at the beginning of blocks, due to their masks appearing before them. This effectively relaxes an over-eager verifier message. Fixes https://github.com/llvm/llvm-project/issues/88297. Fixes https://github.com/llvm/llvm-project/issues/88804.	2024-04-16 21:24:25 +01:00
Florian Hahn	6254b6dd89	[VPlan] Version VPValue names in VPSlotTracker. (#81411 ) This patch restructures the way names for printing VPValues are handled. It moves the logic to generate names for printing to VPSlotTracker. VPSlotTracker will now version names of the same underlying value if it is used by multiple VPValues, by adding a .V suffix to the name. This fixes cases where at the moment the same name is printed for different VPValues. PR: https://github.com/llvm/llvm-project/pull/81411	2024-04-15 12:27:45 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271) At the moment, some VPInstructions create only a single scalar value, but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes, which more accurately use (Part, Lane) to look up scalar values, and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Florian Hahn	9923d29cfa	[VPlan] Merge main VPlan verifier with HCFG verifier. Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.	2024-02-20 16:43:57 +00:00


138 Commits