llvm-project

Author	SHA1	Message	Date
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add AArch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Florian Hahn	ddbb382a7c	[LV] Regenerate check-lines for some tests.	2024-10-23 04:34:13 +01:00
Paul Walker	5bb34803a4	[NFC] Migrate tests to use autoupdate for CHECK lines.	2024-10-22 12:55:15 +00:00
Florian Hahn	c7496cebac	[LV] Use SCEV to check if minimum iteration check is known. (#111310 ) Use SCEV to check if the minimum iteration check (TC < Step) is known to be false. This is a first step towards addressing https://github.com/llvm/llvm-project/issues/111098. To catch the exact case from the issue, we need to do extra work to make sure the wrap flags on the shl are preserved and used by SCEV. Note that skeleton creation will be gradually moved to VPlan and this simplification should be done as VPlan transform eventually. The current plan is to move skeleton creation to VPlan starting from parts closest to the parts already created by VPlan, starting with induction resume value creation (started with https://github.com/llvm/llvm-project/pull/110577), then memory and SCEV checks and finally minimum iteration checks. PR: https://github.com/llvm/llvm-project/pull/111310	2024-10-18 15:22:59 -07:00
Graham Hunter	091a235ec5	Revert "[AArch64][SVE] Enable max vector bandwidth for SVE" (#112873 ) Reverts llvm/llvm-project#109671 Reverting due to some performance regressions on neoverse-v1.	2024-10-18 11:05:55 +01:00
Florian Hahn	b497010854	[VPlan] Use VPInstruction::Name when assigning names (NFCI). This slightly improves the printing of VPInstructions. NFC except debug output.	2024-10-18 05:52:35 +01:00
Yingwei Zheng	095d49da76	[InstCombine] Set `samesign` when converting signed predicates into unsigned (#112642 ) Alive2: https://alive2.llvm.org/ce/z/6cqdt-	2024-10-17 20:43:48 +08:00
Graham Hunter	c980a20b10	[AArch64][SVE] Enable max vector bandwidth for SVE (#109671 ) Returns true for shouldMaximizeVectorBandwidth when the register type is a scalable vector and SVE or streaming SVE are available.	2024-10-17 13:17:24 +01:00
David Sherwood	671976ff59	[NFC][LoopVectorize] Add more simple early exit tests (#112529 ) I realised we are missing tests to cover more loops with multiple early exits - some countable and some uncountable. I've also added a few SVE versions of the test in the AArch64 directory. Once we can vectorise such early exit loops it's a good sanity check to make sure they also vectorise for SVE. Also, for some of the tests I expect there to be some divergence from the same tests in the top level directory once we start vectorising them.	2024-10-17 09:49:51 +01:00
Florian Hahn	3860e29e0e	[VPlan] Mark VPVectorPointerRecipe as not having sideeffects. VectorPointer doesn't read from memory or have any sideeffects. Mark it accordingly.	2024-10-16 06:10:19 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928 ) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC, isIndvarOverflowCheckKnownFalse, computeMaxVF, and isMoreProfitable. I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Florian Hahn	bb937e276d	[LV] Compute value of escaped induction based on the computed end value. (#110576 ) Update fixupIVUsers to compute the value for escaped inductions using the already computed end value of the induction (EndValue), but subtracting the step. This results in slightly simpler codegen, as we avoid computing the full transformed index at VectorTripCount - 1. PR: https://github.com/llvm/llvm-project/pull/110576	2024-10-10 20:04:46 +01:00
Florian Hahn	6fbbe152fa	[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486 ) This patch splits off intrinsic handling to a new VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type). It does not need access to an underlying IR call instruction or function. This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.	2024-10-08 22:37:20 +01:00
Florian Hahn	36fc291b6e	[VPlan] Implement VPBlendRecipe::computeCost. Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently also used if only the first lane is used. This also requires pre-computing costs for forced scalars and instructions considered profitable to scalarize. For those, the cost will be computed separately in the legacy cost model. This will also be needed when implementing VPReplicateRecipe::computeCost.	2024-10-08 21:33:42 +01:00
Florian Hahn	3ec6f805c5	[VPlan] Don't create GEP x, 0 for interleave group pointers. The GEP with offset 0 is redundant, remove it. This addresses a TODO from 7f74651837b (#106431).	2024-10-08 12:08:13 +01:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431 ) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Benjamin Maxwell	01a1398971	[AArch64][Test] Update test variable names (NFC) (#110667 ) Simply by running update_test_checks.py with no changes. This is to make updating these tests for later changes easier.	2024-10-03 16:14:21 +01:00
Nikita Popov	9f3d1695eb	[SCEVExpander] Preserve gep nuw during expansion (#102133 ) When expanding SCEV adds to geps, transfer the nuw flag to the resulting gep. (Note that this doesn't apply to IV increment GEPs, which go through a different code path.)	2024-10-02 11:45:00 +02:00
Florian Hahn	0344123ffb	[VPlan] Manage FMFs for VPWidenCall via VPRecipeWithIRFlags. (NFC) Update VPWidenCallRecipe to manage fast-math flags directly via VPRecipeWithIRFlags. This addresses a TODO and allows adjusting the FMFs directly on the recipe. Also fixes printing for flags for VPWidenCallRecipe.	2024-10-01 13:20:34 +01:00
Graham Hunter	6f1a8c2da2	[LV] Vectorize histogram operations (#99851 ) This patch implements autovectorization support for the 'all-in-one' histogram intrinsic, which seems to have more support than the 'standalone' intrinsic. See https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/ for an overview of the work and my notes on the tradeoffs between the two approaches.	2024-09-27 13:08:55 +01:00
Benjamin Maxwell	50a1ab12ab	[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980 ) LoopAccessAnalysis currently does not check/track aliasing from the output pointers, but assumes vectorizing library calls with a mapping is safe. This can result in incorrect codegen if something like the following is vectorized: ``` for(int i=0; i<N; i++) { /* No aliasing between input and output pointers detected. */ sincos(cos_out[0], sin_out+i, cos_out+i); } ``` Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the original value of `cos_out[0]` not the updated value.	2024-09-23 16:05:55 +01:00
Graham Hunter	785337e2d9	[LV][AArch64] Don't query registers for illegal scalable vector elts (#109411 ) When trying to maximize vector bandwidth we ask TTI for the number of registers required for a given operation. If the type of that operation happens to be something illegal for scalable vectors (e.g. <vscale x 4 x fp128>) then we would see a crash. Instead, just return a default value and let the cost model reject the invalid operation later.	2024-09-23 13:35:23 +01:00
Florian Hahn	53266f73f0	[VPlan] Run DCE after unrolling. This cleans up a number of dead recipes after unrolling if only their first or last parts are used. This simplifies a number of tests. Fixes https://github.com/llvm/llvm-project/issues/109581.	2024-09-22 22:08:46 +01:00
Florian Hahn	8ec406757c	[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842 ) This patch implements explicit unrolling by UF as VPlan transform. In follow up patches this will allow simplifying VPTransformState (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in each recipe). It also allows for more general optimizations (e.g. avoid generating code for recipes that are uniform-across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post processing for header-phi recipes previously.) In the initial implementation, a number of recipes still take the unrolled part as additional, optional argument, if their execution depends on the unrolled part. The computation for start/step values for scalable inductions changed slightly. Previously the step would be computed as scalar and then splatted, now vscale gets splatted and multiplied by the step in a vector mul. This has been split off https://github.com/llvm/llvm-project/pull/94339 which also includes changes to simplify VPTransformState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransformState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842	2024-09-21 19:47:37 +01:00
Florian Hahn	58e05779b4	[LV] Move test requiring AArch64 to target subdir. The test added in bd8fe9972e3f depends on AArch64. Move it.	2024-09-21 12:54:59 +01:00
Florian Hahn	4eb9838409	[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions. Update isDefinedOutsideLoopRegions to check if a recipe is defined outside any region. Split off already approved https://github.com/llvm/llvm-project/pull/95842 now that this can be tested separately after landing VPlan-based LICM https://github.com/llvm/llvm-project/issues/107501	2024-09-20 15:34:00 +01:00
Florian Hahn	a861ed411a	[VPlan] Add initial loop-invariant code motion transform. (#107894 ) Add initial transform to move out loop-invariant recipes. This also helps to fix a divergence between legacy and VPlan-based cost model due to legacy using ScalarEvolution::isLoopInvariant in some cases. Fixes https://github.com/llvm/llvm-project/issues/107501. PR: https://github.com/llvm/llvm-project/pull/107894	2024-09-20 11:22:03 +01:00
Florian Hahn	e584278289	[LV] Update tests to avoid loop invariant instructions. Update some tests with loop invariant instructions so the instructions cannot be hoisted out. This preserves the original test intention after https://github.com/llvm/llvm-project/pull/107894.	2024-09-19 18:50:10 +01:00
Shih-Po Hung	ffcff2f465	[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544 ) This patch passes the string `"vector.gep"` to CreateGEP instead of CreateMul.	2024-09-18 19:22:36 +08:00
Florian Hahn	012dbec604	[VPlan] Handle ForceTargetInstructionCost during precomputeCosts. Make sure ForceTargetInstructionCost is respected in precomputeCosts.	2024-09-15 10:53:43 +01:00
Florian Hahn	ea83e1c05a	[LV] Assign cost to all interleave members when not interleaving. At the moment, the full cost of all interleave group members is assigned to the instruction at the group's insert position, even if the decision was to not form an interleave group. This can lead to inaccurate cost estimates, e.g. if the instruction at the insert position is dead. If the decision is to not vectorize but scalarize or scatter/gather, then the cost will be the total cost for all members. In those cases, assign the cost individually per member, to more closely reflect the choice per instruction. This fixes a divergence between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/108098.	2024-09-11 21:04:34 +01:00
Florian Hahn	a794ee4559	[VPlan] Add VPValue for VF, use it for VPWidenIntOrFpInductionRecipe. (#95305 ) Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only generated if there are users of VF, to avoid unnecessary test changes. PR: https://github.com/llvm/llvm-project/pull/95305	2024-09-10 10:41:35 +01:00
Florian Hahn	aa158bf402	[LV] Update tests to replace some code with loop varying instructions. Update some tests with loop-invariant instructions, where hoisting them out of the loop changes the vectorization decision. This should preserve their original spirit when making further improvements.	2024-09-09 14:10:12 +01:00
Florian Hahn	3bd161e98d	[LV] Honor forced scalars in setVectorizedCallDecision. Similarly to dd94537b4, setVectorizedCallDecision also did not consider ForcedScalars. This led to VPlans not reflecting the decision by the legacy cost model (cost computation would use scalar cost, VPlan would have VPWidenCallRecipe). To fix this, check if the call has been forced to scalar in setVectorizedCallDecision. Note that this requires moving setVectorizedCallDecision after collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does not depend on call decisions and can safely be moved. Fixes https://github.com/llvm/llvm-project/issues/107051.	2024-09-03 21:06:32 +01:00
Philip Reames	1fbb6b4efc	[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141 ) Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, clean up a case where the vectorizer is emitting a non-canonical identity value given the available flags. We use largest/smallest value during ISEL, and VP expansion, but not during vectorization. Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start value, this difference is only visible when masking of inactive lanes is required. Primary motivation of this change is simply to remove a difference between versions of the code which reason about the identity value of a reduction so I can kill all but one off. In review, it was pointed out that this is actually a functional fix as well. The old code used inf on a noinf reduction instruction - whose result is poison! That wasn't the intent of the code.	2024-09-03 12:21:54 -07:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770 ) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Florian Hahn	dd94537b40	[LV] Update call widening decision when scalarizing calls. collectInstsToScalarize may decide to scalarize a call. If so, we have to update the widening decision for the call, otherwise the call won't be scalarized as expected during VPlan construction. This issue was uncovered by f82543d509.	2024-09-03 14:12:41 +01:00
Florian Hahn	954ed05c10	[VPlan] Simplify MUL operands at recipe construction. This moves the logic to create simplified operands using SCEV to MUL recipe creation. This is needed to match the behavior of the legacy cost model. TODOs are to extend to other opcodes and move to a transform. Note that this also restricts the number of SCEV simplifications we apply to more precisely match the cases handled by the legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/107015.	2024-09-02 21:25:31 +01:00
Florian Hahn	50a02e7c68	[VPlan] Pass intrinsic inst to TTI in VPWidenCallRecipe::computeCost. Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to TTI if available, as there are multiple places in TTI which use the IntrinsicInst. Fixes https://github.com/llvm/llvm-project/issues/107016.	2024-09-02 20:47:37 +01:00
Florian Hahn	b0de7fa466	[VPlan] Use op from underlying call in computeCost if needed. This fixes a divergence between legacy and VPlan-based cost model, e.g. if one of the operands has a first-order recurrence phi as operand.	2024-09-02 14:00:10 +01:00
Yingwei Zheng	380fa875ab	[InstCombine] Replace all dominated uses of condition with constants (#105510 ) This patch replaces all dominated uses of condition with true/false to improve context-sensitive optimizations. It eliminates a bunch of branches in llvm-opt-benchmark. As a side effect, it may introduce new phi nodes in some corner cases. See the following case: ``` define i1 @test(i1 %cmp, i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br i1 %cmp, label %if.then, label %if.else if.then: br label %bb2 if.else: br label %bb2 bb2: %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else] ret i1 %res } ``` It will be simplified into: ``` define i1 @test(i1 %cmp, i1 %cond) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br i1 %cmp, label %if.then, label %if.else if.then: br label %bb2 if.else: br label %bb2 bb2: %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else] ret i1 %res } ``` I am planning to fix this in late pipeline/CGP since this problem exists before the patch.	2024-09-01 09:49:23 +08:00
Philip Reames	4b553f4916	Regen a bunch of vectorizer tests to avoid naming churn in upcoming review	2024-08-30 10:13:02 -07:00
Paul Walker	ce5620ba9a	[LLVM][VPlan] Pick more optimal initial value for VPBlend. (#104019 ) By choosing an initial value whose mask is only used by the blend we can remove the need for the mask entirely.	2024-08-30 13:30:23 +01:00
Maciej Gabka	95d2d1cba0	Move stepvector intrinsic out of experimental namespace (#98043 ) This patch moves the stepvector intrinsic out of the experimental namespace. This intrinsic has existed in LLVM for several years now, and is widely used.	2024-08-28 12:48:20 +01:00
Florian Hahn	885c4365c1	[VPlan] Skip branches marked as dead in cost precomputation. Don't consider the cost of branches marked to be skipped in VPlan cost pre-computation. Those aren't included in the legacy cost, so they should not be included in the VPlan cost.	2024-08-23 15:58:29 +01:00
Nikita Popov	a105877646	[InstCombine] Remove some of the complexity-based canonicalization (#91185 ) The idea behind this canonicalization is that it allows us to handle fewer patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.	2024-08-21 12:02:54 +02:00
Florian Hahn	42555cdba4	[VPlan] Run VPlan optimizations on plans in native path. Update buildVPlans (used in native path) to also run general VPlan optimizations in another small step to align both codepaths.	2024-08-15 13:05:51 +01:00
Paul Walker	9e318bac5b	[LLVM] Regenerate some test outputs for llvm/test/Transforms/LoopVectorize.	2024-08-14 10:59:46 +00:00
Madhur Amilkanthwar	b73771cf0f	[AArch64] Increase scatter overhead on Neoverse-V2 (#101296 ) This patch increases scatter overhead on Neoverse-V2 to 13. This benefits s128 kernel from TSVC_2 test suite. SPEC 17, RAJAPerf, and Spatter are unaffected by this patch. This patch boosts s128 kernel's performance from TSVC test suite by about 40% as this enables vectorization. Also, handle minor code refactoring for gather related part.	2024-08-14 10:12:40 +05:30
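
Note on 50a1ab12ab ([LAA] Don't assume libcalls with output/input pointers can be vectorized): the aliasing hazard described in that message can be reproduced with a small simulation. The sketch below is not LLVM code. It models the `sincos` loop from the commit message in Python, and the function names `scalar_loop` and `naive_widened_loop` are hypothetical helpers introduced only for illustration. It assumes that a widened call loads its scalar input once per vector iteration and then writes all VF lanes.

```python
import math


def scalar_loop(cos_out, sin_out):
    """Reference semantics: every iteration re-reads cos_out[0]."""
    cos_out, sin_out = list(cos_out), list(sin_out)
    for i in range(len(cos_out)):
        x = cos_out[0]  # the input aliases the cosine output array
        sin_out[i] = math.sin(x)
        cos_out[i] = math.cos(x)
    return sin_out, cos_out


def naive_widened_loop(cos_out, sin_out, vf):
    """Hypothetical widened loop: one input load feeds all vf lanes."""
    cos_out, sin_out = list(cos_out), list(sin_out)
    n = len(cos_out)
    for base in range(0, n, vf):
        x = cos_out[0]  # loaded once, before any lane has stored
        for i in range(base, min(base + vf, n)):
            sin_out[i] = math.sin(x)
            cos_out[i] = math.cos(x)
    return sin_out, cos_out


if __name__ == "__main__":
    init_cos = [1.0, 0.0, 0.0, 0.0]
    init_sin = [0.0] * 4
    print("scalar :", scalar_loop(init_cos, init_sin)[1])
    print("VF = 4 :", naive_widened_loop(init_cos, init_sin, vf=4)[1])
```

With VF = 1 the two functions agree. With VF >= 2, lanes 1 to VF-1 of the widened version see the original `cos_out[0]` rather than the value stored by lane 0, which is the mismatch the commit message describes.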


655 Commits