llvm-project

Author	SHA1	Message	Date
Rohit Aggarwal	dfb60bb919	Adding more vector calls for -fveclib=AMDLIBM (#109662 ) AMD has its own implementation of vector calls. New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos. Please refer to [https://github.com/amd/aocl-libm-ose] --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-10-29 10:09:55 +00:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Florian Hahn	e724226da7	[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value. In some cases, VPWidenCastRecipes are created but not considered in the legacy cost model, including truncates/extends when evaluating a reduction in a smaller type. Return 0 for such casts for now, to avoid divergences between VPlan and legacy cost models. Fixes https://github.com/llvm/llvm-project/issues/113526.	2024-10-25 21:25:44 +02:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add aarch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Florian Hahn	2dfb1c664c	[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945 ) In some cases, Previous (and its operands) can be hoisted. This allows supporting additional cases where sinking all users of the FOR fails, e.g. due to having to sink recipes with side-effects. This fixes a crash where we fail to create a scalar VPlan for a first-order recurrence, but can create a vector VPlan, because the trunc instruction of an IV which generates the previous value of the recurrence has been optimized to a truncated induction recipe, thus hoisting it to the beginning. Fixes https://github.com/llvm/llvm-project/issues/106523. PR: https://github.com/llvm/llvm-project/pull/108945	2024-10-23 13:12:03 -07:00
Florian Hahn	ddbb382a7c	[LV] Regenerate check-lines for some tests.	2024-10-23 04:34:13 +01:00
Paul Walker	5bb34803a4	[NFC] Migrate tests to use autoupdate for CHECK lines.	2024-10-22 12:55:15 +00:00
Ramkumar Ramachandra	f719cfa868	LAA: be less conservative in isNoWrap (#112553 ) isNoWrap has exactly one caller which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns Strides == 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.	2024-10-22 09:55:51 +01:00
Florian Hahn	2a6b09e0d3	[LV] Use type from InsertPos for cost computation of interleave groups. Previously the legacy cost model would pick the type for the cost computation depending on the order of the members in the input IR. This is incompatible with the VPlan-based cost model (independent of original IR order) and also doesn't match code-gen, which uses the type of the insert position. Update the legacy cost model to use the type (and address space) from the Group's insert position. This brings the legacy cost model in line with the VPlan-based cost model and fixes a divergence between both models. Note that the X86 cost model seems to assign different costs to groups with i64 and double types. Added a TODO to check. Fixes https://github.com/llvm/llvm-project/issues/112922.	2024-10-18 19:12:40 -07:00
Florian Hahn	c7496cebac	[LV] Use SCEV to check if minimum iteration check is known. (#111310 ) Use SCEV to check if the minimum iteration check (TC < Step) is known to be false. This is a first step towards addressing https://github.com/llvm/llvm-project/issues/111098. To catch the exact case from the issue, we need to do extra work to make sure the wrap flags on the shl are preserved and used by SCEV. Note that skeleton creation will be gradually moved to VPlan and this simplification should be done as VPlan transform eventually. The current plan is to move skeleton creation to VPlan starting from parts closest to the parts already created by VPlan, starting with induction resume value creation (started with https://github.com/llvm/llvm-project/pull/110577), then memory and SCEV checks and finally minimum iteration checks. PR: https://github.com/llvm/llvm-project/pull/111310	2024-10-18 15:22:59 -07:00
Alexey Bataev	f148d5791b	[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode. Enabled initial support for max safe distance in DataWithEVL mode. If max safe distance is required, special code needs to be emitted: CMP = icmp ult AVL, MAX_SAFE_DISTANCE; SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE; EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL) when vectorizing the loop in DataWithEVL tail folding mode. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/102897	2024-10-18 15:51:49 -04:00
Graham Hunter	091a235ec5	Revert "[AArch64][SVE] Enable max vector bandwidth for SVE" (#112873 ) Reverts llvm/llvm-project#109671 Reverting due to some performance regressions on neoverse-v1.	2024-10-18 11:05:55 +01:00
Florian Hahn	b497010854	[VPlan] Use VPInstruction::Name when assigning names (NFCI). This slightly improves the printing of VPInstructions. NFC except debug output.	2024-10-18 05:52:35 +01:00
Florian Hahn	b060661da8	[SCEVExpander] Expand UDiv avoiding UB when in seq_min/max. (#92177 ) Update SCEVExpander to introduce a SafeUDivMode, which is set when expanding operands of SCEVSequentialMinMaxExpr. In this mode, the expander will make sure that the divisor of the expanded UDiv is neither 0 nor poison. Fixes https://github.com/llvm/llvm-project/issues/89958. PR https://github.com/llvm/llvm-project/pull/92177	2024-10-17 13:55:20 -07:00
David Sherwood	76f3776185	[NFC][LoopVectorize] Restructure simple early exit tests (#112721 ) The previous simple_early_exit.ll was growing too large and difficult to manage. Instead I've decided to refactor the tests by splitting out into notional groups: 1. single_early_exit.ll: loops with a single uncountable exit that do not have live-outs from the loop. 2. single_early_exit_live_outs.ll: loops with a single uncountable exit with live-outs. 3. multi_early_exit.ll: loops with multiple early exits, i.e. a mixture of countable and uncountable exits, but with no live-outs from the loop. 4. multi_early_exit_live_outs.ll: as above, but with live-outs. 5. single_early_exit_unsafe_ptrs.ll: loops with a single uncountable exit, but with pointers that are not unconditionally dereferenceable. 6. unsupported_early_exit.ll: loops with uncountable exits that we cannot yet vectorise. 7. early_exit_legality.ll: tests the debug output from LoopVectorizationLegality to make sure we handle different scenarios correctly. Only the last test now requires asserts. Over time some of these tests should start vectorising as more support is added. I also tried to rename the multi early exit tests to make it clear what mixture of countable and uncountable exits are present.	2024-10-17 16:50:59 +01:00
Yingwei Zheng	095d49da76	[InstCombine] Set `samesign` when converting signed predicates into unsigned (#112642 ) Alive2: https://alive2.llvm.org/ce/z/6cqdt-	2024-10-17 20:43:48 +08:00
Graham Hunter	c980a20b10	[AArch64][SVE] Enable max vector bandwidth for SVE (#109671 ) Returns true for shouldMaximizeVectorBandwidth when the register type is a scalable vector and SVE or streaming SVE are available.	2024-10-17 13:17:24 +01:00
David Sherwood	671976ff59	[NFC][LoopVectorize] Add more simple early exit tests (#112529 ) I realised we are missing tests to cover more loops with multiple early exits - some countable and some uncountable. I've also added a few SVE versions of the test in the AArch64 directory. Once we can vectorise such early exit loops it's a good sanity check to make sure they also vectorise for SVE. Also, for some of the tests I expect there to be some divergence from the same tests in the top level directory once we start vectorising them.	2024-10-17 09:49:51 +01:00
Florian Hahn	24423107ab	[LV] Add additional trip count expansion tests for #92177 . Extra tests for https://github.com/llvm/llvm-project/pull/92177, split off the PR.	2024-10-16 07:40:04 +01:00
Florian Hahn	bbff5b8891	[VPlan] Use alloc-type to compute interleave group offset. Use getAllocTypeSize to compute the offset to the start of interleave groups instead of getScalarSizeInBits, which may return 0 for pointers. This is in line with the analysis building the interleave groups and fixes a mis-compile reported for https://github.com/llvm/llvm-project/pull/106431.	2024-10-16 07:21:58 +01:00
Florian Hahn	cc5b5ca34b	[LV] Add test where interleave group start pointer is incorrect. Test case from https://github.com/llvm/llvm-project/pull/106431.	2024-10-16 07:13:38 +01:00
Florian Hahn	3860e29e0e	[VPlan] Mark VPVectorPointerRecipe as not having sideeffects. VectorPointer doesn't read from memory or have any sideeffects. Mark it accordingly.	2024-10-16 06:10:19 +01:00
Florian Hahn	34cdd67c85	[VPlan] Use VPWidenIntrinsicRecipe to vp.select. (#110489 ) Use VPWidenIntrinsicRecipe (https://github.com/llvm/llvm-project/pull/110486) to create vp.select intrinsics. This potentially offers an alternative to duplicating EVL recipes for all existing recipes. There are some recipes that will need duplicates (at least at the moment), due to extra code-gen needs (e.g. widening loads and stores). But in cases the intrinsic can directly be used, creating the widened intrinsic directly would reduce the need to duplicate some recipes. PR: https://github.com/llvm/llvm-project/pull/110489	2024-10-15 21:48:15 +01:00
Florian Hahn	7f06d8afb0	[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824 ) Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if there is a UDiv operand that may divide by 0 or poison. PR: https://github.com/llvm/llvm-project/pull/110824	2024-10-14 13:08:49 +01:00
Florian Hahn	65da32c634	[LV] Account for any-of reduction when computing costs of blend phis. Any-of reductions are narrowed to i1. Update the legacy cost model to use the correct type when computing the cost of a phi that gets lowered to selects (BLEND). This fixes a divergence between legacy and VPlan-based cost models after 36fc291b6ec6d. Fixes https://github.com/llvm/llvm-project/issues/111874.	2024-10-11 11:27:22 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928 ) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC, isIndvarOverflowCheckKnownFalse, computeMaxVF, isMoreProfitable. I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Florian Hahn	bb937e276d	[LV] Compute value of escaped induction based on the computed end value. (#110576 ) Update fixupIVUsers to compute the value for escaped inductions using the already computed end value of the induction (EndValue), but subtracting the step. This results in slightly simpler codegen, as we avoid computing the full transformed index at VectorTripCount - 1. PR: https://github.com/llvm/llvm-project/pull/110576	2024-10-10 20:04:46 +01:00
Florian Hahn	01cbbc52dc	[VPlan] Request lane 0 for pointer arg in PtrAdd. After 7f74651, the pointer operand of a PtrAdd may be replicated. Instead of requesting a single scalar, request lane 0, which correctly handles the case when there is a scalar-per-lane. Fixes https://github.com/llvm/llvm-project/issues/111606.	2024-10-09 13:18:54 +01:00
Florian Hahn	6fbbe152fa	[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486 ) This patch splits off intrinsic handling to a new VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type). It does not need access to an underlying IR call instruction or function. This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.	2024-10-08 22:37:20 +01:00
Florian Hahn	36fc291b6e	[VPlan] Implement VPBlendRecipe::computeCost. Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently also used if only the first lane is used. This also requires pre-computing costs for forced scalars and instructions considered profitable to scalarize. For those, the cost will be computed separately in the legacy cost model. This will also be needed when implementing VPReplicateRecipe::computeCost.	2024-10-08 21:33:42 +01:00
Florian Hahn	3ec6f805c5	[VPlan] Don't create GEP x, 0 for interleave group pointers. The GEP with offset 0 is redundant, remove it. This addresses a TODO from 7f74651837b (#106431).	2024-10-08 12:08:13 +01:00
Luke Lau	366e469db9	[RISCV] Add cost tests for more interleave factors. NFC This shows how we're not properly scaling the cost with the number of factors, i.e. a factor 8 interleave costs the same as a factor 2 interleave at VF=2.	2024-10-08 17:16:17 +08:00
Luke Lau	e98875af4c	[RISCV] Add scalable interleave cost tests. NFC This gets the cost from the recipe output rather than the individual instruction cost. The factor 3 test was left alone since we don't support anything else other than factor 2 for scalable vectors currently.	2024-10-08 13:39:13 +08:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431 ) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	89d2a9de05	[VPlan] Add additional FOR hoisting test. Additional tests for https://github.com/llvm/llvm-project/pull/108945.	2024-10-06 14:36:33 +01:00
Florian Hahn	45b526afa2	[LV] Honor uniform-after-vectorization in setVectorizedCallDecision. The legacy cost model always computes the cost for uniforms as cost of VF = 1, but VPWidenCallRecipes would be created, as setVectorizedCallDecisions would not consider uniform calls. Fix setVectorizedCallDecision to set to Scalarize, if the call is uniform-after-vectorization. This fixes a bug in VPlan construction uncovered by the VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/111040.	2024-10-06 10:35:06 +01:00
Florian Hahn	68210c7c26	[VPlan] Only generate first lane for VPPredInstPHI if no others used. If only the first lane of the result is used, only generate the first lane. Fixes https://github.com/llvm/llvm-project/issues/111042.	2024-10-05 19:15:05 +01:00
Shih-Po Hung	26fca7256e	[VPlan][NFC] Use patterns in test check (#111086 )	2024-10-04 17:19:07 +08:00
Benjamin Maxwell	01a1398971	[AArch64][Test] Update test variable names (NFC) (#110667 ) Simply by running update_test_checks.py with no changes. This is to make updating these tests for later changes easier.	2024-10-03 16:14:21 +01:00
Florian Hahn	9de327c94d	[LV] Generalize predication checks from 2c8836c899 for operands. This fixes another case where the VPlan-based and legacy cost models disagree. If any of the operands is predicated, it can't be trivially hoisted and we should consider the cost for evaluating it each loop iteration. Fixes https://github.com/llvm/llvm-project/issues/108697.	2024-10-02 20:16:41 +01:00
Florian Hahn	6d6eea92e3	[LV] Use SCEV to simplify wide binop operand to constant. The legacy cost model uses SCEV to determine if the second operand of a binary op is a constant. Update the VPlan construction logic to mirror the current legacy behavior, to fix a difference in the cost models. Fixes https://github.com/llvm/llvm-project/issues/109528. Fixes https://github.com/llvm/llvm-project/issues/110440.	2024-10-02 13:45:49 +01:00
Nikita Popov	9f3d1695eb	[SCEVExpander] Preserve gep nuw during expansion (#102133 ) When expanding SCEV adds to geps, transfer the nuw flag to the resulting gep. (Note that this doesn't apply to IV increment GEPs, which go through a different code path.)	2024-10-02 11:45:00 +02:00
David Sherwood	0b2403197f	[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833 ) LoopVectorizationLegality currently only treats a loop as legal to vectorise if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid SCEV, or more precisely that the loop must have an exact backedge taken count. Therefore, in LoopVectorize.cpp we can safely replace all calls to getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount, since the result is the same. This also helps prepare the loop vectoriser for PR #88385.	2024-10-02 10:28:54 +01:00
Florian Hahn	0344123ffb	[VPlan] Manage FMFs for VPWidenCall via VPRecipeWithIRFlags. (NFC) Update VPWidenCallRecipe to manage fast-math flags directly via VPRecipeWithIRFlags. This addresses a TODO and allows adjusting the FMFs directly on the recipe. Also fixes printing for flags for VPWidenCallRecipe.	2024-10-01 13:20:34 +01:00
Ramkumar Ramachandra	f2ad39b77b	LV/test: improve a couple of tests, regen with UTC (#107225 ) Add noalias, where applicable, to eliminate unnecessary memory check, and regen with UTC.	2024-09-30 15:47:17 +01:00
Mel Chen	f8373cb0f9	[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342 ) This patch separates the computation of the final reduction result and the intermediate stores of reduction. --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-09-30 15:35:09 +08:00
Florian Hahn	2c8836c899	[LV] Don't consider predicated insts as invariant unconditionally in CM. Predicated instructions cannot be hoisted trivially, so don't treat them as uniform values in the cost model. This fixes a difference between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/110295.	2024-09-29 20:31:24 +01:00
Florian Hahn	2f7ccaf4a8	[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777 ) This can help in cases where pointer alignment info is missing, e.g. https://github.com/llvm/llvm-project/pull/108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. PR: https://github.com/llvm/llvm-project/pull/108777	2024-09-28 14:19:57 +01:00
Graham Hunter	6f1a8c2da2	[LV] Vectorize histogram operations (#99851 ) This patch implements autovectorization support for the 'all-in-one' histogram intrinsic, which seems to have more support than the 'standalone' intrinsic. See https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/ for an overview of the work and my notes on the tradeoffs between the two approaches.	2024-09-27 13:08:55 +01:00

2709 Commits