llvm-project

Author	SHA1	Message	Date
Florian Hahn	17bad1a9da	[LV] Bail out on header phis in shouldConsiderInvariant. This fixes an infinite recursion in rare cases. Fixes https://github.com/llvm/llvm-project/issues/113794.	2024-11-01 20:51:25 +00:00
Yingwei Zheng	96b14f2ccb	[Reland][InstCombine] Fix FMF propagation in `foldSelectIntoOp` (#114499 ) Relands #114356. Compared to the last version, this patch only merges poison-generating/nsz flags from the select to fix LV regression in `llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.	2024-11-01 12:22:57 +08:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00
Ami-zhang	1897bf61f0	[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421 ) This commit makes the `generic` target to support FP and LSX, as discussed in #110211. Thereby, it allows 128-bit vector to be enabled by default in the loongarch64 backend.	2024-10-31 15:58:15 +08:00
David Green	9735c05186	[ValueTracking] Compute KnownFP state from recursive select/phi. (#113686 ) Given a recursive phi with select: %p = phi [ 0, entry ], [ %sel, loop] %sel = select %c, %other, %p The fp state can be calculated using the knowledge that the select/phi pair can only be the initial state (0 here) or from %other. This adds a short-cut into computeKnownFPClass for PHI to detect that the select is recursive back to the phi, and if so use the state from the other operand. This helps to address a regression from #83200.	2024-10-31 07:50:44 +00:00
Luke Lau	14045de250	[RISCV] Account for factor in interleave memory op costs (#111511 ) Currently we cost an interleaved memory op as if it were a load/store of the widened vector type, but this was undercosting in all cases when compared to the measured performance of todays hardware. On the x280 at NF=2 and spacemit-x60 at NF=2,3 and 4, a segmented load is carried out as a wide load and NF LMUL shuffle ops: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput All other NFs go through a slow path. On the spacemit-x60 this is proportional to VLMAX * NF, and on the x280 proportional to the number of segments. This patch increases the cost by implementing a wide load + NF LMUL shuffle op cost for the lowest common denominator NF=2, and then a slower cost proportional to VL for the other NFs. In a follow up patch we can add a tuning flag to use the faster cost model for NF=3 and 4 on the spacemit-x60. Note that the FIXME about illegal vectors seems to have been fixed in #100436	2024-10-31 05:36:46 +08:00
David Sherwood	7f498a865f	[CostModel][LoopVectorize] Move some loop vectoriser tests (#113702 ) Many tests that were in test/Analysis/CostModel were actually loop vectoriser tests. I've moved them as follows: Analysis/CostModel/X86 -> Transforms/LoopVectorize/X86/CostModel Analysis/CostModel/AArch64/arith-fp-frem.ll -> Transforms/LoopVectorize/AArch64/arith-fp-frem-costs.ll	2024-10-30 13:50:02 +00:00
Rohit Aggarwal	dfb60bb919	Adding more vector calls for -fveclib=AMDLIBM (#109662 ) AMD has it's own implementation of vector calls. New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos Please refer [https://github.com/amd/aocl-libm-ose] --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-10-29 10:09:55 +00:00
Florian Hahn	0d0abb351b	[VPlan] Use ResumePhi to create reduction resume phis. (#110004 ) Use VPInstruction::ResumePhi to create phi nodes for reduction resume values in the scalar preheader, similar to how ResumePhis are used for first-order recurrence resume values after 9a5a8731e77. This allows simplifying createAndCollectMergePhiForReduction to only collect reduction resume phis when vectorizing epilogue loops and adding extra incoming edges from the main vector loop. Updating phis for the epilogue vector loops requires special attention, because additional incoming values from the bypass blocks need to be added. PR: https://github.com/llvm/llvm-project/pull/110004	2024-10-28 20:14:08 +01:00
Shih-Po Hung	266ff98cba	[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974 ) Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the runtime VF, similar to #95305. Since only reverse vector pointers require the runtime VF, the patch sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for reverse vector pointers. As a result, the generation of reverse vector pointers is moved into a separate recipe.	2024-10-26 23:18:50 +08:00
Florian Hahn	e724226da7	[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value. In some cases, VPWidenCastRecipes are created but not considered in the legacy cost model, including truncates/extends when evaluating a reduction in a smaller type. Return 0 for such casts for now, to avoid divergences between VPlan and legacy cost models. Fixes https://github.com/llvm/llvm-project/issues/113526.	2024-10-25 21:25:44 +02:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add arch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Florian Hahn	2dfb1c664c	[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945 ) In some cases, Previous (and its operands) can be hoisted. This allows supporting additional cases where sinking of all users of to FOR fails, e.g. due having to sink recipes with side-effects. This fixes a crash where we fail to create a scalar VPlan for a first-order recurrence, but can create a vector VPlan, because the trunc instruction of an IV which generates the previous value of the recurrence has been optimized to a truncated induction recipe, thus hoisting it to the beginning. Fixes https://github.com/llvm/llvm-project/issues/106523. PR: https://github.com/llvm/llvm-project/pull/108945	2024-10-23 13:12:03 -07:00
Florian Hahn	ddbb382a7c	[LV] Regenerate check-lines for some tests.	2024-10-23 04:34:13 +01:00
Paul Walker	5bb34803a4	[NFC] Migrate tests to use autoupdate for CHECK lines.	2024-10-22 12:55:15 +00:00
Ramkumar Ramachandra	f719cfa868	LAA: be less conservative in isNoWrap (#112553 ) isNoWrap has exactly one caller which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns Strides == 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.	2024-10-22 09:55:51 +01:00
Florian Hahn	2a6b09e0d3	[LV] Use type from InsertPos for cost computation of interleave groups. Previously the legacy cost model would pick the type for the cost computation depending on the order of the members in the input IR. This is incompatible with the VPlan-based cost model (independent of original IR order) and also doesn't match code-gen, which uses the type of the insert position. Update the legacy cost model to use the type (and address space) from the Group's insert position. This brings the legacy cost model in line with the legacy cost model and fixes a divergence between both models. Note that the X86 cost model seems to assign different costs to groups with i64 and double types. Added a TODO to check. Fixes https://github.com/llvm/llvm-project/issues/112922.	2024-10-18 19:12:40 -07:00
Florian Hahn	c7496cebac	[LV] Use SCEV to check if minimum iteration check is known. (#111310 ) Use SCEV to check if the minimum iteration check (TC < Step) is known to be false. This is a first step towards addressing https://github.com/llvm/llvm-project/issues/111098. To catch the exact case from the issue, we need to do extra work to make sure the wrap flags on the shl are preserved and used by SCEV. Note that skeleton creation will be gradually moved to VPlan and this simplification should be done as VPlan transform eventually. The current plan is to move skeleton creation to VPlan starting from parts closest to the parts already created by VPlan, starting with induction resume value creation (started with https://github.com/llvm/llvm-project/pull/110577), then memory and SCEV checks and finally minimum iteration checks. PR: https://github.com/llvm/llvm-project/pull/111310	2024-10-18 15:22:59 -07:00
Alexey Bataev	f148d5791b	[LV]Initial support for safe distance in predicated DataWithEVL vectorization mode. Enabled initial support for max safe distance in DataWithEVL mode. If max safe distance is required, need to emit special code: CMP = icmp ult AVL, MAX_SAFE_DISTANCE SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL) while vectorize the loop in DataWithEVL tail folding mode. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/102897	2024-10-18 15:51:49 -04:00
Graham Hunter	091a235ec5	Revert "[AArch64][SVE] Enable max vector bandwidth for SVE" (#112873 ) Reverts llvm/llvm-project#109671 Reverting due to some performance regressions on neoverse-v1.	2024-10-18 11:05:55 +01:00
Florian Hahn	b497010854	[VPlan] Use VPInstruction::Name when assigning names (NFCI). This slightly improves the printing of VPInstructions. NFC except debug output.	2024-10-18 05:52:35 +01:00
Florian Hahn	b060661da8	[SCEVExpander] Expand UDiv avoiding UB when in seq_min/max. (#92177 ) Update SCEVExpander to introduce an SafeUDivMode, which is set when expanding operands of SCEVSequentialMinMaxExpr. In this mode, the expander will make sure that the divisor of the expanded UDiv is neither 0 nor poison. Fixes https://github.com/llvm/llvm-project/issues/89958. PR https://github.com/llvm/llvm-project/pull/92177	2024-10-17 13:55:20 -07:00
David Sherwood	76f3776185	[NFC][LoopVectorize] Restructure simple early exit tests (#112721 ) The previous simple_early_exit.ll was growing too large and difficult to manage. Instead I've decided to refactor the tests by splitting out into notional groups: 1. single_early_exit.ll: loops with a single uncountable exit that do not have live-outs from the loop. 2. single_early_exit_live_outs.ll: loops with a single uncountable exit with live-outs. 3. multi_early_exit.ll: loops with multiple early exits, i.e. a mixture of countable and uncountable exits, but with no live-outs from the loop. 4. multi_early_exit_live_outs.ll: as above, but with live-outs. 5. single_early_exit_unsafe_ptrs.ll: loops with a single uncountable exit, but with pointers that are not unconditionally dereferenceable. 6. unsupported_early_exit.ll: loops with uncountable exits that we cannot yet vectorise. 7. early_exit_legality.ll: tests the debug output from LoopVectorizationLegality to make sure we handle different scenarios correctly. Only the last test now requires asserts. Over time some of these tests should start vectorising as more support is added. I also tried to rename the multi early exit tests to make it clear there what mixture of countable and uncountable exits are present.	2024-10-17 16:50:59 +01:00
Yingwei Zheng	095d49da76	[InstCombine] Set `samesign` when converting signed predicates into unsigned (#112642 ) Alive2: https://alive2.llvm.org/ce/z/6cqdt-	2024-10-17 20:43:48 +08:00
Graham Hunter	c980a20b10	[AArch64][SVE] Enable max vector bandwidth for SVE (#109671 ) Returns true for shouldMaximizeVectorBandwidth when the register type is a scalable vector and SVE or streaming SVE are available.	2024-10-17 13:17:24 +01:00
David Sherwood	671976ff59	[NFC][LoopVectorize] Add more simple early exit tests (#112529 ) I realised we are missing tests to cover more loops with multiple early exits - some countable and some uncountable. I've also added a few SVE versions of the test in the AArch64 directory. Once we can vectorise such early exit loops it's a good sanity check to make sure they also vectorise for SVE. Also, for some of the tests I expect there to be some divergence from the same tests in the top level directory once we start vectorising them.	2024-10-17 09:49:51 +01:00
Florian Hahn	24423107ab	[LV] Add additional trip count expansion tests for #92177 . Extra tests for https://github.com/llvm/llvm-project/pull/92177, split off the PR.	2024-10-16 07:40:04 +01:00
Florian Hahn	bbff5b8891	[VPlan] Use alloc-type to compute interleave group offset. Use getAllocTypeSize to get compute the offset to the start of interleave groups instead getScalarSizeInBits, which may return 0 for pointers. This is in line with the analysis building the interleave groups and fixes a mis-compile reported for https://github.com/llvm/llvm-project/pull/106431.	2024-10-16 07:21:58 +01:00
Florian Hahn	cc5b5ca34b	[LV] Add test where interleave group start pointer is incorrect. Test case from https://github.com/llvm/llvm-project/pull/106431.	2024-10-16 07:13:38 +01:00
Florian Hahn	3860e29e0e	[VPlan] Mark VPVectorPointerRecipe as not having sideeffects. VectorPointer doesn't read from memory or have any sideeffects. Mark it accordingly.	2024-10-16 06:10:19 +01:00
Florian Hahn	34cdd67c85	[VPlan] Use VPWidenIntrinsicRecipe to vp.select. (#110489 ) Use VPWidenIntrinsicRecipe (https://github.com/llvm/llvm-project/pull/110486) to create vp.select intrinsics. This potentially offers an alternative to duplicating EVL recipes for all existing recipes. There are some recipes that will need duplicates (at least at the moment), due to extra code-gen needs (e.g. widening loads and stores). But in cases the intrinsic can directly be used, creating the widened intrinsic directly would reduce the need to duplicate some recipes. PR: https://github.com/llvm/llvm-project/pull/110489	2024-10-15 21:48:15 +01:00
Florian Hahn	7f06d8afb0	[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824 ) Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if there is an UDiv operand that may divide by 0 or poison PR: https://github.com/llvm/llvm-project/pull/110824	2024-10-14 13:08:49 +01:00
Florian Hahn	65da32c634	[LV] Account for any-of reduction when computing costs of blend phis. Any-of reductions are narrowed to i1. Update the legacy cost model to use the correct type when computing the cost of a phi that gets lowered to selects (BLEND). This fixes a divergence between legacy and VPlan-based cost models after 36fc291b6ec6d. Fixes https://github.com/llvm/llvm-project/issues/111874.	2024-10-11 11:27:22 +01:00
David Sherwood	72f339de45	[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928 ) There are a number of places where we call getSmallConstantMaxTripCount without passing a vector of predicates: getSmallBestKnownTC isIndvarOverflowCheckKnownFalse computeMaxVF isMoreProfitable I've changed all of these to now pass in a predicate vector so that we get the benefit of making better vectorisation choices when we know the max trip count for loops that require SCEV predicate checks. I've tried to add tests that cover all the cases affected by these changes.	2024-10-11 10:10:15 +01:00
Florian Hahn	bb937e276d	[LV] Compute value of escaped induction based on the computed end value. (#110576 ) Update fixupIVUsers to compute the value for escaped inductions using the already computed end value of the induction (EndValue), but subtracting the step. This results in slightly simpler codegen, as we avoid computing the full transformed index at VectorTripCount - 1. PR: https://github.com/llvm/llvm-project/pull/110576	2024-10-10 20:04:46 +01:00
Florian Hahn	01cbbc52dc	[VPlan] Request lane 0 for pointer arg in PtrAdd. After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead of requesting a single scalar, request lane 0, which correctly handles the case when there is a scalar-per-lane. Fixes https://github.com/llvm/llvm-project/issues/111606.	2024-10-09 13:18:54 +01:00
Florian Hahn	6fbbe152fa	[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486 ) This patch splits off intrinsic hanlding to a new VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the intrinsic ID to widen and the scalar result type (in case the intrinsic is overloaded on the result type). It does not need access to an underlying IR call instruction or function. This means VPWidenIntrinsicRecipe can be created easily without access to underlying IR.	2024-10-08 22:37:20 +01:00
Florian Hahn	36fc291b6e	[VPlan] Implement VPBlendRecipe::computeCost. Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also used if only the first lane is used. This also requires pre-computing costs for forced scalars and instructions considered profitable to scalarize. For those, the cost will be computed separately in the legacy cost model. This will also be needed when implementing VPReplicateRecipe::computeCost.	2024-10-08 21:33:42 +01:00
Florian Hahn	3ec6f805c5	[VPlan] Don't created GEP x, 0 for interleave group pointers. The GEP with offet 0 is redundant, remove it. This addresses a TODO from 7f74651837b ((#106431).	2024-10-08 12:08:13 +01:00
Luke Lau	366e469db9	[RISCV] Add cost tests for more interleave factors. NFC This shows how we're not properly scaling the cost with the number of factors, i.e. a factor 8 interleave costs the same as a factor 2 interleave at VF=2.	2024-10-08 17:16:17 +08:00
Luke Lau	e98875af4c	[RISCV] Add scalable interleave cost tests. NFC This gets the cost from the recipe output rather than the individual instruction cost. The factor 3 test was left alone since we don't support anything else other than factor 2 for scalable vectors currently.	2024-10-08 13:39:13 +08:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431 ) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	89d2a9de05	[VPlan] Add additional FOR hoisting test. Additional tests for https://github.com/llvm/llvm-project/pull/108945.	2024-10-06 14:36:33 +01:00
Florian Hahn	45b526afa2	[LV] Honor uniform-after-vectorization in setVectorizedCallDecision. The legacy cost model always computes the cost for uniforms as cost of VF = 1, but VPWidenCallRecipes would be created, as setVectorizedCallDecisions would not consider uniform calls. Fix setVectorizedCallDecision to set to Scalarize, if the call is uniform-after-vectorization. This fixes a bug in VPlan construction uncovered by the VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/111040.	2024-10-06 10:35:06 +01:00
Florian Hahn	68210c7c26	[VPlan] Only generate first lane for VPPredInstPHI if no others used. IF only the first lane of the result is used, only generate the first lane. Fixes https://github.com/llvm/llvm-project/issues/111042.	2024-10-05 19:15:05 +01:00
Shih-Po Hung	26fca7256e	[VPlan][NFC] Use patterns in test check (#111086 )	2024-10-04 17:19:07 +08:00
Benjamin Maxwell	01a1398971	[AArch64][Test] Update test variable names (NFC) (#110667 ) Simply by running update_test_checks.py with no changes. This is to make updating these tests for later changes easier.	2024-10-03 16:14:21 +01:00
Florian Hahn	9de327c94d	[LV] Generalize predication checks from 2c8836c899 for operands. This fixes another case where the VPlan-based and legacy cost models disagree. If any of the operands is predicated, it can't be trivially hoisted and we should consider the cost for evaluating it each loop iteration. Fixes https://github.com/llvm/llvm-project/issues/108697.	2024-10-02 20:16:41 +01:00
Florian Hahn	6d6eea92e3	[LV] Use SCEV to simplify wide binop operand to constant. The legacy cost model uses SCEV to determine if the second operand of a binary op is a constant. Update the VPlan construction logic to mirror the current legacy behavior, to fix a difference in the cost models. Fixes https://github.com/llvm/llvm-project/issues/109528. Fixes https://github.com/llvm/llvm-project/issues/110440.	2024-10-02 13:45:49 +01:00
Nikita Popov	9f3d1695eb	[SCEVExpander] Preserve gep nuw during expansion (#102133 ) When expanding SCEV adds to geps, transfer the nuw flag to the resulting gep. (Note that this doesn't apply to IV increment GEPs, which go through a different code path.)	2024-10-02 11:45:00 +02:00

1 2 3 4 5 ...

2716 Commits