llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	03da079968	[LoopUtils] Saturate at INT_MAX when estimating TC (#129683 ) getLoopEstimatedTripCount returns std::nullopt when the trip count would overflow the return type, but since it is an estimate anyway, we might as well saturate at UINT_MAX, improving results.	2025-03-05 18:19:39 +00:00
Luke Lau	5e54c92314	[VPlan] Fix crash when unrolling in-loop reduction chains (#129840 ) If an in-loop reduction is chained e.g. WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2> REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>) REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>) When we try to unroll the second add reduction, we crash because we currently expect the chain to be a VPReductionPHIRecipe, when in fact it's the previous reduction. This relaxes the cast to a dyn_cast, so we end up unrolling to: WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2> WIDEN-REDUCTION-PHI ir<%rdx>.1 = phi ir<0>, ir<%add2>.1, ir<1> WIDEN-REDUCTION-PHI ir<%rdx>.2 = phi ir<0>, ir<%add2>.2, ir<2> WIDEN-REDUCTION-PHI ir<%rdx>.3 = phi ir<0>, ir<%add2>.3, ir<3> REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>) REDUCE ir<%add1>.1 = ir<%rdx>.1 + reduce.add (ir<%x>.1) REDUCE ir<%add1>.2 = ir<%rdx>.2 + reduce.add (ir<%x>.2) REDUCE ir<%add1>.3 = ir<%rdx>.3 + reduce.add (ir<%x>.3) REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>) REDUCE ir<%add2>.1 = ir<%add1>.1 + reduce.add (ir<%y>.1) REDUCE ir<%add2>.2 = ir<%add1>.2 + reduce.add (ir<%y>.2) REDUCE ir<%add2>.3 = ir<%add1>.3 + reduce.add (ir<%y>.3) This fixes a crash when building 525.x264_r from SPEC CPU 2017 on AArch64 with -mllvm -prefer-inloop-reductions	2025-03-05 19:13:23 +08:00
Krzysztof Drewniak	e697c99b63	[AMDGPU] Add custom MachineValueType entries for buffer fat pointers (#127692 ) The old hack of returning v5/v6i32 for the fat and strided buffer pointers was causing issues during vectorization queries that expected to be able to construct a VectorType from the return value of `MVT getPointerType()`. One example is in the test attached to this PR, which used to crash. Now, we define the custom MVT entries, the 160-bit amdgpuBufferFatPointer and 192-bit amdgpuBufferStridedPointer, which are used to represent ptr addrspace(7) and ptr addrspace(9) respectively. Neither of these types will be present at the time of lowering to a SelectionDAG or other MIR - MVT::amdgpuBufferFatPointer is eliminated by the LowerBufferFatPointers pass and amdgpu::bufferStridedPointer is not currently used outside of the SPIR-V translator (which does its own lowering). An alternative solution would be to add MVT::i160 and MVT::i192. We elect not to do this now as it would require changes to unrelated code and runs the risk of breaking any SelectionDAG code that assumes that the MVT series are all powers of two (and so can be split apart and merged back together) in ways that wouldn't be obvious if someone tried to use MVT::i160 in codegen. If i160 is added at some future point, these custom types can be retired.	2025-03-04 17:19:06 -06:00
Jim Lin	03505a004f	[RISCV] Enable scalable loop vectorization for fmax/fmin reductions with f16/bf16 type for zvfhmin/zvfbfmin (#129629 ) This PR enables scalable loop vectorization for fmax and fmin reductions with f16/bf16 type when only zvfhmin/zvfbfmin are enabled. After https://github.com/llvm/llvm-project/pull/128800, we can promote the fmax/fmin reductions with f16/bf16 type to f32 reductions for zvfhmin/zvfbfmin.	2025-03-04 16:49:24 +08:00
Ramkumar Ramachandra	80bdfcd411	[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080 ) getLoopEstimatedTripCount returns the trip count based on profiling data, and its documentation says that it could return 0 when the trip count is zero, but this is not the case: a valid trip count can never be zero, and it returns 0 when the unsigned ExitCount is incremented by 1 and wraps. Some callers are careful about checking for this zero value in an std::optional, but it makes for an API with footguns, as a std::optional return value indicates that a non-nullopt value would be a valid trip count. Fix this by explicitly returning std::nullopt when the return value would wrap, and strip additional checks in callers. This also fixes a minor bug in LoopVectorize.	2025-03-04 08:43:08 +00:00
Mel Chen	9b4ad2fe50	[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093 ) This patch converts the llvm.vector.splice intrinsic to llvm.experimental.vp.splice, ensuring that fixed-order recurrences execute correctly when tail folding by EVL is enabled. Due to the non-VFxUF penultimate EVL issue, the EVL from the previous iteration will be preserved and used in llvm.experimental.vp.splice.	2025-03-03 21:27:13 +08:00
Florian Hahn	f937b17e85	[LV] Don't query SCEV for non-invariant values in cost model. This fixes a divergence between VPlan and legacy cost model, matching behavior further up in getInstructionCost as well. Fixes https://github.com/llvm/llvm-project/issues/129236.	2025-03-02 10:55:52 +00:00
Benjamin Maxwell	89e7f4d31b	[LV] Teach the vectorizer to cost and vectorize modf and sincospi intrinsics (#129064 ) Follow on to #128035. It is a small extension to support vectorizing `llvm.modf.` and `llvm.sincospi.` too. This renames the test files from `sincos.ll` -> `multiple-result-intrinsics.ll` to group together the similar tests (which make up most of this PR).	2025-02-28 12:56:12 +00:00
Florian Hahn	6ce41db6b0	[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe. Update code to set and generate debug location for branch recipe	2025-02-27 20:19:42 +00:00
Florian Hahn	1e1b9bccc0	[VPlan] Simplify BLEND %a, %b, NOT(%m) -> BLEND %b, %a, %m. (#128375 ) Avoid negations for normalized blends by reordering operands. PR: https://github.com/llvm/llvm-project/pull/128375	2025-02-27 17:43:24 +00:00
Florian Hahn	649f4dcc19	[LV] Fix tests after 8150ab93f741. PR #124119 wasn't rebased & tested before merging. Update the failing tests.	2025-02-27 12:15:24 +00:00
John Brawn	8150ab93f7	[LoopVectorize] Use CodeSize as the cost kind for minsize (#124119 ) Functions marked with minsize should aim for minimum code size, so the vectorizer should use CodeSize for the cost kind and also the cost we compare should be the cost for the entire loop: it shouldn't be divided by the number of vector elements and block costs shouldn't be divided by the block probability. Possibly we should also be doing this for optsize as well, but there are a lot of tests that assume the current behaviour and the definition of optsize is less clear than minsize (for minsize the goal is to "keep the code size of this function as small as possible" whereas for optsize it's "keep the code size of this function low").	2025-02-27 11:07:02 +00:00
Benjamin Maxwell	3307b0374a	[LV] Teach the loop vectorizer llvm.sincos is trivially vectorizable (#128035 ) Depends on #123210	2025-02-27 09:37:06 +00:00
Florian Hahn	7b6abd827f	[LV] Remove stray check lines after be28365ca78.	2025-02-26 21:14:01 +00:00
Florian Hahn	be28365ca7	[LV] Generate check lines for if-conversion.ll The limited check lines make it difficult to reason about test changes in https://github.com/llvm/llvm-project/pull/128375.	2025-02-26 20:39:58 +00:00
Florian Hahn	4277c21059	[VPlan] Introduce explicit broadcasts for live-ins. (#124644 ) Add a new VPInstruction::Broadcast opcode and use it to materialize explicit broadcasts of live-ins. The initial patch only materializes the broadcasts if the vector preheader dominates all uses that need it. Later patches will pick the best valid insert point, thus retiring implicit hoisting of broadcasts from VPTransformsState::get(). PR: https://github.com/llvm/llvm-project/pull/124644	2025-02-26 13:57:51 +00:00
Elvis Wang	8009c1fd81	[LV][VPlan] Prevent calculate cost for skipped instructions in precomputeCosts(). (#127966 ) Skip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped. Reported from: https://github.com/llvm/llvm-project/issues/115744#issuecomment-2670479463 Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv	2025-02-25 11:09:09 +08:00
Florian Hahn	65e44b4301	[LV] Add tests with deref assumptions and non-constant sizes.	2025-02-23 18:21:22 +00:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180 ) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However we now have the RISCVVLOptimizer pass which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	52ded67249	[LAA] Always require non-wrapping pointers for runtime checks. (#127543 ) Currently we only check if the pointers involved in runtime checks do not wrap if we need to perform dependency checks. If that's not the case, we generate runtime checks, even if the pointers may wrap (see test/Analysis/LoopAccessAnalysis/runtime-checks-may-wrap.ll). If the pointer wraps, then we swap start and end of the runtime check, leading to incorrect checks. An Alive2 proof of what the runtime checks are checking conceptually (on i4 to have it complete in reasonable time) showing the incorrect result should be https://alive2.llvm.org/ce/z/KsHzn8 Depends on https://github.com/llvm/llvm-project/pull/127410 to avoid more regressions. PR: https://github.com/llvm/llvm-project/pull/127543	2025-02-20 19:00:23 +01:00
Florian Hahn	404af37175	[VPlan] Remove stale assertion in HCFG builder. The assertion was left over from a time when VPBBs still had an associated condition bit. This is not the case any more (comment was stale). In case a branch on condition is needed, a BranchOnCond VPInstruction is added when constructing recipes. That's also where it is checked if the condition is available. Exposed by 38376dee9.	2025-02-20 17:01:49 +01:00
Florian Hahn	04b5c63ddf	[LV] Add inbounds to interleave test. In preparation for https://github.com/llvm/llvm-project/pull/127543	2025-02-20 16:33:01 +01:00
Ramkumar Ramachandra	3b9f9645e0	[LV] Regen a couple of tests with UTC (#127785 )	2025-02-20 15:27:51 +00:00
Florian Hahn	a96444af44	[VPlan] Remove dead exit block handling code in HCFGBuilder. The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect VPBB for the ExitBB; the region's successor is the middle block, not the exit block. It also unnecessarily triggers an assertion after 38376dee922.	2025-02-19 18:51:45 +01:00
Florian Hahn	38376dee92	[VPlan] Build initial VPlan 0 using HCFGBuilder for inner loops. (NFC) (#124432 ) Use HCFGBuilder to build an initial VPlan 0, which wraps all input instructions in VPInstructions and update tryToBuildVPlanWithVPRecipes to replace the VPInstructions with widened recipes. At the moment, widened recipes are created based on the underlying instruction of the VPInstruction. Masks are also still created based on the input IR basic blocks and the loop CFG is flattened in the main loop processing the VPInstructions. This patch also includes support for Switch instructions in HCFGBuilder using just a VPInstruction with Instruction::Switch opcode. There are multiple follow-ups planned: * Perform predication on the VPlan directly, * Unify code constructing VPlan 0 to be shared by both inner and outer loop code paths. * Construct VPlan 0 once, clone subsequent ones for VFs PR: https://github.com/llvm/llvm-project/pull/124432	2025-02-18 16:12:29 +01:00
Florian Hahn	6c627831f9	[VPlan] Use VPlan predecessors in VPWidenPHIRecipe (NFC). (#126388 ) Update VPWidenPHIRecipe to use the predecessors in VPlan to determine the incoming blocks instead of tracking them separately. This brings VPWidenPHIRecipe in line with the other phi recipes. PR: https://github.com/llvm/llvm-project/pull/126388	2025-02-17 16:40:37 +01:00
Benjamin Maxwell	e0e67a6207	[LV] Add initial support for vectorizing literal struct return values (#109833 ) This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.	2025-02-17 09:51:35 +00:00
Florian Hahn	e5f5517f91	[VPlan] Create IR basic block for middle.block in VPlan. Create an IR BB directly for the middle.block, instead of creating the IR BB during skeleton creation and then replacing the middle VPBB with a VPIRBB. This moves another part of skeleton creation to VPlan and simplifies the code slightly by removing code to disconnect the middle block and vector preheader + the corresponding DT update. NFC modulo IR block naming and block creation order, which changes the IR names for the blocks.	2025-02-15 21:54:16 +01:00
Mel Chen	be82705192	[LV][EVL] Enhance fixed-order recurrence tests for tail folding with EVL. NFC (#126507 ) Test that we do not vectorize the loop using folding by EVL, when a fixed-order recurrence has external users. TODO: Support external users by extractelement the EVL-th lane.	2025-02-14 16:43:30 +08:00
Florian Hahn	65640c1d4c	[AssumeBundles] Dereferenceable used in bundle only applies at assume. (#126117 ) Update LangRef and code using `Dereferenceable` in assume bundles to only use the information if it is safe at the point of use. `Dereferenceable` in an assume bundle is only guaranteed at the point of the assumption, but may not be guaranteed at later points, because the pointer may have been freed. Update code using `Dereferenceable` to only use it if the pointer cannot be freed. This can further be refined to check if the pointer could be freed between assume and use. This follows up on https://github.com/llvm/llvm-project/pull/123196. With that change, it should be safe to expose dereferenceable assumptions more widely as in https://github.com/llvm/llvm-project/pull/121789 PR: https://github.com/llvm/llvm-project/pull/126117	2025-02-13 20:41:23 +01:00
Nicholas Guy	9c89faa62b	[LoopVectorizer][AArch64] Add support for partial reduce subtraction (#123636 )	2025-02-13 10:35:45 +00:00
David Sherwood	efc72347fd	[AArch64] Improve getPartialReductionCost for fixed-width VFs (#126538 ) NEON does not have a version of udot/sdot that accumulates into 64-bit integer values, so we should return Invalid from getPartialReductionCost for 64-bit types and fixed-width VFs. In theory, if the 64-bit versions of SVE udot/sdot are available we could use those, but we don't currently have lowering support for that.	2025-02-11 15:10:39 +00:00
Florian Hahn	e258bca950	[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235 ) Update getOrCreateVPValueForSCEVExpr to only skip expansion of SCEVUnknown if the underlying value isn't an instruction. Instructions may be defined in a loop and using them without expansion may break LCSSA form. SCEVExpander will take care of preserving LCSSA if needed. We could also try to pass LoopInfo, but there are some users of the function where it won't be available and main benefit from skipping expansion is slightly more concise VPlans. Note that SCEVExpander is now used to expand SCEVUnknown with floats. Adjust the check in expandCodeFor to only check the types and casts if the type of the value is different to the requested type. Otherwise we crash when trying to expand a float and requesting a float type. Fixes https://github.com/llvm/llvm-project/issues/121518. PR: https://github.com/llvm/llvm-project/pull/125235	2025-02-11 13:03:12 +01:00
Florian Hahn	3706dfef66	[LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897 ) `forgetLcssaPhiWithNewPredecessor` performs additional invalidation if there is an existing SCEV for the phi, but earlier `forgetBlockAndLoopDispositions` or `forgetLoop` may already invalidate the SCEV for the phi. Change the order to first call `forgetLcssaPhiWithNewPredecessor` to ensure it runs before its SCEV gets invalidated too eagerly. Fixes https://github.com/llvm/llvm-project/issues/119665. PR: https://github.com/llvm/llvm-project/pull/119897	2025-02-10 16:29:42 +00:00
David Sherwood	0010a3c97e	[NFC][LoopVectorize] Add more partial reduction tests (#126525 ) * Adds variants of dotp (dotp_i8_to_i64_has_neon_dotprod, dotp_i16_to_i64_has_neon_dotprod) that show how the loop vectoriser has generated fixed-width partial reductions without any matching NEON udot instruction. * Adds loops that could also benefit from partial reductions once the work is done to recognise patterns such as %zext = zext i8 %load to i32 %acc.next = add i32 %acc, %zext See zext_add_reduc_i8_i32, etc. I intend to follow up with a patch to add support for vectorising such patterns.	2025-02-10 16:04:43 +00:00
Elvis Wang	2e3729bf40	[LV] Prevent query the computeCost() when VF=1 in emitInvalidCostRemarks(). (#117288 ) We should only query the computeCost() when the VF is vector.	2025-02-10 08:40:28 +08:00
Hassnaa Hamdi	e9a20f77ee	Reland "[LV]: Teach LV to recursively (de)interleave." (#125094 ) This patch relands the changes from "[LV]: Teach LV to recursively (de)interleave.#122989" Reason for revert: - The patch exposed an assert in the vectorizer related to a VF difference between the legacy cost model and the VPlan-based cost model, caused by the uncalculated cost of a VPInstruction that VPlanTransforms creates as a replacement for an 'or disjoint' instruction. VPlanTransforms makes that instruction change when there are memory interleaving and predicated blocks, but the change didn't cause problems before because in most cases the cost difference between the legacy and new models is not noticeable. - Issue is fixed by #125434 Original patch: https://github.com/llvm/llvm-project/pull/89018 Reviewed-by: paulwalker-arm, Mel-Chen	2025-02-09 19:21:54 +00:00
Simon Pilgrim	70906f0514	[LV][X86] Regenerate interleaved load/store costs. NFC. update_analyze_test_checks has improved the checks since these were last updated. Reduce noise diffs in future patches.	2025-02-09 15:02:41 +00:00
Florian Hahn	32c4493d5f	[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI). Follow-up as discussed when using VPInstruction::ResumePhi for all resume values (#112147). This patch explicitly adds incoming values for each predecessor in VPlan. This simplifies codegen and allows transformations adjusting the predecessors of blocks with NFC modulo incoming block order in phis.	2025-02-09 11:20:20 +00:00
Florian Hahn	9266b48c5b	[VPlan] Add outer loop tests with wide phis in inner loop. Add test coverage with phis outside a header block with multiple incoming values.	2025-02-08 18:09:45 +00:00
Florian Hahn	cea799afc6	[LV] Add ordered reduction test with live-in. Extra test for https://github.com/llvm/llvm-project/pull/124644.	2025-02-07 20:50:46 +00:00
Florian Hahn	1611059f5d	[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434 ) As exposed by https://github.com/llvm/llvm-project/pull/125094, we are missing cost computation for some binary VPInstructions we created based on original IR instructions. Their cost should be considered. PR: https://github.com/llvm/llvm-project/pull/125434	2025-02-07 15:27:31 +00:00
David Sherwood	1930524bbd	[LoopVectorize] Fix cost model assert when vectorising calls (#125716 ) The legacy and vplan cost models did not agree because VPWidenCallRecipe::computeCost only calculates the cost of the call instruction, whereas LoopVectorizationCostModel::setVectorizedCallDecision in some cases adds on the cost of a synthesised mask argument. However, this mask is always 'splat(i1 true)' which should be hoisted out of the loop during codegen. In order to synchronise the two cost models I have two options: 1) Also add the cost of the splat to the vplan model, or 2) Remove the cost of the splat from the legacy model. I chose 2) because I feel this more closely represents what the final code will look like. There is an argument that we should take account of such broadcast costs in the preheader when deciding if it's profitable to vectorise a loop, however there isn't currently a mechanism to do this. We currently only take account of the runtime checks when assessing profitability and what the minimum trip count should be. However, I don't believe this work needs doing as part of this PR.	2025-02-07 09:36:52 +00:00
James Chesterman	ac158aa13b	[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268 ) Does a select on the input rather than the output. This way the mask has the same number of lanes as the other operand in the select instruction.	2025-02-07 09:09:10 +00:00
Mel Chen	4d3148d926	[LV][EVL] Fix the check for legality of folding with EVL. (#125678 ) The current legality check for folding with EVL has incomplete verification for VF. This patch fixes the VF check, ensuring that tail folding with EVL is enabled only when a scalable VF is available. This allows loops that prefer tail folding with EVL but cannot use scalable VF vectorization to still be vectorized using a fixed VF, rather than abandoning vectorization entirely.	2025-02-07 12:53:10 +08:00
David Sherwood	f07cd36a5d	[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan (#125058 ) This patch adds an initial implementation of VPInstruction::computeCost with support for only one instruction so far - VPInstruction::AnyOf. This is only used when vectorising loops with uncountable early exits.	2025-02-05 16:31:14 +00:00
Sam Tebbs	c7995a6905	[AArch64] Disallow vscale x 1 partial reductions (#125252 ) We don't want to allow partial reductions resulting in a vscale x 1 type as we can't lower it in the backend.	2025-02-05 13:34:43 +00:00
Mikhail R. Gadelha	e78be31639	[RISCV] Added cost model for fmuladd (#125683 ) This patch updates the cost model for fmuladd on vector types to scale with LMUL. This was found when analyzing a hot loop in 519.lbm_r that was unprofitably vectorized, but doesn't directly impact that case and is split off so it doesn't get forgotten. Unlike other FP arithmetic ops, it's not scaled by 2 because the scalar cost isn't scaled by 2.	2025-02-05 09:33:24 -03:00
Mel Chen	8d037b9256	[LV][EVL] Skip tryAddExplicitVectorLength for plans with scalar VF. (#125497 ) Plans with a scalar VF should not be transformed into plans folded by EVL. TODO: Move the scalar VF checking into `LoopVectorizationCostModel::foldTailWithEVL()`.	2025-02-05 15:02:33 +08:00
Mel Chen	b9fa35fc07	[LV][EVL] Pre-commit test cases for preventing to transform plans with scalar VF. NFC (#125499 ) Pre-commit for #125497.	2025-02-04 15:00:35 +08:00
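
Note on the two getLoopEstimatedTripCount entries above (80bdfcd411 and 03da079968): the difference between returning std::nullopt on overflow and saturating can be illustrated with a minimal sketch. The helper names below are hypothetical, the exit count is assumed to be held in a 64-bit integer, and the profile-data logic of the real function is not reproduced; this is not the LLVM implementation.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helpers for illustration only. They are not LLVM's
// getLoopEstimatedTripCount; the 64-bit ExitCount parameter is an assumption.

// Behaviour described for 80bdfcd411: when ExitCount + 1 does not fit in the
// return type, report "no estimate" instead of wrapping around to 0.
constexpr std::optional<unsigned> tripCountOrNullopt(std::uint64_t ExitCount) {
  constexpr std::uint64_t Max = std::numeric_limits<unsigned>::max();
  if (ExitCount >= Max) // ExitCount + 1 would exceed the largest unsigned
    return std::nullopt;
  return static_cast<unsigned>(ExitCount + 1);
}

// Behaviour described for 03da079968: the value is only an estimate, so clamp
// to the largest representable value rather than giving up.
constexpr unsigned tripCountSaturating(std::uint64_t ExitCount) {
  constexpr std::uint64_t Max = std::numeric_limits<unsigned>::max();
  if (ExitCount >= Max)
    return static_cast<unsigned>(Max);
  return static_cast<unsigned>(ExitCount + 1);
}
```

With the first helper, a caller that receives a value can treat it as a valid, non-zero trip count without an extra zero check; with the second, a caller always gets a usable estimate.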

2937 Commits