llvm-project

Author	SHA1	Message	Date
Florian Hahn	4e9894498e	[VPlan] Truncate VFxUF if needed in VPWidenPointerInduction::execute. Create truncate if needed after 56b05a0d6. Note that this preserves the original behavior pre 56b05a0d6. If truncate would strip any set bits, then the explicit computation in the narrower type would wrap.	2025-03-16 11:37:58 +00:00
Florian Hahn	6a8d5f22ff	[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute. This updates VPWidenPointerInductionRecipe::execute to not use the canonical IV to determine the insert point. Instead, it relies on the current recipe position. In cases where this is not sufficient, set the insert point to the first non-phi instruction, to ensure phis are created together.	2025-03-15 21:32:48 +00:00
Florian Hahn	aadfa9f6c8	[LV] Add additional tests for narrowing interleave groups. Extend test coverage for https://github.com/llvm/llvm-project/pull/106441.	2025-03-15 21:13:49 +00:00
Florian Hahn	37a57ca257	[FMF] Set all bits if needed when setting individual flags. (#131321 ) Currently fast() won't return true if all flags are set via setXXX, which is surprising. Update setters to set all bits if needed to make sure isFast() consistently returns the expected result. PR: https://github.com/llvm/llvm-project/pull/131321	2025-03-15 18:46:26 +00:00
Florian Hahn	56b05a0d6b	[VPlan] Use VFxUF in VPWidenPointerInductionRecipe. Use VFxUF VPValue instead of computing VF * UF explicitly.	2025-03-15 18:18:53 +00:00
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298 ) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
David Sherwood	3b6d0093aa	[LV][NFC] Refactor code for extracting first active element (#131118) Refactor the code to extract the first active element of a vector in the early exit block, in preparation for PR #130766. I've replaced the VPInstruction::ExtractFirstActive nodes with a combination of a new VPInstruction::FirstActiveLane node and an Instruction::ExtractElement node.	2025-03-14 11:14:09 +00:00
Luke Lau	26324bc1bf	[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice (#129645 ) After #124093 we now support fixed-order recurrences with EVL tail folding by replacing VPInstruction::FirstOrderRecurrenceSplice with a VP splice intrinsic. However the costing for the splice is currently done in VPFirstOrderRecurrencePHIRecipe, so when we add the VP splice intrinsic we end up costing it twice. This fixes it by splitting out the cost for the splice into FirstOrderRecurrenceSplice so that it's not duplicated when we replace it. We still have to keep the VF=1 checks in VPFirstOrderRecurrencePHIRecipe since the splice might end up dead and discarded, e.g. in the test @pr97452_scalable_vf1_for.	2025-03-14 15:33:32 +08:00
Luke Lau	315c02aa02	[VPlan] Fix crash with inloop fmuladd reductions with blend (#131154 ) When visiting in-loop reduction links, we previously crashed if we had an fmuladd with a blend after it in the chain. This fixes it by lifting the existing blend folding to also handle fmuladd. This also simplifies the code structure slightly for an upcoming patch I want to post to handle in-loop AnyOf reductions. I removed the PhiR->isInLoop() check since it's already guarded at the top of the parent Header->Phis() loop.	2025-03-14 09:08:32 +08:00
Florian Hahn	dfb661cd1c	[LAA] Add extra tests for #128061 . Extend test coverage for https://github.com/llvm/llvm-project/pull/128061.	2025-03-13 21:42:32 +00:00
Florian Hahn	02575f887b	[VPlan] Use VPInstruction for VPScalarPHIRecipe. (NFCI) (#129767) Now that all phi nodes manage their incoming blocks through the VPlan-predecessors, there should be no need for having a dedicated recipe; it should be sufficient to allow PHI opcodes in VPInstruction. Follow-ups will also migrate VPWidenPHIRecipe and possibly others, building on top of https://github.com/llvm/llvm-project/pull/129388. PR: https://github.com/llvm/llvm-project/pull/129767	2025-03-13 18:35:07 +00:00
Mel Chen	5d5e706691	[VPlan] Restrict hoisting of broadcast operations using VPDominatorTree (#117138 ) This patch restricts broadcast operations from being hoisted to the vector preheader unless the basic block that defines the broadcasted value properly dominates the vector preheader. This prevents potential use-before-definition issues when the broadcasted value is defined within the plan. VPDominatorTree is used to confirm this restriction while still allowing safe hoisting for broadcasted values defined outside the plan. Issue https://github.com/llvm/llvm-project/issues/117139	2025-03-13 07:16:04 -07:00
Mel Chen	ffe202ca00	Revert "[LV] Limits the splat operations be hoisted must not be defined by a recipe. (#117138 )" This reverts commit 1ff10fa82fff83bb2f0a5c1ffde6203b52bc9619.	2025-03-13 07:16:04 -07:00
Florian Hahn	62994c3291	[VPlan] Also introduce explicit broadcasts for values from entry VPBB. Update and generalize materializeBroadcasts to also introduce explicit broadcasts for VPValues defined in the Plan's Entry block. This fixes a crash when trying to insert the broadcasts generated by VPTransformState::get after the generating instruction, which isn't possible after invoke instructions. Fixes https://github.com/llvm/llvm-project/issues/128838.	2025-03-12 22:03:19 +00:00
Florian Hahn	8132c4f554	[VPlan] Also introduce broadcasts for live-ins used in vec preheader. Slightly generalize materializeLiveInBroadcasts to also introduce broadcasts for live-ins used in the vector preheader. This should cover all live-ins. If the live-in is used in the vector preheader, insert the broadcast at the beginning of the block.	2025-03-11 21:19:14 +00:00
David Sherwood	26ecf97895	[LoopVectorize] Further improve cost model for early exit loops (#126235 ) Following on from #125058, this patch takes into account the work done in the vector early exit block when assessing the profitability of vectorising the loop. I have renamed areRuntimeChecksProfitable to isOutsideLoopWorkProfitable and we now pass in the early exit costs. As part of this, I have added the ExtractFirstActive opcode to VPInstruction::computeCost. It's worth pointing out that when we assess profitability of the loop we calculate a minimum trip count and compare that against the maximum trip count. However, since the loop has an early exit the runtime trip count can still end up being less than the minimum. Alternatively, we may never take the early exit at all at runtime and so we have the opposite problem of over-estimating the cost of the loop. The loop vectoriser cannot simultaneously take two contradictory positions and so I feel the only sensible thing to do is be conservative and assume the loop will be more expensive than loops without early exits. We may find in future that we need to adjust the cost according to the probability of taking the early exit. This will become even more important once we support multiple early exits. However, we have to start somewhere and we can always revisit this later.	2025-03-11 11:48:55 +00:00
David Sherwood	055db3ec33	[LV] Optimise latch exit induction users for some early exit loops (#128880 ) This is the first of two PRs that attempts to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. In this PR I am improving the generated code for users of induction variables in early exit loops that have a unique exit block, when exiting via the latch. I have moved some of the code for calculating the exit values in latch exit blocks from `optimizeInductionExitUsers` into a new function `optimizeLatchExitInductionUser`. I intend to follow this up very soon with another patch to optimise the code for induction users in the vector.early.exit block.	2025-03-11 10:13:16 +00:00
Mel Chen	1ff10fa82f	[LV] Limits the splat operations be hoisted must not be defined by a recipe. (#117138 ) Issue https://github.com/llvm/llvm-project/issues/117139	2025-03-11 17:59:12 +08:00
Florian Hahn	f84f4e1e05	[LV] Don't crash on inner loops with RT checks in VPlan-native path. Assert that we only generate runtime checks for inner loops in emitMemRuntimeChecks, instead of returning nullptr in the VPlan-native path, which is causing crashes and incorrect code.	2025-03-10 20:28:56 +00:00
Sushant Gokhale	c4808741e8	[AArch64][CostModel] Alter sdiv/srem cost where the divisor is constant (#123552) This patch revises the cost model for sdiv/srem and draws its inspiration from the udiv/urem patch #122236. The typical codegen for the different scenarios is documented as notes/comments in the code itself (there are too many scenarios to describe them all in the patch description).	2025-03-09 22:26:39 -07:00
Florian Hahn	437d587e48	[LV] Add outer loop test with different successor orders in inner latch.	2025-03-09 18:13:06 +00:00
Florian Hahn	fd267082ee	[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419 ) Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop. This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1]. As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks. The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up. This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions. Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop. [1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf PR: https://github.com/llvm/llvm-project/pull/128419	2025-03-09 15:05:35 +00:00
Florian Hahn	8dd160f476	Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347 ) Reverts llvm/llvm-project#129430 this seems to have introduced a divergence between legacy and VPlan-based cost model https://lab.llvm.org/buildbot/#/builders/30/builds/17159	2025-03-07 21:18:49 +00:00
Florian Hahn	cb3ce30ca8	[VPlan] Fold NOT into predicate of wide compares. (#129430 ) Add simplification to fold negation into a compare, if the negation is the only user of the compare. This removes a number of redundant negations. Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U PR: https://github.com/llvm/llvm-project/pull/129430	2025-03-07 20:32:43 +00:00
Ramkumar Ramachandra	ddffb74afd	[LV] Strip unreachable SCEV-check blocks (#130079 ) emitSCEVChecks checks if SCEVCheckCond matches zero, and returns nullptr. However, it sets SCEVCheckCond as used before it does this, which prevents it from being removed during cleanup, resulting in unreachable blocks being emitted. Fix this.	2025-03-06 19:30:25 +00:00
Luke Lau	c6e2cbe5fd	[LV] Regenerate select-cmp-predicated.ll with UTC. NFC The main select-cmp.ll tests seem to be generated with UTC after it should probably be converted to UTC beforehand.	2025-03-06 16:20:03 +08:00
Ramkumar Ramachandra	03da079968	[LoopUtils] Saturate at INT_MAX when estimating TC (#129683 ) getLoopEstimatedTripCount returns std::nullopt when the trip count would overflow the return type, but since it is an estimate anyway, we might as well saturate at UINT_MAX, improving results.	2025-03-05 18:19:39 +00:00
Luke Lau	5e54c92314	[VPlan] Fix crash when unrolling in-loop reduction chains (#129840 ) If an in-loop reduction is chained e.g. WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2> REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>) REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>) When we try to unroll the second add reduction, we crash because we currently expect the chain to be a VPReductionPHIRecipe, when in fact it's the previous reduction. This relaxes the cast to a dyn_cast, so we end up unrolling to: WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2> WIDEN-REDUCTION-PHI ir<%rdx>.1 = phi ir<0>, ir<%add2>.1, ir<1> WIDEN-REDUCTION-PHI ir<%rdx>.2 = phi ir<0>, ir<%add2>.2, ir<2> WIDEN-REDUCTION-PHI ir<%rdx>.3 = phi ir<0>, ir<%add2>.3, ir<3> REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>) REDUCE ir<%add1>.1 = ir<%rdx>.1 + reduce.add (ir<%x>.1) REDUCE ir<%add1>.2 = ir<%rdx>.2 + reduce.add (ir<%x>.2) REDUCE ir<%add1>.3 = ir<%rdx>.3 + reduce.add (ir<%x>.3) REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>) REDUCE ir<%add2>.1 = ir<%add1>.1 + reduce.add (ir<%y>.1) REDUCE ir<%add2>.2 = ir<%add1>.2 + reduce.add (ir<%y>.2) REDUCE ir<%add2>.3 = ir<%add1>.3 + reduce.add (ir<%y>.3) This fixes a crash when building 525.x264_r from SPEC CPU 2017 on AArch64 with -mllvm -prefer-inloop-reductions	2025-03-05 19:13:23 +08:00
Krzysztof Drewniak	e697c99b63	[AMDGPU] Add custom MachineValueType entries for buffer fat poiners (#127692) The old hack of returning v5/v6i32 for the fat and strided buffer pointers was causing issues during vectorization queries that expected to be able to construct a VectorType from the return value of `MVT getPointerType()`. One example is in the test attached to this PR, which used to crash. Now, we define the custom MVT entries, the 160-bit amdgpuBufferFatPointer and 192-bit amdgpuBufferStridedPointer, which are used to represent ptr addrspace(7) and ptr addrspace(9) respectively. Neither of these types will be present at the time of lowering to a SelectionDAG or other MIR: MVT::amdgpuBufferFatPointer is eliminated by the LowerBufferFatPointers pass and MVT::amdgpuBufferStridedPointer is not currently used outside of the SPIR-V translator (which does its own lowering). An alternative solution would be to add MVT::i160 and MVT::i192. We elect not to do this now as it would require changes to unrelated code and runs the risk of breaking any SelectionDAG code that assumes that the MVT series are all powers of two (and so can be split apart and merged back together) in ways that wouldn't be obvious if someone tried to use MVT::i160 in codegen. If i160 is added at some future point, these custom types can be retired.	2025-03-04 17:19:06 -06:00
Jim Lin	03505a004f	[RISCV] Enable scalable loop vectorization for fmax/fmin reductions with f16/bf16 type for zvfhmin/zvfbfmin (#129629) This PR enables scalable loop vectorization for fmax and fmin reductions with f16/bf16 type when only zvfhmin/zvfbfmin are enabled. After https://github.com/llvm/llvm-project/pull/128800, we can promote the fmax/fmin reductions with f16/bf16 type to f32 reductions for zvfhmin/zvfbfmin.	2025-03-04 16:49:24 +08:00
Ramkumar Ramachandra	80bdfcd411	[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080 ) getLoopEstimatedTripCount returns the trip count based on profiling data, and its documentation says that it could return 0 when the trip count is zero, but this is not the case: a valid trip count can never be zero, and it returns 0 when the unsigned ExitCount is incremented by 1 and wraps. Some callers are careful about checking for this zero value in an std::optional, but it makes for an API with footguns, as a std::optional return value indicates that a non-nullopt value would be a valid trip count. Fix this by explicitly returning std::nullopt when the return value would wrap, and strip additional checks in callers. This also fixes a minor bug in LoopVectorize.	2025-03-04 08:43:08 +00:00
Mel Chen	9b4ad2fe50	[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093) This patch converts the llvm.vector.splice intrinsic to llvm.experimental.vp.splice, ensuring that fixed-order recurrences execute correctly when tail folding by EVL is enabled. Due to the non-VFxUF penultimate EVL issue, the EVL from the previous iteration will be preserved and used in llvm.experimental.vp.splice.	2025-03-03 21:27:13 +08:00
Florian Hahn	f937b17e85	[LV] Don't query SCEV for non-invariant values in cost model. This fixes a divergence between VPlan and legacy cost model, matching behavior further up in getInstructionCost as well. Fixes https://github.com/llvm/llvm-project/issues/129236.	2025-03-02 10:55:52 +00:00
Benjamin Maxwell	89e7f4d31b	[LV] Teach the vectorizer to cost and vectorize modf and sincospi intrinsics (#129064 ) Follow on to #128035. It is a small extension to support vectorizing `llvm.modf.` and `llvm.sincospi.` too. This renames the test files from `sincos.ll` -> `multiple-result-intrinsics.ll` to group together the similar tests (which make up most of this PR).	2025-02-28 12:56:12 +00:00
Florian Hahn	6ce41db6b0	[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe. Update code to set and generate debug location for branch recipe	2025-02-27 20:19:42 +00:00
Florian Hahn	1e1b9bccc0	[VPlan] Simplify BLEND %a, %b, NOT(%m) -> BLEND %b, %a, %m. (#128375 ) Avoid negations for normalized blends by reordering operands. PR: https://github.com/llvm/llvm-project/pull/128375	2025-02-27 17:43:24 +00:00
Florian Hahn	649f4dcc19	[LV] Fix tests after 8150ab93f741. PR #124119 wasn't rebased & tested before merging. Update the failing tests.	2025-02-27 12:15:24 +00:00
John Brawn	8150ab93f7	[LoopVectorize] Use CodeSize as the cost kind for minsize (#124119 ) Functions marked with minsize should aim for minimum code size, so the vectorizer should use CodeSize for the cost kind and also the cost we compare should be the cost for the entire loop: it shouldn't be divided by the number of vector elements and block costs shouldn't be divided by the block probability. Possibly we should also be doing this for optsize as well, but there are a lot of tests that assume the current behaviour and the definition of optsize is less clear than minsize (for minsize the goal is to "keep the code size of this function as small as possible" whereas for optsize it's "keep the code size of this function low").	2025-02-27 11:07:02 +00:00
Benjamin Maxwell	3307b0374a	[LV] Teach the loop vectorizer llvm.sincos is trivially vectorizable (#128035 ) Depends on #123210	2025-02-27 09:37:06 +00:00
Florian Hahn	7b6abd827f	[LV] Remove stray check lines after be28365ca78.	2025-02-26 21:14:01 +00:00
Florian Hahn	be28365ca7	[LV] Generate check lines for if-conversion.ll The limited check lines make it difficult to reason about test changes in https://github.com/llvm/llvm-project/pull/128375.	2025-02-26 20:39:58 +00:00
Florian Hahn	4277c21059	[VPlan] Introduce explicit broadcasts for live-ins. (#124644) Add a new VPInstruction::Broadcast opcode and use it to materialize explicit broadcasts of live-ins. The initial patch only materializes the broadcasts if the vector preheader dominates all uses that need it. Later patches will pick the best valid insert point, thus retiring implicit hoisting of broadcasts from VPTransformState::get(). PR: https://github.com/llvm/llvm-project/pull/124644	2025-02-26 13:57:51 +00:00
Elvis Wang	8009c1fd81	[LV][VPlan] Prevent calculate cost for skiped instructions in precomputeCosts(). (#127966 ) Skip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped. Reported from: https://github.com/llvm/llvm-project/issues/115744#issuecomment-2670479463 Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv	2025-02-25 11:09:09 +08:00
Florian Hahn	65e44b4301	[LV] Add tests with deref assumptions and non-constant sizes.	2025-02-23 18:21:22 +00:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However, we now have the RISCVVLOptimizer pass, which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	52ded67249	[LAA] Always require non-wrapping pointers for runtime checks. (#127543 ) Currently we only check if the pointers involved in runtime checks do not wrap if we need to perform dependency checks. If that's not the case, we generate runtime checks, even if the pointers may wrap (see test/Analysis/LoopAccessAnalysis/runtime-checks-may-wrap.ll). If the pointer wraps, then we swap start and end of the runtime check, leading to incorrect checks. An Alive2 proof of what the runtime checks are checking conceptually (on i4 to have it complete in reasonable time) showing the incorrect result should be https://alive2.llvm.org/ce/z/KsHzn8 Depends on https://github.com/llvm/llvm-project/pull/127410 to avoid more regressions. PR: https://github.com/llvm/llvm-project/pull/127543	2025-02-20 19:00:23 +01:00
Florian Hahn	404af37175	[VPlan] Remove stale assertion in HCFG builder. The assertion was left over from a time when VPBBs still had an associated condition bit. This is not the case any more (comment was stale). In case a branch on condition is needed, a BranchOnCond VPInstruction is added when constructing recipes. That's also where it is checked if the condition is available. Exposed by 38376dee9.	2025-02-20 17:01:49 +01:00
Florian Hahn	04b5c63ddf	[LV] Add inbounds to interleave test. In preparation for https://github.com/llvm/llvm-project/pull/127543	2025-02-20 16:33:01 +01:00
Ramkumar Ramachandra	3b9f9645e0	[LV] Regen a couple of tests with UTC (#127785 )	2025-02-20 15:27:51 +00:00
Florian Hahn	a96444af44	[VPlan] Remove dead exit block handling code in HCFGBuilder. The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect VPBB for the ExitBB; the region's successor is the middle block, not the exit block. It also unnecessarily triggers an assertion after 38376dee922.	2025-02-19 18:51:45 +01:00
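
Note on 80bdfcd411 and 03da079968: both entries concern what getLoopEstimatedTripCount does when the estimated exit count plus one no longer fits the return type. The following is a minimal sketch of the two behaviours described in those messages, not LLVM's actual code. The function names and the uint64_t input are assumptions made for illustration.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: behaviour described by 80bdfcd411. The trip count is
// ExitCount + 1; if that does not fit in `unsigned`, report "no estimate"
// instead of wrapping to 0.
constexpr std::optional<unsigned> tripCountOrNullopt(uint64_t ExitCount) {
  constexpr uint64_t Max = std::numeric_limits<unsigned>::max();
  if (ExitCount >= Max) // ExitCount + 1 would exceed the return type.
    return std::nullopt;
  return static_cast<unsigned>(ExitCount + 1);
}

// Hypothetical helper: behaviour described by 03da079968. Since the value is
// only an estimate, clamp to the largest representable value instead.
constexpr unsigned tripCountSaturating(uint64_t ExitCount) {
  constexpr uint64_t Max = std::numeric_limits<unsigned>::max();
  if (ExitCount >= Max)
    return static_cast<unsigned>(Max);
  return static_cast<unsigned>(ExitCount + 1);
}
```

In both variants a returned value is never 0, so callers no longer need a separate zero check on top of the std::optional check.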

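
Note on 37a57ca257: the message says that setting every flag through the individual setters did not make isFast() return true. One way this can happen is when "fast" is stored as an all-bits-set sentinel while the individual setters only touch the known flag bits. The sketch below illustrates that idea under this assumption. MiniFMF, its three flags, and setFlag are hypothetical and much smaller than LLVM's FastMathFlags.

```cpp
// Hypothetical miniature flag container; not LLVM's FastMathFlags.
class MiniFMF {
public:
  enum : unsigned {
    NoNaNs = 1u << 0,
    NoInfs = 1u << 1,
    Reassoc = 1u << 2,
    KnownMask = (1u << 3) - 1 // All flags this sketch knows about.
  };

  // "Fast" is represented by the all-bits-set sentinel.
  void setFast() { Flags = ~0u; }
  bool isFast() const { return Flags == ~0u; }
  bool has(unsigned Bit) const { return (Flags & Bit) != 0; }

  // Set or clear one flag, then normalise: once every known flag is set,
  // switch to the sentinel so that isFast() agrees with setFast().
  void setFlag(unsigned Bit, bool On = true) {
    Flags = On ? (Flags | Bit) : (Flags & KnownMask & ~Bit);
    if ((Flags & KnownMask) == KnownMask)
      Flags = ~0u;
  }

private:
  unsigned Flags = 0;
};
```

Without the normalisation step in setFlag, setting NoNaNs, NoInfs and Reassoc one by one would leave Flags at KnownMask rather than ~0u, and isFast() would return false.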
2963 Commits