llvm-project

Author	SHA1	Message	Date
Florian Hahn	bb937e276d	[LV] Compute value of escaped induction based on the computed end value. (#110576) Update fixupIVUsers to compute the value for escaped inductions from the already computed end value of the induction (EndValue) by subtracting the step. This results in slightly simpler codegen, as we avoid computing the full transformed index at VectorTripCount - 1. PR: https://github.com/llvm/llvm-project/pull/110576	2024-10-10 20:04:46 +01:00
Florian Hahn	01cbbc52dc	[VPlan] Request lane 0 for pointer arg in PtrAdd. After 7f74651, the pointer operand of a PtrAdd may be replicated. Instead of requesting a single scalar, request lane 0, which correctly handles the case when there is a scalar per lane. Fixes https://github.com/llvm/llvm-project/issues/111606.	2024-10-09 13:18:54 +01:00
Florian Hahn	36fc291b6e	[VPlan] Implement VPBlendRecipe::computeCost. Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently also used if only the first lane is used. This also requires pre-computing costs for forced scalars and instructions considered profitable to scalarize. For those, the cost will be computed separately in the legacy cost model. This will also be needed when implementing VPReplicateRecipe::computeCost.	2024-10-08 21:33:42 +01:00
Florian Hahn	3ec6f805c5	[VPlan] Don't create GEP x, 0 for interleave group pointers. The GEP with offset 0 is redundant, remove it. This addresses a TODO from 7f74651837b (#106431).	2024-10-08 12:08:13 +01:00
Florian Hahn	7f74651837	[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431) Update VPInterleaveRecipe to always use the pointer to member 0 as pointer argument. This in many cases helps to remove unneeded index adjustments and simplifies VPInterleaveRecipe::execute. In some rare cases, the address of member 0 does not dominate the insert position of the interleave group. In those cases a PtrAdd VPInstruction is emitted to compute the address of member 0 based on the address of the insert position. Alternatively we could hoist the recipe computing the address of member 0.	2024-10-06 22:53:13 +01:00
Florian Hahn	45b526afa2	[LV] Honor uniform-after-vectorization in setVectorizedCallDecision. The legacy cost model always computes the cost for uniforms as cost of VF = 1, but VPWidenCallRecipes would be created, as setVectorizedCallDecision would not consider uniform calls. Fix setVectorizedCallDecision to set the decision to Scalarize if the call is uniform-after-vectorization. This fixes a bug in VPlan construction uncovered by the VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/111040.	2024-10-06 10:35:06 +01:00
Florian Hahn	68210c7c26	[VPlan] Only generate first lane for VPPredInstPHI if no others used. If only the first lane of the result is used, only generate the first lane. Fixes https://github.com/llvm/llvm-project/issues/111042.	2024-10-05 19:15:05 +01:00
Florian Hahn	9de327c94d	[LV] Generalize predication checks from 2c8836c899 for operands. This fixes another case where the VPlan-based and legacy cost models disagree. If any of the operands is predicated, it can't be trivially hoisted and we should consider the cost for evaluating it each loop iteration. Fixes https://github.com/llvm/llvm-project/issues/108697.	2024-10-02 20:16:41 +01:00
Florian Hahn	6d6eea92e3	[LV] Use SCEV to simplify wide binop operand to constant. The legacy cost model uses SCEV to determine if the second operand of a binary op is a constant. Update the VPlan construction logic to mirror the current legacy behavior, to fix a difference in the cost models. Fixes https://github.com/llvm/llvm-project/issues/109528. Fixes https://github.com/llvm/llvm-project/issues/110440.	2024-10-02 13:45:49 +01:00
Florian Hahn	2c8836c899	[LV] Don't consider predicated insts as invariant unconditionally in CM. Predicated instructions cannot be hoisted trivially, so don't treat them as uniform values in the cost model. This fixes a difference between legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/110295.	2024-09-29 20:31:24 +01:00
Florian Hahn	68ed1728bf	[VPlan] Unify mayWriteToMemory and mayHaveSideEffects logic for VPInst. Unify logic for mayWriteToMemory and mayHaveSideEffects for VPInstruction, with the latter relying on the former. Also extend to handle binary operators. Split off from https://github.com/llvm/llvm-project/pull/106441	2024-09-26 19:16:43 +01:00
Florian Hahn	53266f73f0	[VPlan] Run DCE after unrolling. This cleans up a number of dead recipes after unrolling if only their first or last parts are used. This simplifies a number of tests. Fixes https://github.com/llvm/llvm-project/issues/109581.	2024-09-22 22:08:46 +01:00
Florian Hahn	8ec406757c	[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842) This patch implements explicit unrolling by UF as a VPlan transform. In follow-up patches this will allow simplifying VPTransformState (no need to store unrolled parts) as well as recipe execution (no need to generate code for multiple parts in each recipe). It also allows for more general optimizations (e.g. avoid generating code for recipes that are uniform across parts). It also unifies the logic dealing with unrolled parts in a single place, rather than spreading it out across multiple places (e.g. VPlan post-processing for header-phi recipes previously). In the initial implementation, a number of recipes still take the unrolled part as an additional, optional argument if their execution depends on the unrolled part. The computation of start/step values for scalable inductions changed slightly. Previously the step would be computed as a scalar and then splatted; now vscale gets splatted and multiplied by the step in a vector mul. This has been split off from https://github.com/llvm/llvm-project/pull/94339, which also includes changes to simplify VPTransformState and recipes' ::execute. The current version mostly leaves existing ::execute untouched and instead sets VPTransformState::UF to 1. A follow-up patch will clean up all references to VPTransformState::UF. Another follow-up patch will simplify VPTransformState to only store a single vector value per VPValue. PR: https://github.com/llvm/llvm-project/pull/95842	2024-09-21 19:47:37 +01:00
Florian Hahn	4eb9838409	[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions. Update isDefinedOutsideLoopRegions to check if a recipe is defined outside any region. Split off from the already approved https://github.com/llvm/llvm-project/pull/95842, now that this can be tested separately after landing VPlan-based LICM https://github.com/llvm/llvm-project/issues/107501	2024-09-20 15:34:00 +01:00
Florian Hahn	a861ed411a	[VPlan] Add initial loop-invariant code motion transform. (#107894) Add initial transform to move out loop-invariant recipes. This also helps to fix a divergence between legacy and VPlan-based cost model due to legacy using ScalarEvolution::isLoopInvariant in some cases. Fixes https://github.com/llvm/llvm-project/issues/107501. PR: https://github.com/llvm/llvm-project/pull/107894	2024-09-20 11:22:03 +01:00
David Sherwood	e762d4dac7	[LoopVectorize] Teach LoopVectorizationLegality about more early exits (#107004) This patch is split off from PR #88385 and concerns only the code related to the legality of vectorising early exit loops. It is the first step in adding support for vectorisation of a simple class of loops that typically involves searching for something, e.g. for (int i = 0; i < n; i++) { if (p[i] == val) return i; } return n; or for (int i = 0; i < n; i++) { if (p1[i] != p2[i]) return i; } return n; In this initial commit LoopVectorizationLegality will only consider early exit loops legal for vectorising if they follow these criteria: 1. There are no stores in the loop. 2. The loop must have only one early exit like those shown in the above example. I have referred to such exits as speculative early exits, to distinguish them from existing support for early exits where the exit-not-taken count is known exactly at compile time. 3. The early exit block dominates the latch block. 4. The latch block must have an exact exit count. 5. There are no loads after the early exit block. 6. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet. 7. We must be able to prove at compile-time that loops will not contain faulting loads. Tests have been added here: Transforms/LoopVectorize/AArch64/simple_early_exit.ll	2024-09-19 09:41:25 +01:00
Florian Hahn	c48a1ebec1	[LV] Remove force-vector-width/force-vector-interleave from X86 test. Update target-specific test to not force VF/UF, but instead use the cost-model. There are similar tests already outside X86 and those force VF & UF. With this change, the target specific test checks the cost model. Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required, and should preserve the original spirit of the test.	2024-09-17 08:59:24 +01:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Simon Pilgrim	6c8746b6e3	[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844) Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents	2024-09-03 10:05:56 +01:00
Simon Pilgrim	4d412bedcc	[LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping	2024-08-31 11:48:22 +01:00
Philip Reames	4b553f4916	Regen a bunch of vectorizer tests to avoid naming churn in upcoming review	2024-08-30 10:13:02 -07:00
Simon Pilgrim	d58d105cda	[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584) Show fallback cases in amdlibm tests where it doesn't have that specific op	2024-08-30 16:49:23 +01:00
Simon Pilgrim	81acc84997	[LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics Check for cases where there isn't an amdlibm call but it still vectorises the math call	2024-08-29 17:31:55 +01:00
Simon Pilgrim	2f95298727	[LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)	2024-08-29 14:27:31 +01:00
Simon Pilgrim	c57abc66e2	[LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).	2024-08-29 14:27:31 +01:00
Florian Hahn	0a272d3a17	[LV] Use SCEV to analyze second operand for cost query. Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bbd0d34. Fixes https://github.com/llvm/llvm-project/issues/106248.	2024-08-29 12:08:27 +01:00
Florian Hahn	533e6bbd0d	[VPlan] Simplify live-ins if they are SCEVConstant. The legacy cost model in some parts checks if any of the operands are constants via SCEV. Update VPlan construction to replace live-ins that are constants via SCEV with such constants. This means VPlans (and codegen) reflect what we are computing the cost of and removes another case where the legacy and VPlan cost model diverged. Fixes https://github.com/llvm/llvm-project/issues/105722.	2024-08-26 09:15:58 +01:00
Nikita Popov	a105877646	[InstCombine] Remove some of the complexity-based canonicalization (#91185) The idea behind this canonicalization is that it allows us to handle fewer patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.	2024-08-21 12:02:54 +02:00
Florian Hahn	2ab910c08c	[LV] Check pointer users are in loop when checking for uniform pointers. Widening decisions are not set for users outside the loop. Avoid crashing by only calling isVectorizedMemAccessUse for users in the loop. Fixes https://github.com/llvm/llvm-project/issues/102934.	2024-08-13 09:23:44 +01:00
Florian Hahn	cd08fadd03	[LV] Include chains feeding inductions in cost precomputation. Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes https://github.com/llvm/llvm-project/issues/101337.	2024-08-12 14:45:43 +01:00
Florian Hahn	db0603cb7b	[LV] Only OR unique edges when creating block-in masks. This removes redundant ORs of matching masks. Follow-up to f0df4fbd0c7b to reduce the number of redundant ORs for masks.	2024-08-12 10:17:40 +01:00
Florian Hahn	5a42a677aa	[VPlan] Mark VPVectorPointer as only using the first part of the ptr. VPVectorPointerRecipe only uses the first part of the pointer operand, so mark it accordingly. Follow-up suggested as part of https://github.com/llvm/llvm-project/pull/99808.	2024-08-12 08:46:55 +01:00
Farzon Lotfi	efc6b50d2d	[LoopVectorize][X86][AMDLibm] Add Missing AMD LibM trig vector intrinsics (#101125) Adding the following linked to their docs: - [amd_vrs16_acosf](`9c0b67293b/scripts/libalm.def (L221)`) - [amd_vrd2_cosh](`9c0b67293b/scripts/libalm.def (L124)`) - [amd_vrs16_tanhf](`9c0b67293b/scripts/libalm.def (L224)`)	2024-08-11 22:11:09 -04:00
Florian Hahn	60680f7181	[LV] Handle SwitchInst in ::isPredicatedInst. After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well. Handle it the same as BranchInst. This fixes a crash in the newly added test and improves the results for one of the existing tests in predicate-switch.ll. Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.	2024-08-11 20:56:58 +01:00
Florian Hahn	f0df4fbd0c	[LV] Support generating masks for switch terminators. (#99808) Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases: 1. Dst is not the default destination. Dst is reached if any of the cases with destination == Dst are taken. Join the conditions for each case where destination == Dst using a logical OR. 2. Dst is the default destination. Dst is reached if none of the cases with destination != Dst are taken. Join the conditions for each case where the destination is != Dst using a logical OR and negate it. Edge masks are created for every destination of cases and/or default when requesting a mask where the source is a switch. Fixes https://github.com/llvm/llvm-project/issues/48188. PR: https://github.com/llvm/llvm-project/pull/99808	2024-08-11 20:38:36 +02:00
Florian Hahn	fdb9f96fa2	[LV] Consider earlier stores to invariant reduction address as dead. For invariant stores to an address of a reduction, only the latest store will be generated outside the loop. Consider earlier stores as dead. This fixes a difference between the legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/96294.	2024-08-04 20:54:26 +01:00
Florian Hahn	855703537e	[LV] Add more tests with switches. Extra tests for https://github.com/llvm/llvm-project/pull/99808, including cost model tests.	2024-08-01 19:30:48 +01:00
Farzon Lotfi	378fe2fc23	[X86][LoopVectorize] Add support for arc and hyperbolic trig functions (#99383) This change is part 2 (x86 loop vectorization) of https://github.com/llvm/llvm-project/pull/96222. It also adds veclib call loop vectorization, hence the test cases in `llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll`. Finally, the last PR missed tests for `llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and `llvm/test/CodeGen/X86/vec-libcalls.ll`, so those were added as well. No evidence was found for arc and hyperbolic trig glibc vector math functions (https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h), so there are no new `_ZGVbN2v_` and `_ZGVdN4v_` entries and no new tests in `llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll`. There are also no new svml additions and no new tests in `llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll`: there was not enough evidence of svml arc and hyperbolic trig vector implementations. Documentation was scarce, so test cases in [numpy](`32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)`) were consulted. Someone with more experience with svml should investigate. ## Note amd libm doesn't have a vector hyperbolic sine API, which is why you might notice there are no tests for `sinh`. ## History This change is part of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. resolves #70079 resolves #70080 resolves #70081 resolves #70083 resolves #70084 resolves #95966	2024-07-28 20:57:43 -04:00
Florian Hahn	a3092152ac	[VPlan] Don't create live-outs for induction increments. Follow up to fc9cd3272b5 to also skip creating live-outs for IV increments, as those are also generated independent of VPlan for now.	2024-07-25 21:34:55 +01:00
Simon Pilgrim	010dcfd85f	[CostModel][X86] Improve add/sub/mul overflow intrinsic costs Noticed due to x86 changes in #97463	2024-07-25 16:01:05 +01:00
Florian Hahn	72532c9219	[LV] Don't predicate divs with invariant divisor when folding tail (#98904) When folding the tail, at least one of the lanes must execute unconditionally. If the divisor is loop-invariant no predication is needed, as predication would not prevent the divide-by-0 on the executed lane. Depends on https://github.com/llvm/llvm-project/pull/98892. PR: https://github.com/llvm/llvm-project/pull/98904	2024-07-25 12:21:09 +01:00
Florian Hahn	05f986e143	[LV] Add tests for loops with switches.	2024-07-21 10:11:38 +01:00
Florian Hahn	710dab6e18	[VPlan] Remove VPPredInstPHIRecipes without users after region merging. After merging replicate regions, VPPredInstPHIRecipes may become unused. Remove them directly instead of moving them to the merged region.	2024-07-20 13:21:32 +01:00
Florian Hahn	008df3cf85	[LV] Check isPredInst instead of isScalarWithPred in uniform analysis. (#98892) Any instruction marked as uniform will result in a uniform VPReplicateRecipe. If it requires predication, it will be placed in a replicate region, even if isScalarWithPredication returns false. Check isPredicatedInst instead of isScalarWithPredication to avoid generating uniform VPReplicateRecipes placed inside a replicate region. This fixes an assertion when using scalable VFs. Fixes https://github.com/llvm/llvm-project/issues/80416. Fixes https://github.com/llvm/llvm-project/issues/94328. Fixes https://github.com/llvm/llvm-project/issues/99625. PR: https://github.com/llvm/llvm-project/pull/98892	2024-07-19 12:02:25 +01:00
Florian Hahn	2bb65660ae	[LV] Allow re-processing of operands of instrs feeding interleave group Follow up to d216615518 to update dead interleave group pointer detection to allow re-processing of operands of instructions determined to only feed interleave groups. This is needed because instructions feeding interleave group pointers can become dead in any order, as per the newly added test case.	2024-07-17 21:37:28 +01:00
Florian Hahn	d216615518	[LV] Process dead interleave pointer ops in reverse order. Process dead interleave pointer ops in reverse order. This also catches cases where the same base pointer is used by multiple different interleave groups. This fixes another case where the legacy cost model inaccurately estimates cost, surfaced by b841e2eca3b5c8.	2024-07-17 11:43:42 +01:00
Florian Hahn	967eba0754	[LV] Add test cases for tail-folding sdiv/udiv/urem feeding geps. Based on reduced tests from https://github.com/llvm/llvm-project/issues/94328.	2024-07-15 11:45:07 +01:00
Florian Hahn	8fcb822da6	[LV] Add uses of result to pointer-runtime-checks-unprofitable.ll test. Otherwise %p.2 is not used and will be removed by VPlan transforms, leading to a difference between legacy and VPlan-based cost.	2024-07-15 09:59:46 +01:00
Florian Hahn	fc9cd3272b	[VPlan] Don't add live-outs for IV phis. Resume and exit values for inductions are currently still created outside of VPlan and independent of the induction recipes. Don't add live-outs for now, as the additional unneeded users can pessimize other analyses. Fixes https://github.com/llvm/llvm-project/issues/98660.	2024-07-14 20:49:03 +01:00
Florian Hahn	7a49d80f58	[VPlan] Skip users outside loop in check for exit pre-compute candidates When collecting candidates to pre-compute cost for operands of exit conditions, skip users outside the loop when checking if they are in ExistInstrs. The users outside the loop should be ignored, as they won't make a value live in the VPlan. This fixes a failure when building for X86 with sanitizers on macOS after b841e2eca3b5c (https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)	2024-07-11 22:04:39 +01:00
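The two switch edge-mask cases described in f0df4fbd0c (#99808) can be illustrated with a minimal per-lane sketch. It assumes a hypothetical `SwitchModel` struct and plain `std::vector<bool>` lane masks in place of VPlan recipes; `edgeMaskForSwitch` is an illustrative name, not the LLVM `createEdgeMask` API, and the sketch does not model combining the result with the mask of the source block itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a switch terminator: each case maps a case value
// to a destination block id; DefaultDest is taken when no case value matches.
struct SwitchModel {
  std::vector<std::pair<int, int>> Cases; // (case value, destination id)
  int DefaultDest = -1;
};

using Mask = std::vector<bool>; // one boolean per vector lane

// Computes, per lane, whether control flows from the switch's block to Dst.
// Cond[Lane] is the value the switch condition has in that lane.
Mask edgeMaskForSwitch(const SwitchModel &SI, const std::vector<int> &Cond,
                       int Dst) {
  const bool DstIsDefault = SI.DefaultDest == Dst;
  Mask Result(Cond.size(), false);
  for (std::size_t Lane = 0; Lane < Cond.size(); ++Lane) {
    bool AnyRelevantCaseTaken = false;
    for (const auto &[Value, Dest] : SI.Cases) {
      // Case 1 (Dst is not the default): consider cases with Dest == Dst.
      // Case 2 (Dst is the default): consider cases with Dest != Dst.
      const bool Relevant = DstIsDefault ? (Dest != Dst) : (Dest == Dst);
      if (Relevant && Cond[Lane] == Value)
        AnyRelevantCaseTaken = true;
    }
    // Case 1 uses the OR directly; case 2 negates it.
    Result[Lane] = DstIsDefault ? !AnyRelevantCaseTaken : AnyRelevantCaseTaken;
  }
  return Result;
}
```

For example, with cases 1 -> 10, 2 -> 10, 3 -> 11, default 12 and lane conditions {1, 2, 3, 4}, the masks are {1, 1, 0, 0} for block 10, {0, 0, 1, 0} for block 11 and {0, 0, 0, 1} for block 12. Negating the OR of the other destinations' cases (rather than checking "no case matched") keeps the default edge correct when the default block is also the target of some cases.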


800 Commits
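The escaped-induction change in bb937e276d relies on a simple identity: for an induction whose value in iteration i is Start + i * Step, the value in the last vector iteration (index VectorTripCount - 1) equals the end value minus one step. The sketch below checks that identity under stated assumptions: a hypothetical `IntInduction` struct, plain 64-bit integer arithmetic, and no handling of pointer or floating-point inductions, truncation, or overflow. The function names are illustrative and are not LLVM's `fixupIVUsers` code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer induction: its value in iteration I is Start + I * Step.
struct IntInduction {
  std::int64_t Start;
  std::int64_t Step;
};

// End value: the induction's value after VectorTripCount iterations.
std::int64_t computeEndValue(const IntInduction &IV,
                             std::int64_t VectorTripCount) {
  return IV.Start + VectorTripCount * IV.Step;
}

// Earlier approach: transform the index VectorTripCount - 1 from scratch.
std::int64_t escapedValueFromIndex(const IntInduction &IV,
                                   std::int64_t VectorTripCount) {
  assert(VectorTripCount >= 1 && "loop must run at least one iteration");
  return IV.Start + (VectorTripCount - 1) * IV.Step;
}

// Approach described in bb937e276d: reuse the end value, subtract one step.
std::int64_t escapedValueFromEnd(const IntInduction &IV,
                                 std::int64_t EndValue) {
  return EndValue - IV.Step;
}
```

With Start = 5, Step = 3 and VectorTripCount = 8, the end value is 29 and both functions return 26. Reusing the end value needs one subtraction instead of a second multiply-add, which matches the commit's note about slightly simpler codegen.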