llvm-project

Author	SHA1	Message	Date
Florian Hahn	c48a1ebec1	[LV] Remove force-vector-width/force-vector-interleave from X86 test. Update target-specific test to not force VF/UF, but instead use the cost-model. There are similar tests already outside X86 and those force VF & UF. With this change, the target specific test checks the cost model. Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required, and should preserve the original spirit of the test.	2024-09-17 08:59:24 +01:00
Philip Reames	2c7786e94a	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.	2024-09-03 09:16:37 -07:00
Simon Pilgrim	6c8746b6e3	[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844 ) Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents	2024-09-03 10:05:56 +01:00
Simon Pilgrim	4d412bedcc	[LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping	2024-08-31 11:48:22 +01:00
Philip Reames	4b553f4916	Regen a bunch of vectorizer tests to avoid naming churn in upcoming review	2024-08-30 10:13:02 -07:00
Simon Pilgrim	d58d105cda	[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584 ) Show fallback cases in amdlibm tests where it doesn't have that specific op	2024-08-30 16:49:23 +01:00
Simon Pilgrim	81acc84997	[LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics Check for cases where there isn't an amdlib call but it still vectorises the math call	2024-08-29 17:31:55 +01:00
Simon Pilgrim	2f95298727	[LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)	2024-08-29 14:27:31 +01:00
Simon Pilgrim	c57abc66e2	[LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).	2024-08-29 14:27:31 +01:00
Florian Hahn	0a272d3a17	[LV] Use SCEV to analyze second operand for cost query. Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bbd0d34. Fixes https://github.com/llvm/llvm-project/issues/106248.	2024-08-29 12:08:27 +01:00
Florian Hahn	533e6bbd0d	[VPlan] Simplify live-ins if they are SCEVConstant. The legacy cost model in some parts checks if any of the operands are constants via SCEV. Update VPlan construction to replace live-ins that are constants via SCEV with such constants. This means VPlans (and codegen) reflect what we are computing the cost of and removes another case where the legacy and VPlan cost model diverged. Fixes https://github.com/llvm/llvm-project/issues/105722.	2024-08-26 09:15:58 +01:00
Nikita Popov	a105877646	[InstCombine] Remove some of the complexity-based canonicalization (#91185 ) The idea behind this canonicalization is that it allows us to handle less patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.	2024-08-21 12:02:54 +02:00
Florian Hahn	2ab910c08c	[LV] Check pointer user are in loop when checking for uniform pointers. Widening decisions are not set for users outside the loop. Avoid crashing by only calling isVectorizedMemAccessUse for users in the loop. Fixes https://github.com/llvm/llvm-project/issues/102934.	2024-08-13 09:23:44 +01:00
Florian Hahn	cd08fadd03	[LV] Include chains feeding inductions in cost precomputation. Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes https://github.com/llvm/llvm-project/issues/101337.	2024-08-12 14:45:43 +01:00
Florian Hahn	db0603cb7b	[LV] Only OR unique edges when creating block-in masks. This removes redundant ORs of matching masks. Follow-up to f0df4fbd0c7b to reduce the number of redundant ORs for masks.	2024-08-12 10:17:40 +01:00
Florian Hahn	5a42a677aa	[VPlan] Mark VPVectorPointer as only using the first part of the ptr. VPVectorPointerRecipe only uses the first part of the pointer operand, so mark it accordingly. Follow-up suggested as part of https://github.com/llvm/llvm-project/pull/99808.	2024-08-12 08:46:55 +01:00
Farzon Lotfi	efc6b50d2d	[LoopVectorize][X86][AMDLibm] Add Missing AMD LibM trig vector intrinsics (#101125 ) Adding the following linked to their docs: - [amd_vrs16_acosf](`9c0b67293b/scripts/libalm.def (L221)`) - [amd_vrd2_cosh](`9c0b67293b/scripts/libalm.def (L124)`) - [amd_vrs16_tanhf](`9c0b67293b/scripts/libalm.def (L224)`)	2024-08-11 22:11:09 -04:00
Florian Hahn	60680f7181	[LV] Handle SwitchInst in ::isPredicatedInst. After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well. Handle it the same as BranchInst. This fixes a crash in the newly added test and improves the results for one of the existing tests in predicate-switch.ll Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.	2024-08-11 20:56:58 +01:00
Florian Hahn	f0df4fbd0c	[LV] Support generating masks for switch terminators. (#99808) Update createEdgeMask to create masks where the terminator in Src is a switch. We need to handle 2 separate cases: 1. Dst is not the default destination. Dst is reached if any of the cases with destination == Dst are taken. Join the conditions for each case where destination == Dst using a logical OR. 2. Dst is the default destination. Dst is reached if none of the cases with destination != Dst are taken. Join the conditions for each case where the destination is != Dst using a logical OR and negate it. Edge masks are created for every destination of cases and/or default when requesting a mask where the source is a switch. Fixes https://github.com/llvm/llvm-project/issues/48188. PR: https://github.com/llvm/llvm-project/pull/99808	2024-08-11 20:38:36 +02:00
Florian Hahn	fdb9f96fa2	[LV] Consider earlier stores to invariant reduction address as dead. For invariant stores to an address of a reduction, only the latest store will be generated outside the loop. Consider earlier stores as dead. This fixes a difference between the legacy and VPlan-based cost model. Fixes https://github.com/llvm/llvm-project/issues/96294.	2024-08-04 20:54:26 +01:00
Florian Hahn	855703537e	[LV] Add more tests with switches. Extra tests for https://github.com/llvm/llvm-project/pull/99808, including cost model tests.	2024-08-01 19:30:48 +01:00
Farzon Lotfi	378fe2fc23	[X86][LoopVectorize] Add support for arc and hyperbolic trig functions (#99383) This change is part 2 x86 Loop Vectorization of: https://github.com/llvm/llvm-project/pull/96222 It also has veclib call loop vectorization, hence the test cases in `llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll`. Finally, the last PR missed tests for `llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and `llvm/test/CodeGen/X86/vec-libcalls.ll`, so added those as well. No evidence was found for arc and hyperbolic trig glibc vector math functions https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h so no new `_ZGVbN2v_` and `_ZGVdN4v_`. So no new tests in `llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll` Also no new svml and no new tests to: `llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll` There was not enough evidence that there were svml arc and hyperbolic trig vector implementations. Documentation was scarce, so looked at test cases in [numpy](`32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)`). Someone with more experience with svml should investigate. ## Note amd libm doesn't have a vector hyperbolic sine api, hence why you might notice there are no tests for `sinh`. ## History This change is part of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. resolves #70079 resolves #70080 resolves #70081 resolves #70083 resolves #70084 resolves #95966	2024-07-28 20:57:43 -04:00
Florian Hahn	a3092152ac	[VPlan] Don't create live-outs for induction increments. Follow up to fc9cd3272b5 to also skip creating live-outs for IV increments, as those are also generated independent of VPlan for now.	2024-07-25 21:34:55 +01:00
Simon Pilgrim	010dcfd85f	[CostModel][X86] Improve add/sub/mul overflow intrinsic costs Noticed due to x86 changes in #97463	2024-07-25 16:01:05 +01:00
Florian Hahn	72532c9219	[LV] Don't predicate divs with invariant divisor when folding tail (#98904 ) When folding the tail, at least one of the lanes must execute unconditionally. If the divisor is loop-invariant no predication is needed, as predication would not prevent the divide-by-0 on the executed lane. Depends on https://github.com/llvm/llvm-project/pull/98892. PR: https://github.com/llvm/llvm-project/pull/98904	2024-07-25 12:21:09 +01:00
Florian Hahn	05f986e143	[LV] Add tests for loops with switches.	2024-07-21 10:11:38 +01:00
Florian Hahn	710dab6e18	[VPlan] Remove VPPredInstPHIRecipes without users after region merging. After merging replicate regions, VPPredInstPHIRecipes may become unused. Remove them directly instead of moving them to the merged region.	2024-07-20 13:21:32 +01:00
Florian Hahn	008df3cf85	[LV] Check isPredInst instead of isScalarWithPred in uniform analysis. (#98892 ) Any instruction marked as uniform will result in a uniform VPReplicateRecipe. If it requires predication, it will be placed in a replicate region, even if isScalarWithPredication returns false. Check isPredicatedInst instead of isScalarWithPredication to avoid generating uniform VPReplicateRecipes placed inside a replicate region. This fixes an assertion when using scalable VFs. Fixes https://github.com/llvm/llvm-project/issues/80416. Fixes https://github.com/llvm/llvm-project/issues/94328. Fixes https://github.com/llvm/llvm-project/issues/99625. PR: https://github.com/llvm/llvm-project/pull/98892	2024-07-19 12:02:25 +01:00
Florian Hahn	2bb65660ae	[LV] Allow re-processing of operands of instrs feeding interleave group Follow up to d216615518 to update dead interleave group pointer detection to allow re-processing of operands of instructions determined to only feed interleave groups. This is needed because instructions feeding interleave group pointers can become dead in any order, as per the newly added test case.	2024-07-17 21:37:28 +01:00
Florian Hahn	d216615518	[LV] Process dead interleave pointer ops in reverse order. Process dead interleave pointer ops in reverse order. This also catches cases where the same base pointer is used by multiple different interleave groups. This fixes another case where the legacy cost model inaccurately estimates cost, surfaced by b841e2eca3b5c8.	2024-07-17 11:43:42 +01:00
Florian Hahn	967eba0754	[LV] Add test cases for tail-folding sdiv/udiv/urem feeding geps. Based on reduced tests from https://github.com/llvm/llvm-project/issues/94328.	2024-07-15 11:45:07 +01:00
Florian Hahn	8fcb822da6	[LV] Add uses of result to pointer-runtime-checks-unprofitable.ll test. Otherwise %p.2 is not used and will be removed by VPlan transforms, leading to a difference between legacy and VPlan-based cost.	2024-07-15 09:59:46 +01:00
Florian Hahn	fc9cd3272b	[VPlan] Don't add live-outs for IV phis. Resume and exit values for inductions are currently still created outside of VPlan and independent of the induction recipes. Don't add live-outs for now, as the additional unneeded users can pessimize other analysis. Fixes https://github.com/llvm/llvm-project/issues/98660.	2024-07-14 20:49:03 +01:00
Florian Hahn	7a49d80f58	[VPlan] Skip users outside loop in check for exit pre-compute candidates When collecting candidates to pre-compute cost for operands of exit conditions, skip users outside the loop when checking if they are in ExistInstrs. The users outside the loop should be ignored, as they won't make a value live in the VPlan. This fixes a failure when building for X86 with sanitizers on macOS after b841e2eca3b5c (https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)	2024-07-11 22:04:39 +01:00
Florian Hahn	9a5a8731e7	[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and 2) a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on https://github.com/llvm/llvm-project/pull/92651 PR: https://github.com/llvm/llvm-project/pull/94760	2024-07-11 16:08:04 +01:00
Florian Hahn	ef89e3efa9	[VPlan] Collect ephemeral values for VPlan. Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan, use it in willGenerateVectors. This fixes a regression caused by 29b8b72117 for loops where the only vector values are ephemeral.	2024-07-09 21:34:49 +01:00
Florian Hahn	27ccc8835e	[LV] Add tests with ephemeral values that are widened. Add tests with loops with ephemeral values that are widened. After 29b8b72117, @ephemeral_load_and_compare_another_load_used_outside is vectorized even though the only vector values that are generated are ephemeral.	2024-07-08 13:15:39 +01:00
Florian Hahn	29b8b72117	[LV] Move check if any vector insts will be generated to VPlan. (#96622) This patch moves the check if any vector instructions will be generated from getInstructionCost to be based on VPlan. This simplifies getInstructionCost, is more accurate as we check the final result and also allows us to exit early once we visit a recipe that generates vector instructions. The helper can then be re-used by the VPlan-based cost model to match the legacy selectVectorizationFactor behavior, thus fixing a crash and paving the way to recommit https://github.com/llvm/llvm-project/pull/92555. PR: https://github.com/llvm/llvm-project/pull/96622	2024-07-07 20:08:01 +01:00
Florian Hahn	99d6c6d936	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651 ) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651	2024-07-05 10:08:42 +01:00
Noah Goldstein	7c96469ea8	[ValueTracking] Extend LHS/RHS with matching operand to work without constants. Previously we only handled the `L0 == R0` case if both `L1` and `R1` were constant. We can get more out of the analysis using general constant ranges instead. For example, `X u> Y` implies `X != 0`. In general, any strict comparison on `X` implies that `X` is not equal to the boundary value for the sign and constant ranges with/without sign bits can be useful in deducing implications. Closes #85557	2024-07-03 20:18:51 +08:00
David Green	352a836176	[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606 ) This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep i8, p, (mul x, C*4)`, so that the mul can combine both of the constant multiplications, and we take a small step towards canonicalizing more geps to i8. It currently doesn't attempt to check for multiple uses on the mul, but that should be possible if it sounds better. Let me know what you think of the idea in general.	2024-06-26 14:25:54 +01:00
Florian Hahn	8681bb8bed	[LV] Add additional test coverage for cost modeling. Add missing tests uncovered by https://github.com/llvm/llvm-project/pull/92555. Includes test for https://github.com/llvm/llvm-project/issues/96294 and https://github.com/llvm/llvm-project/issues/96328	2024-06-26 10:18:01 +01:00
Nikita Popov	eeb0884e66	[LoopUnroll] Use poison instead of undef for preheader value	2024-06-25 12:09:58 +02:00
Florian Hahn	3808ba78de	[VPlan] Model middle block via VPIRBasicBlock. (#95816) Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle block is only created after skeleton creation. Initially a regular VPBasicBlock is created, which will later be replaced by a VPIRBasicBlock once the middle IR basic block has been created. Note that this slightly changes the order of instructions created in the middle block; code generated by recipe execution in the middle block will now be inserted before the terminator (and in between the compare used by the terminator). The original order will be restored in https://github.com/llvm/llvm-project/pull/92651. PR: https://github.com/llvm/llvm-project/pull/95816	2024-06-20 13:42:20 +01:00
Florian Hahn	b9702bb12f	[LV] Consider insts feeding interleave group pointers free. For interleave groups, we only generate a pointer for the start of the interleave group (the instruction at the insert position). The other addresses for other members are already considered free, but so are their operands, if they are only used in address computations for other interleave group members.	2024-06-19 17:06:52 +01:00
Florian Hahn	3be7312f81	[LV] Add more masked store cost tests with different masks. Add additional masked store tests which caused crashes with earlier versions of https://github.com/llvm/llvm-project/pull/92555.	2024-06-19 15:34:03 +01:00
Florian Hahn	fb86cb7ec1	[LV] Add extra tests for interleave-group, reduction store costing. Add extra cost model tests exposed by VPlan cost-model transition, causing revert in 6f538f6a2d3224efda985e9eb09012fa4275ea92	2024-06-18 14:35:51 +01:00
Florian Hahn	52d29eb287	[LV] Add extra cost model tests with truncated inductions. Extra test cases that caused revert of https://github.com/llvm/llvm-project/pull/92555	2024-06-13 20:42:53 +01:00
Florian Hahn	2e4c06780c	[LV] Add extra X86 cost tests for any_of reduction and multi-exit loops. Add extra test coverage to ensure decisions do not change when transitioning to a VPlan-based cost model.	2024-06-10 13:13:04 +01:00
Florian Hahn	998c33e5fc	[VPlan] Mark FirstOrderRecurrenceSplice as not having side-effects. Now that FOR exit and resume value creation is explicitly modeled in VPlan (05e1b5340b0caf1, 07b330132c0b) it doesn't depend on the first order recurrence splice being preserved and it can now be marked as not having side-effects. This allows removal of first-order-recurrence-splice if the FOR is only used in the exit or as scalar ph resume value.	2024-06-08 21:40:30 +01:00
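
The two edge-mask cases listed in f0df4fbd0c ([LV] Support generating masks for switch terminators) can be modeled outside LLVM with per-lane booleans. The sketch below is only an illustration in plain Python, not the createEdgeMask implementation: the function name `switch_edge_mask`, the dict-based case table, and the block names are all hypothetical.

```python
def switch_edge_mask(cond_lanes, cases, default_dst, dst):
    """Per-lane mask for the edge from a switch block to `dst`.

    cond_lanes:  the switch condition value in each vector lane.
    cases:       dict mapping case value -> destination block name.
    default_dst: name of the default destination.
    (All names here are hypothetical; this is not LLVM's API.)
    """
    if dst != default_dst:
        # Case 1: dst is reached if any case targeting dst is taken,
        # so OR the compares of those cases together.
        values = [v for v, d in cases.items() if d == dst]
        return [any(c == v for v in values) for c in cond_lanes]
    # Case 2: dst is the default destination. It is reached if none of the
    # cases going elsewhere is taken: OR those compares, then negate.
    values = [v for v, d in cases.items() if d != dst]
    return [not any(c == v for v in values) for c in cond_lanes]


# Four lanes; cases 0 and 2 go to block "A", case 1 goes to "B",
# everything else goes to the default block "D".
cases = {0: "A", 1: "B", 2: "A"}
lanes = [0, 1, 2, 7]
print(switch_edge_mask(lanes, cases, "D", "A"))  # [True, False, True, False]
print(switch_edge_mask(lanes, cases, "D", "D"))  # [False, False, False, True]
```

In this model every lane is active in exactly one outgoing edge mask, which is the property a predicated (if-converted) switch needs.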

784 Commits
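
The identity-value point in 2c7786e94a (0.0 versus -0.0 for fadd reductions) comes down to IEEE-754 signed-zero arithmetic. The sketch below uses ordinary Python floats rather than LLVM IR, and the helper names `fadd_reduce` and `is_negative_zero` are hypothetical. It shows that -0.0 is a start value that never changes the result, while 0.0 can flip the sign of a zero result, which is only acceptable when the sign of zero may be ignored (the situation the nsz flag describes).

```python
import math


def is_negative_zero(x):
    # True only for -0.0; (0.0 == -0.0) is True, so the sign must be inspected.
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def fadd_reduce(values, start):
    # In-order (non-reassociated) floating-point add reduction.
    acc = start
    for v in values:
        acc += v
    return acc


all_negative_zeros = [-0.0, -0.0]

# Starting from -0.0 preserves the sign of the exact result.
print(is_negative_zero(fadd_reduce(all_negative_zeros, -0.0)))  # True

# Starting from 0.0 yields +0.0 instead: 0.0 + (-0.0) rounds to +0.0.
print(is_negative_zero(fadd_reduce(all_negative_zeros, 0.0)))   # False
```

For any input containing a nonzero value or a positive zero, both start values give the same result; the difference only shows up when every element is -0.0 (or the input is empty).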