llvm-project

Author	SHA1	Message	Date
Paschalis Mpeis	e50c4c83b6	[AArch64][TLI] Add TLI mappings for ArmPL modf, sincos, sincospi (#83143 ) ArmPL 24.04 release fixes a bug concerning these methods, so now they can be re-introduced to TLI mappings.	2024-04-10 09:34:46 +01:00
Florian Hahn	a8ec1eb843	[VPlan] Don't assign slots to VPValues with an underlying value. This makes sure the numbering for VPValues without underlying values is consecutive.	2024-04-09 21:30:51 +01:00
Simon Pilgrim	3bfd5c6424	[TTI] getCommonMaskedMemoryOpCost - consistently use getScalarizationOverhead instead of ExtractElement costs for address/mask extraction. (#87771 ) These aren't unknown extraction indices, we will be extracting every address/mask element in sequence.	2024-04-09 15:42:51 +01:00
Florian Hahn	c836983671	[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770 ) VPBlendRecipe does not use the first mask operand. Removing it allows VPlan-based DCE to remove unused mask computations. This also fixes #87410, where unused Not VPInstructions are considered having only their first lane demanded, but some of their operands providing a vector value due to other users. Fixes https://github.com/llvm/llvm-project/issues/87410 PR: https://github.com/llvm/llvm-project/pull/87770	2024-04-09 11:14:05 +01:00
Florian Hahn	fa8a726672	[LV] Make global_alias.ll test independent of O1 pipeline. Update global_alias.ll with the IR after the O1 pipeline. Depending on the O1 makes the tests more fragile and also makes it more difficult to reason about the behavior of the tests, as it doesn't show the IR before LoopVectorize.	2024-04-06 14:48:41 +01:00
Florian Hahn	233c030dcb	[LV] Add extra tests for induction cost modeling.	2024-04-06 12:36:07 +01:00
Florian Hahn	c6e38b928c	Reapply "[LV] Improve AnyOf reduction codegen. (#78304 )" This reverts the revert commit 589c7abb03448. This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in 399ff08e29d. -------------------------------- Original commit message: Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes the #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-04-05 13:45:13 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172 ) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	7bd163d0a4	[VPlan] Clean up dead recipes after UF & VF specific simplification. Recursively remove dead recipes after simplifying vector loop exit branch.	2024-04-04 12:05:08 +01:00
Florian Hahn	399ff08e29	[LV] Precommit tests with any-of reductions and epilogue vectorization. Test case for failures from https://lab.llvm.org/buildbot/#/builders/74/builds/26697 caused the revert of 95fef1d in 589c7ab.	2024-04-03 13:32:32 +01:00
Florian Hahn	89271b4676	[LV] Add test depending on target to RISCV subdirectory.	2024-04-02 22:02:25 +01:00
Florian Hahn	6261c53c6f	[VPlan] Make sure OR VPInstructions are treated as disjoint ops. Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes https://github.com/llvm/llvm-project/issues/87378.	2024-04-02 21:48:51 +01:00
Florian Hahn	6ef829941b	Recommit "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" Recommit with a fix for the use-after-free causing the revert. This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987. Original commit message: Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-27 19:11:18 +00:00
Florian Hahn	06bb8c9f20	[VPlan] Explicitly handle scalar pointer inductions. (#83068 ) Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR. Depends on https://github.com/llvm/llvm-project/pull/80271 PR: https://github.com/llvm/llvm-project/pull/83068	2024-03-26 16:01:57 +01:00
Florian Hahn	1081d3a0a7	[VPlan] Mark CanonicalIVIncrementForPart as only using part 0 of IV. CanonicalIVIncrementForPart uses VPIteration(0, 0) of the IV (first operand), mark it as only using part 0. This avoids generating redundant IV increments per part.	2024-03-25 11:27:17 +00:00
Florian Hahn	f0a8738401	[VPlan] Generate CalculateTripCountMinusVF for Part 0 only. (NFCI). The value produced by CalculateTripCountMinusVF VPInstructions is independent of the part. Only compute it for part 0 and use that for other parts.	2024-03-24 20:59:54 +00:00
Benjamin Kramer	f872043e05	Revert "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates a use after free. ==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08 READ of size 1 at 0x50f000001760 thread T0 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Can be reproduced with asan on Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll Transforms/LoopVectorize/X86/pr81872.ll Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll	2024-03-20 15:14:58 +01:00
Yingwei Zheng	f0420c7bc6	[ValueTracking] Handle `not` in `isImpliedCondition` (#85397 ) This patch handles `not` in `isImpliedCondition` to enable more fold in some multi-use cases.	2024-03-20 16:16:42 +08:00
Noah Goldstein	6960ace534	Revert "[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0`" This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44. It wasn't a particularly important transform to begin with and caused some codegen regressions on targets that prefer `sitofp`, so dropping it. Might revisit along with adding an `nneg` flag to `uitofp` so it's easily reversible for the backend.	2024-03-20 00:50:45 -05:00
Florian Hahn	c2c1e6ee4c	[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 ) Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-19 20:16:18 +01:00
Michele Scandale	09eb9f1136	[InstCombine] Fix for folding `select` into floating point binary operators. (#83200 ) Folding a `select` into a floating point binary operator can only be done if the result is preserved in both cases. In particular, if the other operand of the `select` can be a NaN, then the transformation won't preserve the result value.	2024-03-19 09:47:07 -07:00
Kirill Stoimenov	589c7abb03	Revert "[LV] Improve AnyOf reduction codegen. (#78304 )" Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697 This reverts commit 95fef1dfefd5467206e74c089d29806fcd82889b.	2024-03-14 14:57:01 +00:00
Florian Hahn	95fef1dfef	[LV] Improve AnyOf reduction codegen. (#78304 ) Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block. The patch incorporates feedback from https://reviews.llvm.org/D153697. This fixes the #62565, as now there aren't multiple uses of the start/new values. Fixes https://github.com/llvm/llvm-project/issues/62565 PR: https://github.com/llvm/llvm-project/pull/78304	2024-03-14 11:22:06 +00:00
Noah Goldstein	d80d5b923c	[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0` Just a standard canonicalization. Proofs: https://alive2.llvm.org/ce/z/9W4VFm Closes #82404	2024-03-13 18:26:21 -05:00
Florian Hahn	1402c016ff	[VPlan] Use VPBuilder to create BranchOnCond in VPHCFGBuilder. This simplifies the code to create the recipe slightly as well as properly retaining the debug location of the input IR.	2024-03-13 14:30:09 +00:00
annamthomas	866ac9a165	[LV] Address postcommit review for PR84782 (#84797 ) This testcase was added to show miscompile in https://github.com/llvm/llvm-project/issues/81872	2024-03-11 13:23:00 -04:00
annamthomas	34acdb3ec2	Precommit testcase for pr81872 (#84782 ) Testcase shows miscompile when dropping disjoint flag from disjoint or during vectorization.	2024-03-11 12:16:52 -04:00
Cameron McInally	416debf79b	[test] Move pr73894.ll to AArch64 directory and update the target triple (#84269 ) pr73894.ll is failing on a number of non-AArch64 buildbots. I'm not certain that this is a proper fix, but I think it's best to move the test to the test/Transforms/LoopVectorize/AArch64/ directory and replace the triple with one commonly used in that directory. llvm#73894	2024-03-06 21:25:28 -05:00
Cameron McInally	012d217174	[LV] Use scalar CMP for active-lane-mask with scalar VF (#83902 ) Instead of generating a <1 x i1> active lane mask intrinsic, generate the equivalent scalar ICMP instead. This allows us to avoid unnecessarily extracting the scalar part from the vector mask. Fixes llvm#73894.	2024-03-06 15:59:35 -05:00
Niwin Anto	eaf0d82529	[LV] Disable fold tail by masking when IV is used outside (#81609 ) When induction variables are used outside the loop body, tail folding by masking miscompiles, because for users outside of the loop the final value of the induction is computed separately from the vector loop. Fixes https://github.com/llvm/llvm-project/issues/76069 Fixes https://github.com/llvm/llvm-project/issues/51677	2024-03-04 11:33:30 +00:00
Shih-Po Hung	6ee9c8afbc	[RISCV][CostModel] Updates reduction and shuffle cost (#77342 ) - Make `andi` cost 1 in SK_Broadcast - Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL	2024-02-29 15:41:19 +08:00
Nilanjana Basu	1c211bc76e	[LV] Remove unused configuration option (#82955 ) Recent set of changes (PR #67725) in loop interleaving algorithm caused removal of the loop trip count threshold for allowing interleaving. Therefore configuration option interleave-small-loop-scalar-reduction is no longer needed.	2024-02-28 10:17:25 -08:00
Niwin Anto	ce0687e2df	[LV] Add test for tail fold by masking with external IV users. (#82329 ) Test case for https://github.com/llvm/llvm-project/issues/76069	2024-02-28 13:46:00 +00:00
Florian Hahn	15d9d0fa8f	[VPlan] Also print final VPlan directly before codegen/execute. (#82269 ) Some optimizations are applied after UF and VF have been chosen. This patch adds an extra print of the final VPlan just before codegen/execution. In the future, there will be additional transforms that are applied later (interleaving for example). PR: https://github.com/llvm/llvm-project/pull/82269	2024-02-28 13:19:43 +00:00
Florian Hahn	e421c12e47	[VPlan] Remove left-over CHECK-NOT line. This removes a CHECK-NOT: vector.body line from the test which seems to imply the test does not get vectorized, but it does now. This line was left over from when the test was pre-committed, remove it.	2024-02-27 09:38:40 +00:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271 ) At the moment, some VPInstructions create only a single scalar value, but use VPTransformState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes that more accurately use (Part, Lane) to look up scalar values, and it prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases outside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Benjamin Kramer	e7c60915e6	Remove duplicated REQUIRES: asserts	2024-02-23 12:01:30 +01:00
Ramkumar Ramachandra	f5c8e9e531	LoopVectorize/test: guard pr72969 with asserts (#82653 ) Follow up on 695a9d8 (LoopVectorize: add test for crash in #72969) to guard pr72969.ll with REQUIRES: asserts, in order to be reasonably confident that it will crash reliably.	2024-02-22 19:55:18 +00:00
Benjamin Kramer	3168af56bc	LoopVectorize: Mark crash test as requiring assertions	2024-02-22 20:25:58 +01:00
Philip Reames	f67ef1a8d9	[RISCV][LV] Add additional small trip count loop coverage	2024-02-22 08:30:25 -08:00
Philip Reames	9eb5f94f9b	[RISCV][AArch64] Add vscale_range attribute to tests per architecture minimums Spent a bunch of time tracing down an odd issue "in SCEV" which turned out to be the fact that SCEV doesn't have access to TTI. As a result, the only way for it to get range facts on vscales (to avoid collapsing ranges of element counts and type sizes to trivial ranges on multiplies) is to look at the vscale_range attribute. Since vscale_range is set by clang by default, manually setting it in the tests shouldn't interfere with the test intent.	2024-02-22 08:11:24 -08:00
Ramkumar Ramachandra	695a9d84dc	LoopVectorize: add test for crash in #72969 (#74111 )	2024-02-22 16:00:33 +00:00
Florian Hahn	9923d29cfa	[VPlan] Merge main VPlan verifier with HCFG verifier. Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path. This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled. This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.	2024-02-20 16:43:57 +00:00
Florian Hahn	0dacba3ad1	[VPlan] Handle truncating ICMPs in truncateToMinimalBWs. Update truncateToMinimalBitwidths to handle truncating ICMPs. For ICMPs, the new target type will be the same as the original type. In that case, only truncate the operands, but skip the extend. This is in line with what the original truncateToMinimalBitwidths did for compares. Fixes https://github.com/llvm/llvm-project/issues/81415.	2024-02-16 12:58:56 +00:00
Rohit Aggarwal	36adfec155	Adding support of AMDLIBM vector library (#78560 ) Hi, AMD has its own implementation of vector calls. This patch includes the changes to enable the use of AMD's math library using -fveclib=AMDLIBM. Please refer to https://github.com/amd/aocl-libm-ose --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-02-15 12:13:07 +05:30
David Sherwood	1c10821022	[LoopVectorize] Fix divide-by-zero bug (#80836 ) (#81721 ) When attempting to use the estimated trip count to refine the costs of the runtime memory checks we should also check for sane trip counts to prevent divide-by-zero faults on some platforms. Fixes #80836	2024-02-14 16:07:51 +00:00
Fangrui Song	3d18c8cd26	[test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64 Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633 Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid them elsewhere as well. Just use the common "aarch64" without other triple components.	2024-02-12 18:29:55 -08:00
Nikita Popov	7c0d52ca91	[ValueTracking] Support dominating known bits condition in and/or (#74728 ) This extends computeKnownBits() support for dominating conditions to also handle and/or conditions. We'll look through either `and` or `or` depending on which edge we're considering. This change is mainly for the sake of completeness, so we don't start missing optimizations if SimplifyCFG decides to merge some branches.	2024-02-08 09:47:49 +01:00
Philip Reames	1aafe7605b	[test] Regen a test for naming changes	2024-02-06 18:06:24 -08:00
Philip Reames	c5bf1f4b8f	[test] Autogen a test for ease of update in forthcoming patch	2024-02-06 17:59:54 -08:00

2388 Commits