llvm-project

Author	SHA1	Message	Date
Florian Hahn	c836983671	[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770 ) VPBlendRecipe does not use the first mask operand. Removing it allows VPlan-based DCE to remove unused mask computations. This also fixes #87410, where unused Not VPInstructions are considered having only their first lane demanded, but some of their operands providing a vector value due to other users. Fixes https://github.com/llvm/llvm-project/issues/87410 PR: https://github.com/llvm/llvm-project/pull/87770	2024-04-09 11:14:05 +01:00
Florian Hahn	233c030dcb	[LV] Add extra tests for induction cost modeling.	2024-04-06 12:36:07 +01:00
Alexey Bataev	413a66f339	[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172 ) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions. These architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for loads/stores instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Other important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. Also, added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750	2024-04-04 18:30:17 -04:00
Florian Hahn	6ef829941b	Recommit "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" Recommit with a fix for the use-after-free causing the revert. This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987. Original commit message: Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-27 19:11:18 +00:00
Florian Hahn	06bb8c9f20	[VPlan] Explicitly handle scalar pointer inductions. (#83068 ) Add a new PtrAdd opcode to VPInstruction that corresponds to IRBuilder::CreatePtrAdd, which creates a GEP with source element type i8. This is then used to model scalarizing VPWidenPointerInductionRecipe by introducing scalar-steps to model the index increment followed by a PtrAdd. Note that PtrAdd needs to be able to generate code for only the first lane or for all lanes. This may warrant introducing a separate recipe for scalarizing that can be created without relying on the underlying IR. Depends on https://github.com/llvm/llvm-project/pull/80271 PR: https://github.com/llvm/llvm-project/pull/83068	2024-03-26 16:01:57 +01:00
Benjamin Kramer	f872043e05	Revert "[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 )" This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates a use after free. ==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08 READ of size 1 at 0x50f000001760 thread T0 #0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13 #1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock)>)::$_0::operator()(llvm::VPRecipeBase) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23 #2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp Can be reproduced with asan on Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll Transforms/LoopVectorize/X86/pr81872.ll Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll	2024-03-20 15:14:58 +01:00
Noah Goldstein	6960ace534	Revert "[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0`" This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44. It wasn't a particularly important transform to begin with and caused some codegen regressions on targets that prefer `sitofp` so dropping. Might re-visit along with adding `nneg` flag to `uitofp` so its easily reversable for the backend.	2024-03-20 00:50:45 -05:00
Florian Hahn	c2c1e6ee4c	[VPlan] Replace disjoint or with add instead of dropping disjoint. (#83821 ) Dropping disjoint from an OR may yield incorrect results, as some analysis may have converted it to an Add implicitly (e.g. SCEV used for dependence analysis). Instead, replace it with an equivalent Add. This is possible as all users of the disjoint OR only access lanes where the operands are disjoint or poison otherwise. Note that replacing all disjoint ORs with ADDs instead of dropping the flags is not strictly necessary. It is only needed for disjoint ORs that SCEV treated as ADDs, but those are not tracked. There are other places that may drop poison-generating flags; those likely need similar treatment. Fixes https://github.com/llvm/llvm-project/issues/81872 PR: https://github.com/llvm/llvm-project/pull/83821	2024-03-19 20:16:18 +01:00
Noah Goldstein	d80d5b923c	[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0` Just a standard canonicalization. Proofs: https://alive2.llvm.org/ce/z/9W4VFm Closes #82404	2024-03-13 18:26:21 -05:00
annamthomas	866ac9a165	[LV] Address postcommit review for PR84782 (#84797 ) This testcase was added to show miscompile in https://github.com/llvm/llvm-project/issues/81872	2024-03-11 13:23:00 -04:00
annamthomas	34acdb3ec2	Precommit testcase for pr81872 (#84782 ) Testcase shows miscompile when dropping disjoint flag from disjoint or during vectorization.	2024-03-11 12:16:52 -04:00
Florian Hahn	911055e34f	[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271 ) At the moment, some VPInstructions create only a single scalar value, but use VPTransformatState's 'vector' storage for this value. Those values are effectively uniform-per-VF (or in some cases uniform-across-VF-and-UF). Using the vector/per-part storage doesn't interact well with other recipes, that more accurately using (Part, Lane) to look up scalar values and prevents VPInstructions creating scalars from interacting with other recipes working with scalars. This PR tries to unify handling of scalars by using (Part, 0) for scalar values where only the first lane is demanded. This allows using VPInstructions with other recipes like VPScalarCastRecipe and is also needed when using VPInstructions in more cases otuside the vector loop region to generate scalars. Depends on https://github.com/llvm/llvm-project/pull/80269	2024-02-26 19:06:43 +00:00
Benjamin Kramer	e7c60915e6	Remove duplicated REQUIRES: asserts	2024-02-23 12:01:30 +01:00
Ramkumar Ramachandra	f5c8e9e531	LoopVectorize/test: guard pr72969 with asserts (#82653 ) Follow up on 695a9d8 (LoopVectorize: add test for crash in #72969) to guard pr72969.ll with REQUIRES: asserts, in order to be reasonably confident that it will crash reliably.	2024-02-22 19:55:18 +00:00
Benjamin Kramer	3168af56bc	LoopVectorize: Mark crash test as requiring assertions	2024-02-22 20:25:58 +01:00
Ramkumar Ramachandra	695a9d84dc	LoopVectorize: add test for crash in #72969 (#74111 )	2024-02-22 16:00:33 +00:00
Rohit Aggarwal	36adfec155	Adding support of AMDLIBM vector library (#78560 ) Hi, AMD has it's own implementation of vector calls. This patch include the changes to enable the use of AMD's math library using -fveclib=AMDLIBM. Please refer https://github.com/amd/aocl-libm-ose --------- Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>	2024-02-15 12:13:07 +05:30
Nilanjana Basu	c1c5b854ad	[LV] Remove loop trip count threshold for deciding whether to interleave a loop (#67725 ) A set of microbenchmarks (https://github.com/llvm/llvm-test-suite/pull/26) showed that loop interleaving can be beneficial for loops with low trip count as well. Loop interleaving count computation is updated accordingly in prior patches while this patch removes the loop trip count threshold for interleaving.	2024-02-05 17:23:58 -08:00
Florian Hahn	9536a6286e	[VPlan] Preserve original induction order when creating scalar steps. Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.	2024-01-31 13:31:28 +00:00
David Spickett	4a91206359	[llvm][LV] Move new test into X86 subfolder Added in a04f6152914ea21f3068aaba9d8fc21d2e703d3e. Failing on our Arm only bots: https://lab.llvm.org/buildbot/#/builders/245/builds/19684	2024-01-25 17:04:34 +00:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Florian Hahn	51afb10174	[LV] Create block in mask up-front if needed. (#76635 ) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635	2024-01-09 10:50:08 +00:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that in some cases we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	249d2ccb1d	[LV] Add test showing overly aggressive dropping of inbounds. As %B.gep.0 executes unconditionally in the latch, inbounds could be preserved in the vector version. https://alive2.llvm.org/ce/z/XWbMuD	2024-01-07 13:55:32 +00:00
Yingwei Zheng	6681650025	[InstCombine] Revert the `signed icmp -> unsigned icmp` canonicalization when folding `icmp Pred min\|max(X, Y), Z` (#76685 ) This patch tries to flip the signedness of predicates when folding an unsigned icmp with a signed min/max. It will enable more optimizations as we canonicalizes a signed icmp into an unsigned icmp when both operands are known to have the same sign. Fixes #76672. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|-0.00%\|+0.01%\|+0.05%\|-0.12%\|-0.01%\|-0.03%\|-0.00%\| NOTE: We can flip the signedness of predicate if both operands are negative. But I don't see the benefit of handling these cases.	2024-01-05 14:39:16 +08:00
Nilanjana Basu	cd28da390f	[LV] Change loops' interleave count computation (#73766 ) [LV] Change loops' interleave count computation A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose interleaving count (IC) between TC/VF & TC/2*VF (VF = vectorization factor), such that remainder TC for the epilogue loop is minimum while the IC is maximum in case the remainder TC is same for both. The initial tests for this change were submitted in PRs: https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.	2024-01-04 12:45:22 +05:30
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This untangles address computation from the memory recipes future and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Nikita Popov	d77067d08a	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662 ) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about ~0.2%. It also improves stage2-O0-g builds by about ~0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242.	2023-12-06 14:17:18 +01:00
Florian Hahn	bbd1941a38	[VPlan] Add disjoint flag to VPRecipeWithIRFlags. (#74364 ) A new disjoint flag was added for OR instructions in #72583. Update VPRecipeWithIRFlags to also support the new flag. This allows printing and preserving the disjoint flag in vectorized code.	2023-12-05 15:21:59 +00:00
Alexey Bataev	056367bb19	[LV]Support dropping of nneg flag for zext widencast recipes. (#74112 ) Compiler crashes when the assertion triggered for zext nneg instruction, that checks that the instruction cannot produce poison. Changed the base class for widencast recipe to handle dropping nneg flag to avoid compiler crash.	2023-12-05 09:17:23 -05:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV looking recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912 ) The disjoint flag was recently added to IR in #72583 We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702 ) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Nilanjana Basu	e2210cefb1	[LV] Pre-committing tests for changing loop interleaving count computation (#70272 ) Added tests for evaluating changes to loop interleaving count computation and for removing loop interleaving threshold in subsequent patches.	2023-11-17 17:38:04 -08:00
Florian Hahn	e5e71affb7	[LV] Reverse mask up front, not when creating vector pointer. (#72163 ) Reverse mask early on when populating BlockInMask. This will enable separating mask management and address computation from the memory recipes in the future and is also needed to enable explicit unrolling in VPlan.	2023-11-17 13:59:35 +00:00
Yingwei Zheng	dc6d077396	[CVP] Infer nneg on existing zext (#72052 ) This patch infers `nneg` flags for existing zext instructions in CVP. After https://github.com/llvm/llvm-project/pull/71534 and this patch, we can drop `zext -> zext nneg` transform in `RISCVCodeGenPrepare`: `40671bbdef/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (L74-L83)` This is an alternative to #72049.	2023-11-13 22:41:37 +08:00
Florian Hahn	fd82b5b287	[LV] Support recieps without underlying instr in collectPoisonGenRec. Support recipes without underlying instruction in collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null the underlying value. Fixes https://github.com/llvm/llvm-project/issues/70590.	2023-11-03 10:21:14 +00:00
Matthias Braun	e3cf80c5c1	BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.	2023-10-24 20:27:39 -07:00
Nikita Popov	30faaaf626	[LoopVectorize] Regenerate test checks (NFC)	2023-10-12 14:35:23 +02:00
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives an improvement of 0.6%: https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30
Yingwei Zheng	44e5afdb91	[InstCombine] Generalize foldICmpWithMinMax This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898. For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions. Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc You can run the standalone translation validation tool `alive-tv` locally to verify these transformations. ``` alive-tv transforms.ll --smt-to=600000 --exit-on-error ``` Reviewed By: goldstein.w.n Differential Revision: https://reviews.llvm.org/D156238	2023-09-11 02:26:48 +08:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
Anna Thomas	5dfdf34df0	[LV] Move interleaved test to X86 directory Remove the x86-registered-target under REQUIRES.	2023-08-09 16:03:33 -04:00
Anna Thomas	3cf24dbbdd	[LV] Complete load groups and release store groups. Try 2. This is a complete fix for CompleteLoadGroups introduced in D154309. We need to check for dependency between A and every member of the load Group of B. This patch also fixes another miscompile seen when we incorrectly sink stores below a depending load (see testcase in interleaved-accesses-sink-store-across-load.ll). This is fixed by releasing store groups correctly. This change was previously reverted (e85fd3cbdd68) due to Asan failure with use-after-free error. A testcase is added and the bug is fixed in this version of the patch. Differential Revision: https://reviews.llvm.org/D155520	2023-08-08 18:10:23 -04:00
Florian Hahn	707359ecf5	Recommit "[LV] Re-use existing broadcast value for live-ins." This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53. Recommits eea9258648ce with a fix to only erase the instruction from the first part if it is defined outside the loop. This fixes a use-after-free error reported.	2023-08-01 15:54:02 +01:00
Nikita Popov	d01aec4c76	[InstCombine] Set dead phi inputs to poison in more cases Set phi inputs to poison whenever we find a dead edge (either during initial worklist population or the main InstCombine run), instead of only doing this for successors of dead blocks. This means that the phi operand is set to poison even if for critical edges without an intermediate block. There are quite a few test changes, because the pattern is fairly common in vectorizer output, for cases where we know the vectorized loop will be entered.	2023-08-01 11:53:47 +02:00
Anna Thomas	e85fd3cbdd	Revert "[LV] Complete load groups and release store groups in presence of dependency" This reverts commit eaf6117f3388615f51198e47c0d6be0252729508 (D155520). There's an ASAN build failure that needs investigation.	2023-07-26 15:07:26 -04:00
Anna Thomas	eaf6117f33	[LV] Complete load groups and release store groups in presence of dependency This is a complete fix for CompleteLoadGroups introduced in D154309. We need to check for dependency between A and every member of the load Group of B. This patch also fixes another miscompile seen when we incorrectly sink stores below a depending load (see testcase in interleaved-accesses-sink-store-across-load.ll). This is fixed by releasing store groups correctly. Differential Revision: https://reviews.llvm.org/D155520	2023-07-25 17:32:09 -04:00
Martin Storsjö	245ec675a4	Revert "[LV] Re-use existing broadcast value for live-ins." This reverts commit eea9258648ce73507f6f85c395de978af659d498. That commit triggered crashes in the following testcase: $ cat reduced.c typedef struct { int a[8] } b; typedef struct { b c; short d } e; void f() { int g; char h; e i = f; short j = i->d; int a = i->c->a[0]; for (;;) for (; g < a; g++) { h = j * i->d >> 8; h++; } } $ clang -target aarch64-linux-gnu -w -c -O2 reduced.c	2023-07-25 10:35:41 +03:00
Florian Hahn	eea9258648	[LV] Re-use existing broadcast value for live-ins. When requesting a vector value for a live-in, we can re-use the broadcast of the live-in of part 0 for parts > 0.	2023-07-24 11:50:47 +01:00

1 2 3 4 5 ...

717 Commits