VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.
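For illustration (names illustrative), the removed GEPs are no-ops of this shape, which fold to just %ptr:

  %gep = getelementptr i8, ptr %ptr, i64 0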
PR: https://github.com/llvm/llvm-project/pull/149735
Materialize constant vector trip counts before ::execute, if the trip
count can be computed from the original trip count as
(TC / (VF * UF)) * (VF * UF). For now this excludes cases where the
tail is folded or scalar epilogues are required.
This enables removing a number of redundant branches from the middle
block.
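For example, with an original trip count of 17 and VF * UF = 8, the
vector trip count is (17 / 8) * 8 = 16, a constant that can be
materialized directly.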
For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.
PR: https://github.com/llvm/llvm-project/pull/142309
Add a new VPInstruction::ReductionStartVector opcode to create the start
values for wide reductions. This more accurately models the start value
creation in VPlan and simplifies VPReductionPHIRecipe::execute. Down the
line it also allows removing VPReductionPHIRecipe::RdxDesc.
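As a rough sketch (names illustrative), for a 4-wide integer add
reduction with start value %start, the opcode models IR along these
lines, with the start value in lane 0 and the identity in the
remaining lanes:

  %rdx.start = insertelement <4 x i32> zeroinitializer, i32 %start, i64 0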
PR: https://github.com/llvm/llvm-project/pull/142290
Extend the sinking logic to duplicate a scalar-steps recipe if doing so
enables sinking, that is, if all users in a destination block require
all lanes. This should be the last step before removing the legacy
sinkScalarOperands.
PR: https://github.com/llvm/llvm-project/pull/136021
Considering that "or disjoint" is the canonical form for certain add
operations, I think we want to support such "add like" operations
when doing ADD+GEP->GEP+GEP rewrites, to make things more consistent.
The problem was found when improving ValueTracking, which turned an ADD
into an OR, and optimizations suddenly got worse because these rewrites
no longer triggered.
Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled to handle any
unrolled VPlan, which means there is a single UF, but it will be > 1 if
unrolling took place.
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.
This removes redundant adds of 0 from a large number of tests (~200),
many of which I am still working on updating.
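The removed instructions are of this shape (illustrative), which folds
to just %index:

  %offset = add i64 %index, 0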
In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.
PR: https://github.com/llvm/llvm-project/pull/123655
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plan's entry block.
This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.
Fixes https://github.com/llvm/llvm-project/issues/128838.
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materializes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring the
implicit hoisting of broadcasts from VPTransformState::get().
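A broadcast of a live-in materializes as the usual
insertelement/shufflevector splat sequence, sketched here for a 4-wide
plan (names illustrative):

  %bc.ins = insertelement <4 x i32> poison, i32 %live.in, i64 0
  %bc = shufflevector <4 x i32> %bc.ins, <4 x i32> poison, <4 x i32> zeroinitializer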
PR: https://github.com/llvm/llvm-project/pull/124644
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
to adjust the predecessors of blocks.
NFC modulo incoming block order in phis.
Currently we fail to detect the case where BTC + 1 wraps, i.e. the
vector trip count is 0. In those cases, the minimum iteration count
check will fail, and the vector code will never be executed.
Explicitly check for this condition in computeMaxVF and avoid trying to
vectorize altogether.
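A minimal sketch of the problematic case (fragment, assuming an i8
induction): the loop below runs 256 iterations, so BTC = 255 and
BTC + 1 wraps to 0 in i8:

  loop:
    %iv = phi i8 [ 0, %entry ], [ %iv.next, %loop ]
    %iv.next = add i8 %iv, 1
    %ec = icmp eq i8 %iv, -1
    br i1 %ec, label %exit, label %loop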
Note that a number of tests needed to be updated, because the vector
loop would never be executed given the input IR.
Fixes https://github.com/llvm/llvm-project/issues/122558.
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out by LICM.
I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.
Clean up after https://github.com/llvm/llvm-project/pull/114292.
This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1.
The crash when building llvm-test-suite with stage2 should have been
fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.
This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287.
This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.
Causes crashes in llvm-test-suite when using stage 2 clang.
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the
created resume values.
This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.
Depends on https://github.com/llvm/llvm-project/pull/109975.
PR: https://github.com/llvm/llvm-project/pull/110577
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.
Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.
PR: https://github.com/llvm/llvm-project/pull/109975
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
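Illustrative example: a SCEV add such as (%p + %n)<nuw> would now
expand to

  %gep = getelementptr nuw i8, ptr %p, i64 %n

rather than a plain getelementptr without the flag.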
This patch implements explicit unrolling by UF as a VPlan transform. In
follow-up patches this will allow simplifying VPTransformState (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in each recipe). It also allows for
more general optimizations (e.g. avoiding generating code for recipes
that are uniform across parts).
It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. previously
VPlan post-processing for header-phi recipes).
In the initial implementation, a number of recipes still take the
unrolled part as an additional, optional argument, if their execution
depends on the unrolled part.
The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.
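Sketched for a step of 4 and <vscale x 4 x i64> (illustrative):

  ; before: scalar mul, then splat
  %vs = call i64 @llvm.vscale.i64()
  %step = mul i64 %vs, 4
  ; %step splatted to <vscale x 4 x i64>

  ; after: %vs splatted to <vscale x 4 x i64> as %vs.splat, then
  %step.vec = mul <vscale x 4 x i64> %vs.splat, splat (i64 4)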
This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransformState and recipes'
::execute.
The current version mostly leaves existing ::execute untouched and
instead sets VPTransformState::UF to 1.
A follow-up patch will clean up all references to VPTransformState::UF.
Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.
PR: https://github.com/llvm/llvm-project/pull/95842
This is a follow-up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.
This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reassoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
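The reasoning: -0.0 is the universal identity (-0.0 + x == x for all x),
while 0.0 + -0.0 == +0.0, so 0.0 is only a valid identity when nsz
allows ignoring signed zeros. Under nsz, an ordered reduction can
therefore start from 0.0, e.g. (illustrative):

  %red = call nsz float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)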
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.
Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).
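A sketch of the kind of phi that was unnecessarily blocked (fragment,
names illustrative): neither incoming edge of %p below is a backedge,
yet all of these blocks are mutually reachable within the loop:

  loop:
    br i1 %c, label %then, label %merge
  then:
    br label %merge
  merge:
    %p = phi i32 [ 0, %loop ], [ 1, %then ]
    br i1 %done, label %exit, label %loop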
Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).
For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.
This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
The idea behind this canonicalization is that it allows us to handle
fewer patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
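For example, with the canonicalization gone, both operand orders of a
commutative fold can be tested directly, without the "thwart" trick of
routing %arg through a dummy instruction to pin its position:

  %r1 = or i8 %x, %arg
  %r2 = or i8 %arg, %x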
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.
The VPInstruction takes 2 operands: 1) the incoming value from the
middle block and 2) a default value to be used for all other incoming
blocks.
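The emitted IR is a plain phi in the scalar preheader, sketched here
(names illustrative):

  scalar.ph:
    %resume = phi i32 [ %vec.resume, %middle.block ], [ %default, %entry ]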
In follow-up changes, it will also be used to create phis for reduction
and induction resume values.
Depends on https://github.com/llvm/llvm-project/pull/92651
PR: https://github.com/llvm/llvm-project/pull/94760
This patch moves creation of the branch condition for entering the
scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.
Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.
After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.
This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.
Depends on https://github.com/llvm/llvm-project/pull/92525
PR: https://github.com/llvm/llvm-project/pull/92651
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.
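For example (illustrative), with C = 5:

  %mul = mul i64 %x, 5
  %gep = getelementptr i32, ptr %p, i64 %mul

becomes

  %mul = mul i64 %x, 20
  %gep = getelementptr i8, ptr %p, i64 %mul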
It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.
Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.
PR: https://github.com/llvm/llvm-project/pull/95816
This patch uses the ExtractFromEnd VPInstruction opcode
to extract the value of a FOR to be used as the resume value for the
phi in the scalar loop.
It adds a new live-out that temporarily wraps the FOR phi in the scalar
loop. fixFixedOrderRecurrence will process live-outs for fixed-order
recurrence phis by creating a new phi node in the scalar preheader,
using the generated value for the live-out as the incoming value from
the middle block and the original start value as the incoming value for
the other edge. Creation of the phi in the preheader, as well as
updating the phi in the scalar loop, will also be moved to VPlan in the
future, eventually retiring fixFixedOrderRecurrence.
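Sketch of the resulting IR (names illustrative): the last element of
the FOR vector is extracted in the middle block and feeds the new
preheader phi:

  middle.block:
    %for.resume = extractelement <4 x i32> %wide.for, i64 3

  scalar.ph:
    %scalar.recur.init = phi i32 [ %for.resume, %middle.block ], [ %for.start, %entry ]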
Depends on https://github.com/llvm/llvm-project/pull/93395
PR: https://github.com/llvm/llvm-project/pull/93396
This extends computeKnownBits() support for dominating conditions to
also handle and/or conditions. We'll look through either `and` or `or`,
depending on which edge we're considering.
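For example (fragment): on the true edge of the branch below both %c1
and %c2 are known true; for an `or` condition, both are known false on
the false edge:

  %c = and i1 %c1, %c2
  br i1 %c, label %taken, label %other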
This change is mainly for the sake of completeness, so we don't start
missing optimizations if SimplifyCFG decides to merge some branches.
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.
This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.
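For example, on a 4-byte element type:

  %gep = getelementptr i32, ptr %p, i64 3

becomes

  %gep = getelementptr i8, ptr %p, i64 12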
The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.
Fixes https://github.com/llvm/llvm-project/issues/69841.
This patch tries to flip the signedness of predicates when folding an
unsigned icmp with a signed min/max. It will enable more optimizations,
as we canonicalize a signed icmp into an unsigned icmp when both
operands are known to have the same sign.
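An illustrative case: %m below is known positive, and 42 is positive,
so the unsigned predicate can be flipped to its signed form:

  %m = call i8 @llvm.smax.i8(i8 %x, i8 1)
  %c = icmp ult i8 %m, 42
  ; => %c = icmp slt i8 %m, 42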
Fixes #76672.
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.05%|-0.12%|-0.01%|-0.03%|-0.00%|
NOTE: We can flip the signedness of the predicate if both operands are
negative. But I don't see the benefit of handling these cases.
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes and is also
needed to enable explicit unrolling in VPlan.
https://github.com/llvm/llvm-project/pull/72164
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
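A minimal example of what this enables (fragment, illustrative): inside
%if, the dominating branch implies %x u< 16, so the mask below is
redundant:

  %cmp = icmp ult i8 %x, 16
  br i1 %cmp, label %if, label %else
  if:
    %r = and i8 %x, 15 ; known bits from %cmp fold this to %x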
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about 0.2%. It also improves
stage2-O0-g builds by about 0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
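So a canonicalization that previously produced a plain zext, e.g.
turning

  %ext = sext i32 %x to i64

into a zext when %x is known non-negative, now emits

  %ext = zext nneg i32 %x to i64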
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158332
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.
Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a reported
use-after-free error.
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.
This means that the phi operand is set to poison even for critical
edges without an intermediate block.
There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
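A sketch of the critical-edge case (illustrative): the edge from %entry
to %merge is dead but has no intermediate block; its phi input is now
set to poison as well:

  entry:
    br i1 true, label %a, label %merge
  a:
    br label %merge
  merge:
    %p = phi i32 [ 0, %a ], [ poison, %entry ]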
This reverts commit eea9258648ce73507f6f85c395de978af659d498.
That commit triggered crashes in the following testcase:
$ cat reduced.c
typedef struct {
  int a[8];
} b;
typedef struct {
  b *c;
  short d;
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
We were failing to set the known bits for add/sub in the multi-use
case, resulting in odd behavioral differences depending on the
number of uses. Noticed while adding a consistency assertion.
The test changes are essentially a revert to the state before
d6498ab. These changes are not really desirable, but if we don't
want them, that needs to be handled as part of the heuristic for
demanded constant shrinking, not by artificially suppressing the
known bits in one specific case.