llvm-project

Author	SHA1	Message	Date
Piotr Fusik	1a44a53cd5	[LSR][NFC] Use range-based `for` (#113889 )	2024-11-05 07:11:23 +01:00
Kazu Hirata	94f9cbbe49	[Scalar] Remove unused includes (NFC) (#114645 ) Identified with misc-include-cleaner.	2024-11-02 08:32:26 -07:00
Youngsuk Kim	caa32e6d6f	[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576 ) Comments attached to the `ScaledReg` field of `struct Formula` explains that, `ScaledReg` must be non-null when `Scale` is non-zero. This fixes up a code path where this invariant is violated. Also, add an assert to ensure this invariant holds true. Without this patch, compiler aborts with the attached test case. Fixes #76504	2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams	7506872afc	[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979 ) Fixes #110494	2024-10-03 11:47:08 +01:00
Mehdi Amini	6c7a3f80e7	Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110938 ) This macros is always defined: either 0 or 1. The correct pattern is to use #if. Re-apply #110185 with more fixes for debug build with the ABI breaking checks disabled.	2024-10-03 01:24:14 +02:00
Sergey Kachkov	1f2a634c44	Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice. Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are moved into separate duplicated-phis.ll test with explicit x86 target requirement to fix bots which are not building this target.	2024-09-09 16:14:51 +03:00
dyung	2bf551e600	Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666 ) Reverts llvm/llvm-project#107380 Change is causing the test preserve-lcssa.ll to fail on at least 2 build bots: - https://lab.llvm.org/buildbot/#/builders/190/builds/5231 - https://lab.llvm.org/buildbot/#/builders/161/builds/1855	2024-09-06 19:54:26 -07:00
Sergey Kachkov	2cb4d1b1bd	[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice.	2024-09-06 18:39:47 +03:00
Nikita Popov	7660981402	[LSR] Use computeConstantDifference() This API is faster than getMinusSCEV() and a SCEVConstant cast.	2024-08-28 12:20:59 +02:00
Philip Reames	27a62ec72a	[LSR] Split the -lsr-term-fold transformation into it's own pass (#104234 ) This transformation doesn't actually use any of the internal state of LSR and recomputes all information from SCEV. Splitting it out makes it easier to test. Note that long term I would like to write a version of this transform which is integrated with LSR's solver, but if that happens, we'll just delete the extra pass. Integration wise, I switched from using TTI to using a pass configuration variable. This seems slightly more idiomatic, and means we don't run the extra logic on any target other than RISCV.	2024-08-17 18:34:23 -07:00
Benjamin Maxwell	7fad04e94b	[LSR] Fix matching vscale immediates (#100080 ) Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have > 2 operands. Previously, the vscale immediate matching did not check the number of operands of the `SCEVMulExpr`, so would ignore any operands after the first two. This led to incorrect codegen (and results) for ArmSME in IREE (https://github.com/iree-org/iree), which sometimes addresses things that are a `vscale * vscale` multiple away. The test added with this change shows an example reduced from IREE. The second write should be offset from the first `16 * vscale * vscale` (* 4 bytes), however, previously LSR dropped the second vscale and instead offset the write by `#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).	2024-07-24 10:06:34 +01:00
Shan Huang	d83d09facd	[DebugInfo][LoopStrengthReduce] Fix missing debug location updates (#97519 ) Fix #97510 . Note that, for the new phi instruction `NewPH`, which replaces the old phi `PH` and the cast `ShadowUse`, I choose to propagate the debug location of `PH` to it, because the cast is eliminated according to the optimization semantics.	2024-07-15 09:44:18 +08:00
Kazu Hirata	2f55e55101	[Transforms] Use range-based for loops (NFC) (#98725 )	2024-07-14 13:44:50 -07:00
Graham Hunter	4311b14e9c	[LSR] Recognize vscale-relative immediates (#88124 ) Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.	2024-07-01 09:23:31 +01:00
Nikita Popov	2d209d964a	[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902 ) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.	2024-06-27 16:38:15 +02:00
David Green	c7308d405d	[LSR][AArch64] Optimize chain generation based on legal addressing modes (#94453 ) LSR will generate chains of related instructions with a known increment between them. With SVE, in the case of the test case, this can include increments like 'vscale * 16 + 8'. The idea of this patch is if we have a '+8' increment already calculated in the chain, we can generate a (legal) '+ vscale*16' addressing mode from it, allowing us to use the '[x16, #1, mul vl]' addressing mode instructions. In order to do this we keep track of the known 'bases' when generating chains in GenerateIVChain, checking for each if the accumulated increment expression from the base neatly folds into a legal addressing mode. If they do not we fall back to the existing LeftOverExpr, whether it is legal or not. This is mostly orthogonal to #88124, dealing with the generation of chains as opposed to rest of LSR. The existing vscale addressing mode work has greatly helped compared to the last time I looked at this, allowing us to check that the addressing modes are indeed legal.	2024-06-10 20:35:33 +01:00
Alex Bradbury	5a20141539	[LSR] Provide TTI hook to enable dropping solutions deemed to be unprofitable (#89924 ) <https://reviews.llvm.org/D126043> introduced a flag to drop solutions if deemed unprofitable. As noted there, introducing a TTI hook enables backends to individually opt into this behaviour. This will be used by #89927.	2024-06-05 14:40:58 +02:00
Philip Reames	baca93fc83	[LSR] Tweak debug output to always print initial cost	2024-05-14 13:34:20 -07:00
Graham Hunter	2e8d815596	[TTI] Support scalable offsets in getScalingFactorCost (#88113 ) Part of the work to support vscale-relative immediates in LSR.	2024-05-10 11:22:11 +01:00
Stephen Tozer	ffd08c7759	[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216 ) This is the major rename patch that prior patches have built towards. The DPValue class is being renamed to DbgVariableRecord, which reflects the updated terminology for the "final" implementation of the RemoveDI feature. This is a pure string substitution + clang-format patch. The only manual component of this patch was determining where to perform these string substitutions: `DPValue` and `DPV` are almost exclusively used for DbgRecords, except for: - llvm/lib/target, where 'DP' is used to mean double-precision, and so appears as part of .td files and in variable names. NB: There is a single existing use of `DPValue` here that refers to debug info, which I've manually updated. - llvm/tools/gold, where 'LDPV' is used as a prefix for symbol visibility enums. Outside of these places, I've applied several basic string substitutions, with the intent that they only affect DbgRecord-related identifiers; I've checked them as I went through to verify this, with reasonable confidence that there are no unintended changes that slipped through the cracks. The substitutions applied are all case-sensitive, and are applied in the order shown: ``` DPValue -> DbgVariableRecord DPVal -> DbgVarRec DPV -> DVR ``` Following the previous rename patches, it should be the case that there are no instances of any of these strings that are meant to refer to the general case of DbgRecords, or anything other than the DPValue class. The idea behind this patch is therefore that pure string substitution is correct in all cases as long as these assumptions hold.	2024-03-19 20:07:07 +00:00
Stephen Tozer	2e865353ed	[RemoveDIs][NFC] Move DPValue::filter -> filterDbgVars (#85208 ) This patch changes DPValue::filter to be a non-member method filterDbgVars. There are two reasons for this: firstly, the name of DPValue is about to change to DbgVariableRecord, which will result in every `for` loop that uses DPValue::filter to require a line break. This is a small thing, but it makes the rename patch more difficult to review, and is just generally more awkward for what is a fairly common loop. Secondly, the intent is to later break up the DPValue class into subclasses, at which point it would be better to have a non-member function that allows template arguments for the cases we want to filter with greater specificity.	2024-03-14 12:19:15 +00:00
Nikita Popov	beba307c5b	[LSR] Clear SCEVExpander before deleting phi nodes Fixes https://github.com/llvm/llvm-project/issues/84709.	2024-03-12 16:24:10 +01:00
Stephen Tozer	15f3f446c5	[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793 ) As part of the effort to rename the DbgRecord classes, this patch renames the widely-used functions that operate on DbgRecords but refer to DbgValues or DPValues in their names to refer to DbgRecords instead; all such functions are defined in one of `BasicBlock.h`, `Instruction.h`, and `DebugProgramInstruction.h`. This patch explicitly does not change the names of any comments or variables, except for where they use the exact name of one of the renamed functions. The reason for this is reviewability; this patch can be trivially examined to determine that the only changes are direct string substitutions and any results from clang-format responding to the changed line lengths. Future patches will cover renaming variables and comments, and then renaming the classes themselves.	2024-03-12 14:53:13 +00:00
Jeremy Morse	2fe81edef6	[NFC][RemoveDIs] Insert instruction using iterators in Transforms/ As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. Noteworthy changes: * FindInsertedValue now takes an optional iterator rather than an instruction pointer, as we need to always insert with iterators, * I've added a few iterator-taking versions of some value-tracking and DomTree methods -- they just unwrap the iterator. These are purely convenience methods to avoid extra syntax in some passes. * A few calls to getNextNode become std::next instead (to keep in the theme of using iterators for positions), * SeparateConstOffsetFromGEP has it's insertion-position field changed. Noteworthy because it's not a purely localised spelling change. All this should be NFC.	2024-03-05 15:12:22 +00:00
Patrick O'Neill	8d6e867eb2	[LSR][term-fold] Ensure the simple recurrence is from the current loop (#83085 ) If the phi node found by matchSimpleRecurrence is not from the current loop, then isAlmostDeadIV panics. With this patch we bail out early. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com> --------- Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>	2024-03-04 16:40:40 -08:00
Nikita Popov	72763521c3	[LSR] Clear SCEVExpander before calling DeleteDeadPHIs To avoid an assertion failure when an AssertingVH is removed, as reported in: https://github.com/llvm/llvm-project/pull/82362#issuecomment-1960067147 Also remove an unnecessary use of SCEVExpanderCleaner.	2024-02-22 22:50:26 +01:00
Orlando Cazalet-Hyams	ababa96475	[RemoveDIs][NFC] Introduce DbgRecord base class [1/3] (#78252 ) Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The patch stack adds a new base class -> 1. Add DbgRecord base class for DPValue and the not-yet-added DPLabel class. 2. Add the DPLabel class. 3. Enable dbg.label conversion and add support to passes. Patches 1 and 2 are NFC. In the near future we also will rename DPValue to DbgVariableRecord and DPLabel to DbgLabelRecord, at which point we'll overhaul the function names too. The name DPLabel keeps things consistent for now.	2024-02-20 16:00:55 +00:00
Philip Reames	28865da374	[LSR][term-fold] Adjust expansion budget based on trip count (#80304 ) Follow up to https://github.com/llvm/llvm-project/pull/74747 This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations. --------- Co-authored-by: Nikita Popov <github@npopov.com>	2024-02-02 08:57:26 -08:00
Philip Reames	f264da4322	[lsr][term-fold] Restrict transform to low cost expansions (#74747 ) This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. It's also now come up as a possibility in a regression twice in two days, so I'd like to get this in to close out the possibility if nothing else. The original review dropped the threshold for short trip count loops. I will return to that in a separate review if this lands.	2024-01-31 14:48:20 -08:00
Craig Topper	3dea0aa8f4	[LSR] Fix incorrect comment. NFC (#79207 )	2024-01-23 17:57:34 -08:00
Stephen Tozer	7c53e9f667	[RemoveDIs][DebugInfo] Add support for DPValues to LoopStrengthReduce (#78706 ) This patch trivially extends support for DbgValueInst recovery to DPValues in LoopStrengthReduce; they are handled identically, so this is mostly done by reusing the DbgValueInst code (using templates or auto-parameter lambdas to reduce actual code duplication).	2024-01-22 18:59:19 +00:00
Philip Reames	f5dd70c582	[LSR] Require non-zero step when considering wrap around for term folding (#77809 ) The term folding logic needs to prove that the induction variable does not cycle through the same set of values so that testing for the value of the IV on the exiting iteration is guaranteed to trigger only on that iteration. The prior code checked the no-self-wrap property on the IV, but this is insufficient as a zero step is trivially no-self-wrap per SCEV's definition but does repeat the same series of values. In the current form, this has the effect of basically disabling lsr's term-folding for all non-constant strides. This is still a net improvement as we've disabled term-folding entirely, so being able to enable it for constant strides is still a net improvement. As future work, there's two SCEV weakness worth investigating. First sext (or i32 %a, 1) to i64 does not return true for isKnownNonZero. This is because we check only the unsigned range in that query. We could either do query pushdown, or check the signed range as well. I tried the second locally and it has very broad impact - i.e. we have a bunch of missing optimizations here. Second, zext (or i32 %a, 1) to i64 as the increment to the IV in expensive_expand_short_tc causes the addrec to no longer be provably no-self-wrap. I didn't investigate this so it might be necessary, but the loop structure is such that I find this result surprising.	2024-01-11 10:07:17 -08:00
Craig Topper	d8ddcae547	[LSR] Fix typo in debug message where backspace escape was used instead of new line.	2023-12-24 10:35:27 -08:00
Joshua Cao	bd382032f6	[BBUtils][NFC] Delete SplitLandingPadPredecessors with DT (#73406 ) Function is marked for deprecation. There is only one consumer which is converted to use DomTreeUpdater.	2023-12-02 11:33:43 -08:00
Simon Pilgrim	3246a32d3f	Fix MSVC "not all control paths return a value" warning. NFC.	2023-11-30 10:07:01 +00:00
Philip Reames	e947f95337	[LSR][TTI][RISCV] Enable terminator folding for RISC-V If looking for a miscompile revert candidate, look here! The transform being enabled prefers comparing to a loop invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one. On RISC-V which has a large number of scalar registers, this is generally a profitable transform. We loose the ability to use a beqz on what is typically a count down IV, and pay the cost of computing the exit value on the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%. Code size wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for a uncompressible beq. We also add instructions in the preheader. Net result is a slight regression overall, but neutral or better inside the loop. Previous versions of this transform had numerous cornercase correctness bugs. All of them ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky. This patch is a reworked version of D134893 by @eop. That patch has been abandoned since May, so I picked it up, reworked it a bit, and am landing it.	2023-11-29 12:04:06 -08:00
Nikita Popov	e408f70524	[LSR] Avoid use of ConstantExpr::getCast() (NFC) Use the constant folding API instead, which must succeed as we're working on a ConstantInt.	2023-11-02 09:48:04 +01:00
David Green	54e5de08d4	[ARM][LSR] Exclude uses outside the loop when favoring postinc. (#67090 ) Extra uses for variables outside the loop can mess with the generation of postinc variables. This patch alters the collection of loop invariant fixups in LSR when the target is optimizing for PostInc, to exclude the collection of these extra uses. It is expected that the variable can be rematerialized, which will lead to a more optimal sequence of instructions in the loop.	2023-09-25 10:09:36 +01:00
Nikita Popov	d35e5afc87	[LSR] Simplify type check for opaque pointers (NFC) For pointer types, checking the address space is the same as type equality now, so we no longer need the special case.	2023-09-22 10:23:04 +02:00
Bjorn Pettersson	db456dc6ba	[LSR] Drop support for typed pointers The opaque pointers are already "canonicalized". So remove the redundant/obsolete code.	2023-09-07 16:37:45 +02:00
Nuno Lopes	d8e2821170	[LSR] Use poison instead of undef as placeholder [NFC] This value is patcher afterwards, and only used temporarily during dbg info construction	2023-07-23 15:57:21 +01:00
Nikita Popov	ddb46abd3c	[LSR] Don't consider users of constant outside loop In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users outside the loop. E.g. if we have an addrec based on %base, and %base is also used outside the loop, then we have to keep it in a register anyway, which may make it more profitable to use %base + %idx style addressing. This reasoning doesn't hold up when the base is a constant, because the constant can be rematerialized. The lsr-memcpy.ll test regressed when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr) now also has a use outside the loop (previously it didn't due to a pointer type difference), and that extra "use" results in worse use of addressing modes in the loop. However, the use outside the loop actually gets rematerialized, so the alleged register saving does not occur. The same reasoning also applies to other types of constants, such as global variable references. Differential Revision: https://reviews.llvm.org/D155073	2023-07-13 12:22:38 +02:00
Florian Hahn	7f5b15ad15	[LSR] Move normalization check to normalizeForPostIncUse. Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch additional un-invertable cases. This fixes another mis-compile pointed out by @peixin in D153004.	2023-07-04 11:56:51 +01:00
Florian Hahn	3a57152d85	[LSR] Return nullptr from getExpr if the result isn't invertible. getExpr is missing a check to make sure the result is invertible. This can lead to incorrect results, so return nullptr in those cases like in other places in IVUsers. Fixes #62660. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D153202	2023-06-22 19:10:48 +01:00
Luke Lau	438cc10b8e	[IR] Add getAccessType to Instruction There are multiple places in the code where the type of memory being accessed from an instruction needs to be obtained, including an upcoming patch to improve GEP cost modeling. This deduplicates the logic between them. It's not strictly NFC as EarlyCSE/LoopStrengthReduce may catch more intrinsics now. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D150583	2023-06-21 16:17:25 +01:00
Florian Hahn	dae5cd73cb	Recommit "[LSR] Consider post-inc form when creating extends/truncates." This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2. The recommitted version now checks the PostIncLoopSets for all fixups and returns nullptr if the result doesn't match for all fixups.	2023-06-19 17:57:06 +01:00
Florian Hahn	1797ab36ef	Revert "[LSR] Consider post-inc form when creating extends/truncates." This reverts commit abfeda5af329b5889db709ff74506e20e0b569e9. and fe19036e1266d2a90b44725c82b898134906e4c3. The added assertion triggers during clang bootstrap builds. Revert while I investigate.	2023-06-17 17:58:41 +01:00
Florian Hahn	abfeda5af3	[LSR] Consider post-inc form when creating extends/truncates. GenerateTruncates at the moment creates extends/truncates for post-inc uses of normalized expressions. For example, if an add rec of the form {1,+,-1} is used outside the loop, the normalized form will use {1,+,-1} instead of {0,+,-1}. When naively sign-extending the normalized expression, it will get extended incorrectly to {1,+,-1} for the wider type, if the backedge-taken count of the loop is 1. To address this, the patch updates GenerateTruncates to check if the LSRUse contains any fixups with PostIncLoops. If that's the case, first de-normalize the expression, then perform the extend/truncate, then normalize again. There may be other places where similar checks are needed and the helper can be generalized for those cases. I'd not be surprised if other subtle mis-compiles are caused by this. Fixes #38847. Fixes #58039. Fixes #62852. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153004	2023-06-17 09:58:37 +01:00
Nikita Popov	143ed21b26	Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)" This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7. In preparation for reverting a dependent revision.	2023-06-05 16:45:38 +02:00
Nikita Popov	d5c56c5162	[SCEVExpander] Remember phi nodes inserted by LCSSA construction SCEVExpander keeps track of all instructions it inserted. However, it currently misses some phi nodes created during LCSSA construction. Fix this by collecting these into another argument. This also removes the IRBuilder argument, which was added for essentially the same purpose, but only handles the root LCSSA nodes, not those inserted by SSAUpdater. This was reported as a regression on D149344, but the reduced test case also reproduces without it. Differential Revision: https://reviews.llvm.org/D150681	2023-05-25 09:34:19 +02:00

1 2 3 4 5 ...

986 Commits