The expansion of a SCEVUnknown is trivial (it's just the wrapped value).
If we instead try to reuse an existing value, we may pick a more complex
expression that merely simplifies to the SCEVUnknown.
This is inspired by https://github.com/llvm/llvm-project/issues/114879,
because SCEVExpander replacing a constant with a phi node is just silly.
(I don't consider this a fix for that issue though.)
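As a hypothetical illustration of the shape of the problem (not the exact IR from the issue): when expanding the SCEVUnknown wrapping %x, reusing a value like %p below trades a trivial value for a needless instruction:
```
; %p simplifies to %x, but it is still a phi instruction; expanding
; the SCEVUnknown for %x should produce %x directly, not reuse %p.
%p = phi i64 [ %x, %then ], [ %x, %else ]
```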
The comment attached to the `ScaledReg` field of `struct Formula` explains
that `ScaledReg` must be non-null when `Scale` is non-zero.
This fixes up a code path where this invariant is violated, and adds an
assert to ensure the invariant holds.
Without this patch, the compiler aborts on the attached test case.
Fixes #76504
As of rev ea222be0d, LLVM's assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Since ea222be0d, compiling via a textual assembly file therefore
produces single-byte nop padding, while the built-in assembler produces
multi-byte nops. This divergent behaviour is undesirable.
To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect, and updated
numerous other tests so they no longer check for the explicit padding value.
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
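A minimal sketch of the intended effect, with illustrative names (%base, %off) rather than IR from an actual test:
```
; when SCEV knows the pointer add cannot wrap (nuw), the expanded gep
; now carries the flag instead of dropping it:
;   before:  %ptr = getelementptr i8, ptr %base, i64 %off
;   after:
%ptr = getelementptr nuw i8, ptr %base, i64 %off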
As pointed out in the review of #102133, SCEVExpander currently
incorrectly reuses GEP instructions that have poison-generating flags
set. Fix this by clearing the flags on the reused instruction.
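An illustrative before/after sketch (names are hypothetical): if the expander reuses an existing gep whose flags are not implied by the SCEV being expanded, the flags must be dropped:
```
; existing instruction picked for reuse:
%gep = getelementptr nuw i8, ptr %base, i64 %off
; after being reused for a SCEV that does not guarantee nuw,
; the poison-generating flag is cleared:
%gep = getelementptr i8, ptr %base, i64 %off
```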
Motivating example: https://godbolt.org/z/eb97zrxhx
Here we have two induction variables in the loop: one corresponds to the
i variable (add rdx, 4), the other to res (add rax, 2). The second
induction variable can be removed by the rewriteLoopExitValues() method
(the final value of res at loop exit is unroll_iter * -2); however, this
doesn't happen because we have duplicated LCSSA phi nodes at the loop exit:
```
; Preheader:
for.body.preheader.new: ; preds = %for.body.preheader
%unroll_iter = and i64 %N, -4
br label %for.body
; Loop:
for.body: ; preds = %for.body, %for.body.preheader.new
%lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
%i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
%inc.3 = add nuw i64 %i.07, 4
%lsr.iv.next = add nsw i64 %lsr.iv, -2
%niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7
; Exit blocks
for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body
%inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
%lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
%lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only two
uses: one in an LCSSA phi node, the other in the induction phi node. Here
we have three uses of this value because of the duplicated LCSSA nodes, so
the transform doesn't apply, leaving an extra add operation inside the
loop. The proposed solution is to accumulate the inserted instructions
that will require an LCSSA form update into a SetVector, and then call
formLCSSAForInstructions for this SetVector once, so the same instructions
aren't processed twice.
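With the duplicate gone, the exit block would look roughly like this (a sketch based on the IR above), leaving %lsr.iv.next with exactly two uses so the rewrite can apply:
```
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
%inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
%lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
br label %for.end.loopexit.unr-lcssa
```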
This reland fixes an issue with the preserve-lcssa.ll test: it fails when
the x86_64-unknown-linux-gnu target is unavailable in opt. The changes are
moved into a separate duplicated-phis.ll test with an explicit x86 target
requirement, to fix bots that do not build this target.
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV. Splitting it out makes
it easier to test.
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
Integration-wise, I switched from using TTI to a pass configuration
variable. This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
Somewhat confusingly, a `SCEVMulExpr` is a `SCEVNAryExpr`, so it can have
more than two operands. Previously, the vscale immediate matching did not
check the number of operands of the `SCEVMulExpr`, so it would ignore any
operands after the first two.
This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
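A sketch of the reduced shape (illustrative IR, not the exact test; %base is a placeholder): the offset involves vscale squared, so all three `SCEVMulExpr` operands matter:
```
%vs = call i64 @llvm.vscale.i64()
%vs2 = mul i64 %vs, %vs          ; vscale * vscale
%off = mul i64 %vs2, 16          ; 16 * vscale * vscale
%addr = getelementptr float, ptr %base, i64 %off
```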
Currently, before forwarding an AVL, we do a simple non-exhaustive check
to see if its LiveInterval is extendable. But we also need to check for
this when we're extending an AVL's LiveInterval by merging the
VSETVLIInfos in transferBefore with equally zero AVLs.
Rather than trying to conservatively prevent these cases, this inserts a
copy of the AVL instead if we don't know we'll be able to extend it.
This is likely to be more robust, and even if the extra copy is
undesirable these cases should be rare in practice.
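A rough sketch in the pass's pseudo-MIR notation (register names are illustrative):
```
; if we can't prove %avl's live interval extends down to the insertion
; point, copy it into a fresh register and use that for the new vsetvli:
%avl.copy = COPY %avl
%x = PseudoVSETVLI %avl.copy, ...
```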
Fixes #97510.
Note that, for the new phi instruction `NewPH`, which replaces the old
phi `PH` and the cast `ShadowUse`, I chose to propagate the debug
location of `PH` to it, because the cast is eliminated as part of the
optimization's semantics.
This avoids some cases where LSR produces results that lead to very poor
codegen. There's a chance we'll see minor degradations for some inputs, in
cases where our metrics say the found solution is worse but it is in fact
better than the starting point.
Per the review thread, at least one vendor has been enabling this by
default for some time and found it an overall improvement. As such, we'll
enable it by default and aim to fix any as-yet-unknown regressions
in-tree.
This fixes a crash found when compiling OpenBLAS with -mllvm
-verify-machineinstrs.
When we "forward" the AVL from the output of a vsetvli, we might have to
extend the LiveInterval of the AVL to where insert the new vsetvli.
Most of the time we are able to extend the LiveInterval because there's
only one val num (definition) for the register. But PHI elimination can
assign multiple values to the same register, in which case we end up
clobbering a different val num when extending:
```
%x = PseudoVSETVLI %avl, ...
%avl = ADDI ...
%v = PseudoVADD ..., avl=%x

; %avl is forwarded to PseudoVADD:
%x = PseudoVSETVLI %avl, ...
%avl = ADDI ...
%v = PseudoVADD ..., avl=%avl
```
Here there's no way to extend the %avl from the vsetvli since %avl is
redefined, i.e. we have two val nums.
This fixes it by only forwarding the AVL when the register has exactly
one val num, in which case it should be safe to extend.
Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.
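For example (an illustrative IR shape, assuming the target reports such offsets as legal), LSR can now treat an offset like `16 * vscale` as an immediate:
```
%vs = call i64 @llvm.vscale.i64()
%off = mul i64 %vs, 16           ; 16 * vscale, now a candidate immediate
%addr = getelementptr i8, ptr %base, i64 %off
```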
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same
VLMAX, we treat it as the AVL of said vsetvli.
This allows us to remove a true dependency, as well as to treat
VSETVLIInfos as equal in more places and avoid toggles.
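A sketch in the pseudo notation used elsewhere in these notes (register names are illustrative):
```
%vl = PseudoVSETVLI %avl, ...
...
; this vsetvli has the same VLMAX, so its AVL %vl can be treated as
; %avl, removing the use of the first vsetvli's output VL:
%vl2 = PseudoVSETVLI %vl, ...
```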
We do this in two places, needVSETVLI and computeInfoForInstr; however,
we don't do it in computeInfoForInstr's vsetvli equivalent,
getInfoForVSETVLI.
We also have a restriction, only in computeInfoForInstr, that the AVL
can't be a register, since we want to avoid extending live ranges.
This patch does two interlinked things:
1) It adds this AVL "peeking" to getInfoForVSETVLI
2) It relaxes the constraint that the AVL can't be a register in
computeInfoForInstr, since it removes a use of the output VL, which can
actually reduce register pressure. E.g. see the diff in
@vector_init_vsetvli_N and @test6
Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we
can remove the check in needVSETVLI.
We also need to update how we update LiveIntervals in insertVSETVLI, as
we can now end up needing to extend the LiveRange of the AVL across
blocks.
If we have a urem expression, emitting it as a urem is significantly
better than letting the full expansion kick in. We do risk losing a udiv
or mul which could previously have been shared, but losing that seems
like a reasonable tradeoff for being able to round-trip a urem without
modification.
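A sketch of the difference (illustrative IR):
```
; emitted directly, the expression round-trips as-is:
%r = urem i64 %n, %d

; the full expansion, by contrast:
%q = udiv i64 %n, %d
%m = mul i64 %q, %d
%r.full = sub i64 %n, %m
```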
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
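For reference, the syntax change looks like this (the value and metadata numbers are placeholders):
```
; debug intrinsic (old default output):
call void @llvm.dbg.value(metadata i32 %x, metadata !10, metadata !DIExpression()), !dbg !20

; debug record (new default output):
#dbg_value(i32 %x, !10, !DIExpression(), !20)
```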
Adds an AArch64-specific version of isLSRCostLess, changing the relative
importance of the various terms from the formulae being evaluated.
This has been split out from my vscale-aware LSR work, see the RFC for
reference:
https://discourse.llvm.org/t/rfc-vscale-aware-loopstrengthreduce/77131
This patch canonicalizes constant expression GEPs to use i8 source
element type, aka ptradd. This is the ConstantFolding equivalent of the
InstCombine canonicalization introduced in #68882.
I believe all our optimizations working on constant expression GEPs
(like GlobalOpt etc) have already been switched to work on offsets, so I
don't expect any significant fallout from this change.
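For example (a hypothetical global, assuming a 4-byte i32):
```
@g = external global [4 x i32]
; before:
@p = global ptr getelementptr (i32, ptr @g, i64 3)
; after: canonicalized to an i8 ("ptradd") GEP with a byte offset
@p = global ptr getelementptr (i8, ptr @g, i64 12)
```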
This is part of:
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699
It's common to delete some instructions after using a SCEVExpander while
the expander is still live (but will not be used afterwards). In that
case, the AssertingVH may trigger. Replace it with a PoisoningVH so that
we only detect the case where the SCEVExpander is actually used in a
problematic fashion after the instruction removal.
The alternative would be to add clear() calls to more code paths.
Fixes https://github.com/llvm/llvm-project/issues/83404.
If the phi node found by matchSimpleRecurrence is not from the current
loop, then isAlmostDeadIV panics. With this patch we bail out early.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Follow up to https://github.com/llvm/llvm-project/pull/74747
This change extends the previously added fixed expansion threshold by
scaling down the cost allowed for an expansion for a loop with either a
small known trip count or a profile which indicates the trip count is
likely small. The goal here is to improve code generation for a loop
nest where the outer loop has a high trip count, and the inner loop runs
only a handful of iterations.
Co-authored-by: Nikita Popov <github@npopov.com>
This is a follow-up to an item I noted in my submission comment for
e947f95. I don't have a real-world example where this triggers
unprofitably, but avoiding the transform when profiling estimates the
loop to be short-running seems quite reasonable. It's also come up as a
possible culprit in two regressions in as many days, so I'd like to get
this in to close out that possibility if nothing else.
The original review dropped the threshold for short trip count loops. I
will return to that in a separate review if this lands.
This patch trivially extends support for DbgValueInst recovery to
DPValues in LoopStrengthReduce; they are handled identically, so this is
mostly done by reusing the DbgValueInst code (using templates or
auto-parameter lambdas to reduce actual code duplication).
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign, as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
```
%offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)

%offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)
```
Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.
Instead, we can push the query down one level, relying on the fact that
sext is an invertible operation and that its result can only be zero if
the input is. We could likely generalize this reasoning to other
invertible operations, but special-casing sext seems worthwhile.
This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus
re-enables terminator folding for RISCV. The reported miscompile has
been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.
The term folding logic needs to prove that the induction variable does
not cycle through the same set of values so that testing for the value
of the IV on the exiting iteration is guaranteed to trigger only on that
iteration. The prior code checked the no-self-wrap property on the IV,
but this is insufficient as a zero step is trivially no-self-wrap per
SCEV's definition but does repeat the same series of values.
In its current form, this has the effect of basically disabling LSR's
term-folding for all non-constant strides. This is still a net
improvement: term-folding had previously been disabled entirely, so
being able to enable it for constant strides is a step forward.
As future work, there are two SCEV weaknesses worth investigating.
First, sext (or i32 %a, 1) to i64 does not return true for
isKnownNonZero. This is because we check only the unsigned range in that
query. We could either do query pushdown, or also check the signed range.
I tried the latter locally and it has very broad impact, i.e. we have a
bunch of missing optimizations here.
Second, zext (or i32 %a, 1) to i64 as the increment to the IV in
expensive_expand_short_tc causes the addrec to no longer be provably
no-self-wrap. I didn't investigate this, so it might be necessary, but
the loop structure is such that I find this result surprising.