llvm-project

Author	SHA1	Message	Date
Florian Hahn	e8908215de	[LSR] Support SCEVPtrToAddr in SCEVDbgValueBuilder. Allow SCEVPtrToAddr as cast in assertion in SCEVDbgValueBuilder. SCEVPtrToAddr is handled similarly to SCEVPtrToInt. Fixes a crash with debug info after bd40d1de9c9ee, which started to generate ptrtoaddr instead of ptrtoint expressions.	2026-02-07 14:02:45 +00:00
Nikita Popov	0809a9927d	[LSR] Add another test case for implicit truncation (NFC) From https://github.com/llvm/llvm-project/pull/171456#issuecomment-3675488354.	2025-12-19 16:34:17 +01:00
Nikita Popov	8c93254dfc	[LSR] Add test for implicit truncation on icmp immediate (NFC) Test case for: https://github.com/llvm/llvm-project/pull/171456#issuecomment-3670755137	2025-12-19 10:07:50 +01:00
John Brawn	ccd4e7b1ed	[LSR] Make OptimizeLoopTermCond able to handle some non-cmp conditions (#165590 ) Currently OptimizeLoopTermCond can only convert a cmp instruction to using a postincrement induction variable, which means it can't handle predicated loops where the termination condition comes from get_active_lane_mask. Relax this restriction so that we can handle any kind of instruction, though only if it's the instruction immediately before the branch (except for possibly an extractelement).	2025-12-03 15:28:46 +00:00
John Brawn	2ad71745cd	[LSR] Insert the transformed IV increment in the user block (#169515 ) Currently we try to hoist the transformed IV increment instruction to the header block to help with generation of postincrement instructions, but this only works if the user instruction is also in the header. We should instead be trying to insert it in the same block as the user.	2025-12-02 17:15:00 +00:00
Matt Arsenault	793ab6a80d	X86: Enable terminal rule (#165957 )	2025-11-10 12:34:00 -08:00
Sander de Smalen	7fe60a7685	[AArch64][SVE] Avoid movprfx by reusing register for _UNDEF pseudos. (#166926 ) For predicated SVE instructions where we know that the inactive lanes are undef, it is better to pick a destination register that is not unique. This avoids introducing a movprfx to copy a unique register to the destination operand, which would be needed to comply with the tied operand constraints. For example: ``` %src1 = COPY $z1 %src2 = COPY $z2 %dst = SDIV_ZPZZ_S_UNDEF %p, %src1, %src2 ``` Here it is beneficial to pick $z1 or $z2 as the destination register, because if it would have chosen a unique register (e.g. $z0) then the pseudo expand pass would need to insert a MOVPRFX to expand the operation into: ``` $z0 = SDIV_ZPZZ_S_UNDEF $p0, $z1, $z2 -> $z0 = MOVPRFX $z1 $z0 = SDIV_ZPmZ_S $p0, $z0, $z2 ``` By picking $z1 directly, we'd get: ``` $z1 = SDIV_ZPmZ_S, $p0 $z1, $z2 ```	2025-11-10 16:33:59 +00:00
John Brawn	53e7443e0c	[LSR] Don't count conditional loads/store as enabling pre/post-index (#159573 ) When a load/store is conditionally executed in a loop it isn't a candidate for pre/post-index addressing, as the increment of the address would only happen on those loop iterations where the load/store is executed. Detect this and only discount the AddRec cost when the load/store is unconditional.	2025-10-30 13:53:15 +00:00
paperchalice	249883d0c5	[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786 ) Post cleanup for #164534.	2025-10-23 20:31:31 +08:00
Florian Hahn	f78150d2d4	Reapply "[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1." (#158328 ) This reverts commit fd58f235f8c5bd40d98acfd8e7fb11d41de301c7. The recommit contains an extra check to make sure that D is a multiple of C2, if C2 > C1. This fixes the issue causing the revert fd58f235f8c. Tests have been added in 6a726e9a4d3d0. Original message: If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on https://github.com/llvm/llvm-project/pull/157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: https://github.com/llvm/llvm-project/pull/157656	2025-09-17 14:28:30 +01:00
Florian Hahn	6a726e9a4d	[SCEV] Add more tests for MUL/UDIV folds. Add additional test coverage for https://github.com/llvm/llvm-project/pull/157656 covering cases where the dividend is not a multiple of the divisor, causing the revert of 70012fda631.	2025-09-17 13:05:49 +01:00
John Brawn	8fab81121e	[LSR] Add an addressing mode that considers all addressing modes (#158110 ) The way that loops strength reduction works is that the target has to upfront decide whether it wants its addressing to be preindex, postindex, or neither. This choice affects: * Which potential solutions we generate * Whether we consider a pre/post index load/store as costing an AddRec or not. None of these choices are a good fit for either AArch64 or ARM, where both preindex and postindex addressing are typically free: * If we pick None then we count pre/post index addressing as costing one addrec more than is correct so we don't pick them when we should. * If we pick PreIndexed or PostIndexed then we get the correct cost for that addressing type, but still get it wrong for the other and also exclude potential solutions using offset addressing that could have less cost. This patch adds an "all" addressing mode that causes all potential solutions to be generated and counts both pre and postindex as having AddRecCost of zero. Unfortuntely this reveals problems elsewhere in how we calculate the cost of things that need to be fixed before we can make use of it.	2025-09-16 11:46:54 +01:00
Reid Kleckner	fd58f235f8	Revert "[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1." (#158328 ) Reverts llvm/llvm-project#157656 There are multiple reports that this is causing miscompiles in the MSan test suite after bootstrapping and that this is causing miscompiles in rustc. Let's revert for now, and work to capture a reproducer next week.	2025-09-12 10:15:41 -07:00
Florian Hahn	70012fda63	[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1. (#157656 ) If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on https://github.com/llvm/llvm-project/pull/157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: https://github.com/llvm/llvm-project/pull/157656	2025-09-11 08:08:47 +00:00
Florian Hahn	74ec38fad0	[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730 ) Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: https://github.com/llvm/llvm-project/pull/156730	2025-09-05 08:45:13 +00:00
Florian Hahn	7c10e57f54	[LSR] Move test from Analysis/ScalarEvolution to Transforms, regen. Move transform test to llvm/test/Transforms/LoopStrengthReduce/X86, clean up the names a bit and regenerate check lines.	2025-09-03 19:44:45 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Paul Walker	ceb2b9c141	[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651 )	2025-08-01 11:42:45 +01:00
Nikita Popov	5f531827a4	[LSR] Do not consider uses in lifetime intrinsics (#149492 ) We should ignore uses of pointers in lifetime intrinsics, as these are not actually materialized in the final code, so don't affect register pressure or anything else LSR needs to model. Handling these only results in peculiar rewrites where additional intermediate GEPs are introduced.	2025-07-18 16:13:00 +02:00
Shan Huang	089106fdfb	[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241 ) Fix #147238	2025-07-14 09:45:37 +08:00
Nikita Popov	a9c6f1ed4f	Revert "[LSR] Regenerate test checks (NFC)" This reverts commit a60405c2d51ffe428dd38b8b1020c19189968f76. Causes buildbot failures.	2025-07-11 14:11:42 +02:00
Nikita Popov	a60405c2d5	[LSR] Regenerate test checks (NFC)	2025-07-11 13:14:02 +02:00
Sergey Shcherbinin	8472eb1361	[AArch64LoadStoreOpt] BaseReg update is searched also in CF successor (#145583 ) Look for reg update instruction (to merge w/ mem instruction into pre/post-increment form) not only inside a single MBB but also along a CF path going downward w/o side enters such that BaseReg is alive along it but not at its exits. Regression test is updated accordingly.	2025-07-11 11:06:11 +01:00
Guy David	76274eb2b3	[PHIElimination] Revert #131837 #146320 #146337 (#146850 ) Reverting because mis-compiles: - https://github.com/llvm/llvm-project/pull/131837 - https://github.com/llvm/llvm-project/pull/146320 - https://github.com/llvm/llvm-project/pull/146337	2025-07-03 07:48:08 -04:00
Guy David	f5c62ee0fa	[PHIElimination] Reuse existing COPY in predecessor basic block (#131837 ) The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit. This change affects many architectures but the amount of total instructions in the test cases seems too be slightly lower.	2025-06-29 21:28:42 +03:00
John Brawn	a54712c8ec	[LSR] Make canHoistIVInc allow non-integer types (#143707 ) canHoistIVInc was made to only allow integer types to avoid a crash in isIndexedLoadLegal/isIndexedStoreLegal due to them failing an assertion in getValueType (or rather in MVT::getVT which gets called from that) when passed a struct type. Adjusting these functions to pass AllowUnknown=true to getValueType means we don't get an assertion failure (MVT::Other is returned which TLI->isIndexedLoadLegal should then return false for), meaning we can remove this check for integer type.	2025-06-16 15:23:40 +01:00
Ruiling, Song	0487db1f13	MachineScheduler: Improve instruction clustering (#137784 ) The existing way of managing clustered nodes was done through adding weak edges between the neighbouring cluster nodes, which is a sort of ordered queue. And this will be later recorded as `NextClusterPred` or `NextClusterSucc` in `ScheduleDAGMI`. But actually the instruction may be picked not in the exact order of the queue. For example, we have a queue of cluster nodes A B C. But during scheduling, node B might be picked first, then it will be very likely that we only cluster B and C for Top-Down scheduling (leaving A alone). Another issue is: ``` if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum) std::swap(SUa, SUb); if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) ``` may break the cluster queue. For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3 2. 1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2), As 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 be pred of 3. This makes both 1 and 2 become preds of 3, but there is no edge between 1 and 2. Thus we get a broken cluster chain. To fix both issues, we introduce an unordered set in the change. This could help improve clustering in some hard case. One key reason the change causes so many test check changes is: As the cluster candidates are not ordered now, the candidates might be picked in different order from before. The most affected targets are: AMDGPU, AArch64, RISCV. For RISCV, it seems to me most are just minor instruction reorder, don't see obvious regression. For AArch64, there were some combining of ldr into ldp being affected. With two cases being regressed and two being improved. This has more deeper reason that machine scheduler cannot cluster them well both before and after the change, and the load combine algorithm later is also not smart enough. For AMDGPU, some cases have more v_dual instructions used while some are regressed. It seems less critical. Seems like test `v_vselect_v32bf16` gets more buffer_load being claused.	2025-06-05 15:28:04 +08:00
Alexander Richardson	ee13638362	[AMDGPU] Remove explicit datalayout from tests where not needed Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921	2025-04-30 10:58:17 -07:00
Ricardo Jesus	34f9ddf1ce	[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280 ) Currently, given: ```cpp uint64_t incb(uint64_t x) { return x+svcntb(); } ``` LLVM generates: ```gas incb: addvl x0, x0, #1 ret ``` Which is equivalent to: ```gas incb: incb x0 ret ``` However, on microarchitectures like the Neoverse V2 and Neoverse V3, the second form (with INCB) can have significantly better latency and throughput (according to their SWOG). On the Neoverse V2, for example, ADDVL has a latency and throughput of 2, whereas some forms of INCB have a latency of 1 and a throughput of 4. The same applies to DECB. This patch adds patterns to prefer the cheaper INCB/DECB forms over ADDVL where applicable.	2025-04-17 08:41:17 +01:00
YunQiang Su	1c7ab39f3d	MIPS: Set EnableLoopTermFold (#133454 ) Setting `EnableLoopTermFold` enables `loop-term-fold` pass.	2025-03-29 21:51:59 +08:00
Weaver	af150272cf	Revert "MIPS: Set EnableLoopTermFold (#133378 )" This reverts commit 71f43a7c42a37d18be98b0885d62ef76e658f242. Caused build bot failures: https://lab.llvm.org/buildbot/#/builders/190/builds/17267 https://lab.llvm.org/buildbot/#/builders/144/builds/21445 https://lab.llvm.org/buildbot/#/builders/46/builds/14343 please consider fixing the test failure in long-array-initialize.ll before recommitting.	2025-03-28 11:31:56 +00:00
YunQiang Su	71f43a7c42	MIPS: Set EnableLoopTermFold (#133378 ) Setting `EnableLoopTermFold` enables `loop-term-fold` pass.	2025-03-28 16:49:45 +08:00
YunQiang Su	6dd14c881c	MIPS: Implements MipsTTIImpl::isLSRCostLess using Insns as first (#133068 ) So that LoopStrengthReduce can work for MIPS. The code is copied from RISC-V. --------- Co-authored-by: qethu <190734095+qethu@users.noreply.github.com>	2025-03-26 23:01:26 +08:00
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298 ) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
Ricardo Jesus	15fbdc2b96	[AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (#127837 ) Currently, given: ```cpp svuint8_t foo(uint8_t *x) { return svld1(svptrue_b8(), x); } ``` We generate: ```gas foo: ptrue p0.b ld1b { z0.b }, p0/z, [x0] ret ``` However, on little-endian and with unaligned memory accesses allowed, we could instead be using LDR as follows: ```gas foo: ldr z0, [x0] ret ``` The second form avoids the predicate dependency. Likewise for other types and stores.	2025-02-26 13:56:35 +00:00
Sergei Barannikov	ff9c041d96	[MachineScheduler] Fix physreg dependencies of ExitSU (#123541 ) Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541	2025-02-01 20:40:50 +03:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Alex MacLean	4583f6d344	[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806 ) the `ptx_kernel` calling convention is a more idiomatic and standard way of specifying a NVPTX kernel than using the metadata which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.	2025-01-07 18:24:50 -08:00
Fangrui Song	9efa7d7af3	Remove -print-lsr-output in favor of --stop-after=loop-reduce Pull Request: https://github.com/llvm/llvm-project/pull/121305	2024-12-29 18:58:30 -08:00
Lee Wei	abb9f9fa06	[llvm] Remove `br i1 undef` from some regression tests [NFC] (#117112 ) This PR removes tests with `br i1 undef` under `llvm/tests/Transforms/Loop, Lower`.	2024-11-21 08:06:56 +00:00
Nikita Popov	6dc23b7009	[SCEVExpander] Don't try to reuse SCEVUnknown values (#115141 ) The expansion of a SCEVUnknown is trivial (it's just the wrapped value). If we try to reuse an existing value it might be a more complex expression that simplifies to the SCEVUnknown. This is inspired by https://github.com/llvm/llvm-project/issues/114879, because SCEVExpander replacing a constant with a phi node is just silly. (I don't consider this a fix for that issue though.)	2024-11-11 12:36:29 +01:00
Yingwei Zheng	0b9f1cc024	[SCEV] Disallow simplifying phi(undef, X) to X (#115109 ) See the following case: ``` @GlobIntONE = global i32 0, align 4 define ptr @src() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ] br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ] br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } define ptr @tgt() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } ``` 1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus, `ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for `%retval.2.peel`. 2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`. 3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand` reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]` to generate code for `@GlobIntONE`. It is incorrect. This patch disallows simplifying `phi(undef, X)` to `X` by setting `CanUseUndef` to false. Closes https://github.com/llvm/llvm-project/issues/114879.	2024-11-07 15:53:51 +08:00
Youngsuk Kim	caa32e6d6f	[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576 ) Comments attached to the `ScaledReg` field of `struct Formula` explains that, `ScaledReg` must be non-null when `Scale` is non-zero. This fixes up a code path where this invariant is violated. Also, add an assert to ensure this invariant holds true. Without this patch, compiler aborts with the attached test case. Fixes #76504	2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams	7506872afc	[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979 ) Fixes #110494	2024-10-03 11:47:08 +01:00
Jeremy Morse	e6bf48d110	[X86] Don't request 0x90 nop filling in p2align directives (#110134 ) As of rev ea222be0d, LLVMs assembler will actually try to honour the "fill value" part of p2align directives. X86 printed these as 0x90, which isn't actually what it wanted: we want multi-byte nops for .text padding. Compiling via a textual assembly file produces single-byte nop padding since ea222be0d but the built-in assembler will produce multi-byte nops. This divergent behaviour is undesirable. To fix: don't set the byte padding field for x86, which allows the assembler to pick multi-byte nops. Test that we get the same multi-byte padding when compiled via textual assembly or directly to object file. Added same-align-bytes-with-llasm-llobj.ll to that effect, updated numerous other tests to not contain check-lines for the explicit padding.	2024-10-02 11:14:05 +01:00
Nikita Popov	9f3d1695eb	[SCEVExpander] Preserve gep nuw during expansion (#102133 ) When expanding SCEV adds to geps, transfer the nuw flag to the resulting gep. (Note that this doesn't apply to IV increment GEPs, which go through a different code path.)	2024-10-02 11:45:00 +02:00
Nikita Popov	4b3ba64ba7	[SCEVExpander] Clear flags when reusing GEP (#109293 ) As pointed out in the review of #102133, SCEVExpander currently incorrectly reuses GEP instructions that have poison-generating flags set. Fix this by clearing the flags on the reused instruction.	2024-10-01 14:22:54 +02:00
Nikita Popov	c4744e49f6	[LSR] Regenerate test checks (NFC)	2024-09-19 16:34:10 +02:00
Sergey Kachkov	1f2a634c44	Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice. Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are moved into separate duplicated-phis.ll test with explicit x86 target requirement to fix bots which are not building this target.	2024-09-09 16:14:51 +03:00
Sergey Kachkov	17f0c5dfaa	[LSR][NFC] Add pre-commit test	2024-09-09 16:14:50 +03:00

1 2 3 4 5 ...

689 Commits