llvm-project

Author	SHA1	Message	Date
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Paul Walker	ceb2b9c141	[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651 )	2025-08-01 11:42:45 +01:00
Nikita Popov	5f531827a4	[LSR] Do not consider uses in lifetime intrinsics (#149492 ) We should ignore uses of pointers in lifetime intrinsics, as these are not actually materialized in the final code, so don't affect register pressure or anything else LSR needs to model. Handling these only results in peculiar rewrites where additional intermediate GEPs are introduced.	2025-07-18 16:13:00 +02:00
Shan Huang	089106fdfb	[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241 ) Fix #147238	2025-07-14 09:45:37 +08:00
Nikita Popov	a9c6f1ed4f	Revert "[LSR] Regenerate test checks (NFC)" This reverts commit a60405c2d51ffe428dd38b8b1020c19189968f76. Causes buildbot failures.	2025-07-11 14:11:42 +02:00
Nikita Popov	a60405c2d5	[LSR] Regenerate test checks (NFC)	2025-07-11 13:14:02 +02:00
Sergey Shcherbinin	8472eb1361	[AArch64LoadStoreOpt] BaseReg update is searched also in CF successor (#145583 ) Look for reg update instruction (to merge w/ mem instruction into pre/post-increment form) not only inside a single MBB but also along a CF path going downward w/o side enters such that BaseReg is alive along it but not at its exits. Regression test is updated accordingly.	2025-07-11 11:06:11 +01:00
Guy David	76274eb2b3	[PHIElimination] Revert #131837 #146320 #146337 (#146850 ) Reverting because mis-compiles: - https://github.com/llvm/llvm-project/pull/131837 - https://github.com/llvm/llvm-project/pull/146320 - https://github.com/llvm/llvm-project/pull/146337	2025-07-03 07:48:08 -04:00
Guy David	f5c62ee0fa	[PHIElimination] Reuse existing COPY in predecessor basic block (#131837 ) The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit. This change affects many architectures but the amount of total instructions in the test cases seems too be slightly lower.	2025-06-29 21:28:42 +03:00
John Brawn	a54712c8ec	[LSR] Make canHoistIVInc allow non-integer types (#143707 ) canHoistIVInc was made to only allow integer types to avoid a crash in isIndexedLoadLegal/isIndexedStoreLegal due to them failing an assertion in getValueType (or rather in MVT::getVT which gets called from that) when passed a struct type. Adjusting these functions to pass AllowUnknown=true to getValueType means we don't get an assertion failure (MVT::Other is returned which TLI->isIndexedLoadLegal should then return false for), meaning we can remove this check for integer type.	2025-06-16 15:23:40 +01:00
Ruiling, Song	0487db1f13	MachineScheduler: Improve instruction clustering (#137784 ) The existing way of managing clustered nodes was done through adding weak edges between the neighbouring cluster nodes, which is a sort of ordered queue. And this will be later recorded as `NextClusterPred` or `NextClusterSucc` in `ScheduleDAGMI`. But actually the instruction may be picked not in the exact order of the queue. For example, we have a queue of cluster nodes A B C. But during scheduling, node B might be picked first, then it will be very likely that we only cluster B and C for Top-Down scheduling (leaving A alone). Another issue is: ``` if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum) std::swap(SUa, SUb); if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) ``` may break the cluster queue. For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3 2. 1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2), As 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 be pred of 3. This makes both 1 and 2 become preds of 3, but there is no edge between 1 and 2. Thus we get a broken cluster chain. To fix both issues, we introduce an unordered set in the change. This could help improve clustering in some hard case. One key reason the change causes so many test check changes is: As the cluster candidates are not ordered now, the candidates might be picked in different order from before. The most affected targets are: AMDGPU, AArch64, RISCV. For RISCV, it seems to me most are just minor instruction reorder, don't see obvious regression. For AArch64, there were some combining of ldr into ldp being affected. With two cases being regressed and two being improved. This has more deeper reason that machine scheduler cannot cluster them well both before and after the change, and the load combine algorithm later is also not smart enough. For AMDGPU, some cases have more v_dual instructions used while some are regressed. It seems less critical. Seems like test `v_vselect_v32bf16` gets more buffer_load being claused.	2025-06-05 15:28:04 +08:00
Alexander Richardson	ee13638362	[AMDGPU] Remove explicit datalayout from tests where not needed Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921	2025-04-30 10:58:17 -07:00
Ricardo Jesus	34f9ddf1ce	[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280 ) Currently, given: ```cpp uint64_t incb(uint64_t x) { return x+svcntb(); } ``` LLVM generates: ```gas incb: addvl x0, x0, #1 ret ``` Which is equivalent to: ```gas incb: incb x0 ret ``` However, on microarchitectures like the Neoverse V2 and Neoverse V3, the second form (with INCB) can have significantly better latency and throughput (according to their SWOG). On the Neoverse V2, for example, ADDVL has a latency and throughput of 2, whereas some forms of INCB have a latency of 1 and a throughput of 4. The same applies to DECB. This patch adds patterns to prefer the cheaper INCB/DECB forms over ADDVL where applicable.	2025-04-17 08:41:17 +01:00
YunQiang Su	1c7ab39f3d	MIPS: Set EnableLoopTermFold (#133454 ) Setting `EnableLoopTermFold` enables `loop-term-fold` pass.	2025-03-29 21:51:59 +08:00
Weaver	af150272cf	Revert "MIPS: Set EnableLoopTermFold (#133378 )" This reverts commit 71f43a7c42a37d18be98b0885d62ef76e658f242. Caused build bot failures: https://lab.llvm.org/buildbot/#/builders/190/builds/17267 https://lab.llvm.org/buildbot/#/builders/144/builds/21445 https://lab.llvm.org/buildbot/#/builders/46/builds/14343 please consider fixing the test failure in long-array-initialize.ll before recommitting.	2025-03-28 11:31:56 +00:00
YunQiang Su	71f43a7c42	MIPS: Set EnableLoopTermFold (#133378 ) Setting `EnableLoopTermFold` enables `loop-term-fold` pass.	2025-03-28 16:49:45 +08:00
YunQiang Su	6dd14c881c	MIPS: Implements MipsTTIImpl::isLSRCostLess using Insns as first (#133068 ) So that LoopStrengthReduce can work for MIPS. The code is copied from RISC-V. --------- Co-authored-by: qethu <190734095+qethu@users.noreply.github.com>	2025-03-26 23:01:26 +08:00
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298 ) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
Ricardo Jesus	15fbdc2b96	[AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (#127837 ) Currently, given: ```cpp svuint8_t foo(uint8_t *x) { return svld1(svptrue_b8(), x); } ``` We generate: ```gas foo: ptrue p0.b ld1b { z0.b }, p0/z, [x0] ret ``` However, on little-endian and with unaligned memory accesses allowed, we could instead be using LDR as follows: ```gas foo: ldr z0, [x0] ret ``` The second form avoids the predicate dependency. Likewise for other types and stores.	2025-02-26 13:56:35 +00:00
Sergei Barannikov	ff9c041d96	[MachineScheduler] Fix physreg dependencies of ExitSU (#123541 ) Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541	2025-02-01 20:40:50 +03:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Alex MacLean	4583f6d344	[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806 ) the `ptx_kernel` calling convention is a more idiomatic and standard way of specifying a NVPTX kernel than using the metadata which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.	2025-01-07 18:24:50 -08:00
Fangrui Song	9efa7d7af3	Remove -print-lsr-output in favor of --stop-after=loop-reduce Pull Request: https://github.com/llvm/llvm-project/pull/121305	2024-12-29 18:58:30 -08:00
Lee Wei	abb9f9fa06	[llvm] Remove `br i1 undef` from some regression tests [NFC] (#117112 ) This PR removes tests with `br i1 undef` under `llvm/tests/Transforms/Loop, Lower`.	2024-11-21 08:06:56 +00:00
Nikita Popov	6dc23b7009	[SCEVExpander] Don't try to reuse SCEVUnknown values (#115141 ) The expansion of a SCEVUnknown is trivial (it's just the wrapped value). If we try to reuse an existing value it might be a more complex expression that simplifies to the SCEVUnknown. This is inspired by https://github.com/llvm/llvm-project/issues/114879, because SCEVExpander replacing a constant with a phi node is just silly. (I don't consider this a fix for that issue though.)	2024-11-11 12:36:29 +01:00
Yingwei Zheng	0b9f1cc024	[SCEV] Disallow simplifying phi(undef, X) to X (#115109 ) See the following case: ``` @GlobIntONE = global i32 0, align 4 define ptr @src() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ] br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ] br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } define ptr @tgt() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } ``` 1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus, `ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for `%retval.2.peel`. 2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`. 3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand` reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]` to generate code for `@GlobIntONE`. It is incorrect. This patch disallows simplifying `phi(undef, X)` to `X` by setting `CanUseUndef` to false. Closes https://github.com/llvm/llvm-project/issues/114879.	2024-11-07 15:53:51 +08:00
Youngsuk Kim	caa32e6d6f	[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576 ) Comments attached to the `ScaledReg` field of `struct Formula` explains that, `ScaledReg` must be non-null when `Scale` is non-zero. This fixes up a code path where this invariant is violated. Also, add an assert to ensure this invariant holds true. Without this patch, compiler aborts with the attached test case. Fixes #76504	2024-10-17 10:47:44 -04:00
Orlando Cazalet-Hyams	7506872afc	[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979 ) Fixes #110494	2024-10-03 11:47:08 +01:00
Jeremy Morse	e6bf48d110	[X86] Don't request 0x90 nop filling in p2align directives (#110134 ) As of rev ea222be0d, LLVMs assembler will actually try to honour the "fill value" part of p2align directives. X86 printed these as 0x90, which isn't actually what it wanted: we want multi-byte nops for .text padding. Compiling via a textual assembly file produces single-byte nop padding since ea222be0d but the built-in assembler will produce multi-byte nops. This divergent behaviour is undesirable. To fix: don't set the byte padding field for x86, which allows the assembler to pick multi-byte nops. Test that we get the same multi-byte padding when compiled via textual assembly or directly to object file. Added same-align-bytes-with-llasm-llobj.ll to that effect, updated numerous other tests to not contain check-lines for the explicit padding.	2024-10-02 11:14:05 +01:00
Nikita Popov	9f3d1695eb	[SCEVExpander] Preserve gep nuw during expansion (#102133 ) When expanding SCEV adds to geps, transfer the nuw flag to the resulting gep. (Note that this doesn't apply to IV increment GEPs, which go through a different code path.)	2024-10-02 11:45:00 +02:00
Nikita Popov	4b3ba64ba7	[SCEVExpander] Clear flags when reusing GEP (#109293 ) As pointed out in the review of #102133, SCEVExpander currently incorrectly reuses GEP instructions that have poison-generating flags set. Fix this by clearing the flags on the reused instruction.	2024-10-01 14:22:54 +02:00
Nikita Popov	c4744e49f6	[LSR] Regenerate test checks (NFC)	2024-09-19 16:34:10 +02:00
Sergey Kachkov	1f2a634c44	Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice. Reland fixes the issue with preserve-lcssa.ll test: it fails in the situation when x86_64-unknown-linux-gnu target is unavailable in opt. The changes are moved into separate duplicated-phis.ll test with explicit x86 target requirement to fix bots which are not building this target.	2024-09-09 16:14:51 +03:00
Sergey Kachkov	17f0c5dfaa	[LSR][NFC] Add pre-commit test	2024-09-09 16:14:50 +03:00
dyung	2bf551e600	Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666 ) Reverts llvm/llvm-project#107380 Change is causing the test preserve-lcssa.ll to fail on at least 2 build bots: - https://lab.llvm.org/buildbot/#/builders/190/builds/5231 - https://lab.llvm.org/buildbot/#/builders/161/builds/1855	2024-09-06 19:54:26 -07:00
Sergey Kachkov	2cb4d1b1bd	[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380 ) Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one is corresponding to i variable (add rdx, 4), the other - to res (add rax, 2). The second induction variable can be removed by rewriteLoopExitValues() method (final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires %lsr.iv.next value to have only 2 uses: one in LCSSA phi node, the other - in induction phi node. Here we have 3 uses of this value because of duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions don't process twice.	2024-09-06 18:39:47 +03:00
Philip Reames	27a62ec72a	[LSR] Split the -lsr-term-fold transformation into it's own pass (#104234 ) This transformation doesn't actually use any of the internal state of LSR and recomputes all information from SCEV. Splitting it out makes it easier to test. Note that long term I would like to write a version of this transform which is integrated with LSR's solver, but if that happens, we'll just delete the extra pass. Integration wise, I switched from using TTI to using a pass configuration variable. This seems slightly more idiomatic, and means we don't run the extra logic on any target other than RISCV.	2024-08-17 18:34:23 -07:00
Benjamin Maxwell	7fad04e94b	[LSR] Fix matching vscale immediates (#100080 ) Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have > 2 operands. Previously, the vscale immediate matching did not check the number of operands of the `SCEVMulExpr`, so would ignore any operands after the first two. This led to incorrect codegen (and results) for ArmSME in IREE (https://github.com/iree-org/iree), which sometimes addresses things that are a `vscale * vscale` multiple away. The test added with this change shows an example reduced from IREE. The second write should be offset from the first `16 * vscale * vscale` (* 4 bytes), however, previously LSR dropped the second vscale and instead offset the write by `#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).	2024-07-24 10:06:34 +01:00
Benjamin Maxwell	c1b70fa5bf	Precommit vscale-fixups.ll test (NFC) Precommit test for #100080.	2024-07-23 10:03:14 +00:00
Luke Lau	557ef043af	[RISCV] Copy AVLs whose LiveIntervals aren't extendable in insertVSETVLI (#98342 ) Currently before forwarding an AVL we do a simple non-exhaustive check to see if its LiveInterval is extendable. But we also need to check for this when we're extending an AVL's LiveInterval via merging the VSETVLIInfos in transferBefore with equally zero AVLs. Rather than trying to conservatively prevent these cases, this inserts a copy of the AVL instead if we don't know we'll be able to extend it. This is likely to be more robust, and even if the extra copy is undesirable these cases should be rare in practice.	2024-07-15 13:18:12 +08:00
Shan Huang	d83d09facd	[DebugInfo][LoopStrengthReduce] Fix missing debug location updates (#97519 ) Fix #97510 . Note that, for the new phi instruction `NewPH`, which replaces the old phi `PH` and the cast `ShadowUse`, I choose to propagate the debug location of `PH` to it, because the cast is eliminated according to the optimization semantics.	2024-07-15 09:44:18 +08:00
Alex Bradbury	f8dbe1d09d	Revert "[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default" (#98328 ) Reverts llvm/llvm-project#89927 while we investigate performance regressions reported by @dtcxzyw	2024-07-10 15:33:20 +01:00
Alex Bradbury	af47a4ec50	[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (#89927 ) This avoids some cases where LSR produces results that lead to very poor codegen. There's a chance we'll see minor degradations for some inputs in the case that our metrics say the found solution is worse, but in reality it's better than the starting point. Per the review thread, at least one vendor has been enabling this by defualt for some time and found overall it's an improvement. As such, we'll enable by default and aim to fix any as-yet-unknown regressions in-tree.	2024-07-10 13:23:31 +01:00
Luke Lau	db782b44b3	[RISCV] Don't forward AVL in VSETVLIInfo if it would clobber other definitions (#97264 ) This fixes a crash found when compiling OpenBLAS with -mllvm -verify-machineinstrs. When we "forward" the AVL from the output of a vsetvli, we might have to extend the LiveInterval of the AVL to where insert the new vsetvli. Most of the time we are able to extend the LiveInterval because there's only one val num (definition) for the register. But PHI elimination can assign multiple values to the same register, in which case we end up clobbering a different val num when extending: %x = PseudoVSETVLI %avl, ... %avl = ADDI ... %v = PseudoVADD ..., avl=%x ; %avl is forwarded to PseudoVADD: %x = PseudoVSETVLI %avl, ... %avl = ADDI ... %v = PseudoVADD ..., avl=%avl Here there's no way to extend the %avl from the vsetvli since %avl is redefined, i.e. we have two val nums. This fixes it by only forwarding it when we have exactly one val num, where it should be safe to extend it.	2024-07-05 11:44:59 +08:00
Graham Hunter	4311b14e9c	[LSR] Recognize vscale-relative immediates (#88124 ) Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.	2024-07-01 09:23:31 +01:00
Luke Lau	eb76bc38ff	[RISCV] Relax RISCVInsertVSETVLI output VL peeking to cover registers (#96200 ) If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli. This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles. We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI. We also have a restriction only in computeInfoForInstr that the AVL can't be a register as we want to avoid extending live ranges. This patch does two interlinked things: 1) It adds this AVL "peeking" to getInfoForVSETVLI 2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6 Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI. We also need to update how we update LiveIntervals in insertVSETVLI, as we can now end up needing to extend the LiveRange of the AVL across blocks.	2024-06-23 20:20:59 +08:00
Nikita Popov	76d3ab2cc3	[IR] Remove support for shl constant expressions (#96037 ) Remove support for shl constant expressions, as part of: https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179	2024-06-20 08:59:29 +02:00
Philip Reames	cb76896d6e	[SCEVExpander] Recognize urem idiom during expansion (#96005 ) If we have a urem expression, emitting it as a urem is significantly better that letting the fully expansion kick in. We have the risk of a udiv or mul which could have previously been shared, but loosing that seems like a reasonable tradeoff for being able to round trip a urem w/o modification.	2024-06-19 08:40:04 -07:00
Stephen Tozer	094572701d	[RemoveDIs] Print IR with debug records by default (#91724 ) This patch makes the final major change of the RemoveDIs project, changing the default IR output from debug intrinsics to debug records. This is expected to break a large number of tests: every single one that tests for uses or declarations of debug intrinsics and does not explicitly disable writing records. If this patch has broken your downstream tests (or upstream tests on a configuration I wasn't able to run): 1. If you need to immediately unblock a build, pass `--write-experimental-debuginfo=false` to LLVM's option processing for all failing tests (remember to use `-mllvm` for clang/flang to forward arguments to LLVM). 2. For most test failures, the changes are trivial and mechanical, enough that they can be done by script; see the migration guide for a guide on how to do this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates 3. If any tests fail for reasons other than FileCheck check lines that need updating, such as assertion failures, that is most likely a real bug with this patch and should be reported as such. For more information, see the recent PSA: https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578	2024-06-14 15:07:27 +01:00
Graham Hunter	e16f2f5d24	[AArch64] Override isLSRCostLess, take number of instructions into account (#84189 ) Adds an AArch64-specific version of isLSRCostLess, changing the relative importance of the various terms from the formulae being evaluated. This has been split out from my vscale-aware LSR work, see the RFC for reference: https://discourse.llvm.org/t/rfc-vscale-aware-loopstrengthreduce/77131	2024-06-06 14:45:36 +01:00

1 2 3 4 5 ...

673 Commits