llvm-project

Author	SHA1	Message	Date
Nikita Popov	beba307c5b	[LSR] Clear SCEVExpander before deleting phi nodes Fixes https://github.com/llvm/llvm-project/issues/84709.	2024-03-12 16:24:10 +01:00
Nikita Popov	6409c21857	[SCEVExpander] Use PoisoningVH for OrigFlags It's common to delete some instructions after using SCEVExpander, while it is still live (but will not be used afterwards). In that case, the AssertingVH may trigger. Replace it with a PoisoningVH so that we only detect the case where the SCEVExpander actually is used in a problematic fashion after the instruction removal. The alternative would be to add clear() calls to more code paths. Fixes https://github.com/llvm/llvm-project/issues/83404.	2024-03-05 16:41:52 +01:00
Patrick O'Neill	8d6e867eb2	[LSR][term-fold] Ensure the simple recurrence is from the current loop (#83085) If the phi node found by matchSimpleRecurrence is not from the current loop, then isAlmostDeadIV panics. With this patch we bail out early. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>	2024-03-04 16:40:40 -08:00
Nikita Popov	72763521c3	[LSR] Clear SCEVExpander before calling DeleteDeadPHIs To avoid an assertion failure when an AssertingVH is removed, as reported in: https://github.com/llvm/llvm-project/pull/82362#issuecomment-1960067147 Also remove an unnecessary use of SCEVExpanderCleaner.	2024-02-22 22:50:26 +01:00
Nikita Popov	365bf43b8b	[LSR] Regenerate test checks (NFC) Check output IR instead of debug output. The debug output will change in an upcoming patch in an irrecoverable way.	2024-02-07 11:14:49 +01:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Philip Reames	28865da374	[LSR][term-fold] Adjust expansion budget based on trip count (#80304) Follow-up to https://github.com/llvm/llvm-project/pull/74747. This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations. Co-authored-by: Nikita Popov <github@npopov.com>	2024-02-02 08:57:26 -08:00
Philip Reames	390d66b03b	[LSR] Add tests for restricting term-fold budget based on exact trip count	2024-02-01 07:47:04 -08:00
Philip Reames	f264da4322	[lsr][term-fold] Restrict transform to low cost expansions (#74747) This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. It's also now come up as a possibility in a regression twice in two days, so I'd like to get this in to close out the possibility if nothing else. The original review dropped the threshold for short trip count loops. I will return to that in a separate review if this lands.	2024-01-31 14:48:20 -08:00
Philip Reames	5282202db0	[LSR] Add a test case mentioned in review As mentioned in https://github.com/llvm/llvm-project/pull/74747, this case is triggering a particularly high cost trip count expansion.	2024-01-31 12:32:50 -08:00
paperchalice	e390c229a4	[Pass] Add hyphen to some pass names (#74287) Here is the list of the renamed passes: - `callbrprepare` -> `callbr-prepare` - `dwarfehprepare` -> `dwarf-eh-prepare` - `flattencfg` -> `flatten-cfg` - `loweratomic` -> `lower-atomic` - `lowerinvoke` -> `lower-invoke` - `lowerswitch` -> `lower-switch` - `winehprepare` -> `win-eh-prepare` - `targetir` -> `target-ir` - `targetlibinfo` -> `target-lib-info` Legacy passes are not affected.	2024-01-25 16:05:54 +08:00
Stephen Tozer	7c53e9f667	[RemoveDIs][DebugInfo] Add support for DPValues to LoopStrengthReduce (#78706) This patch trivially extends support for DbgValueInst recovery to DPValues in LoopStrengthReduce; they are handled identically, so this is mostly done by reusing the DbgValueInst code (using templates or auto-parameter lambdas to reduce actual code duplication).	2024-01-22 18:59:19 +00:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Florian Hahn	ab33c0b96e	[LSR] Add test showing incorrectly adding nuw with #77827. Extra test for https://github.com/llvm/llvm-project/pull/77827, where NUW gets added to the AddRec due to the BTC being 0.	2024-01-15 21:31:28 +00:00
Philip Reames	e4d01bb227	[SCEV] Special case sext in isKnownNonZero (#77834) The existing logic in isKnownNonZero relies on unsigned ranges, which can be problematic when our range calculation is imprecise. Consider the following: %offset.nonzero = or i32 %offset, 1 --> %offset.nonzero U: [1,0) S: [1,0) %offset.i64 = sext i32 %offset.nonzero to i64 --> (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648) Note that the unsigned range for the sext does contain zero in this case despite the fact that it can never actually be zero. Instead, we can push the query down one level - relying on the fact that the sext is an invertible operation and that the result can only be zero if the input is. We could likely generalize this reasoning for other invertible operations, but special casing sext seems worthwhile.	2024-01-12 07:45:28 -08:00
Philip Reames	5ce067d592	Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V." This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus re-enables terminator folding for RISCV. The reported miscompile has been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.	2024-01-11 13:20:02 -08:00
Philip Reames	f5dd70c582	[LSR] Require non-zero step when considering wrap around for term folding (#77809) The term folding logic needs to prove that the induction variable does not cycle through the same set of values so that testing for the value of the IV on the exiting iteration is guaranteed to trigger only on that iteration. The prior code checked the no-self-wrap property on the IV, but this is insufficient as a zero step is trivially no-self-wrap per SCEV's definition but does repeat the same series of values. In the current form, this has the effect of basically disabling lsr's term-folding for all non-constant strides. This is still a net improvement as we've disabled term-folding entirely, so being able to enable it for constant strides is still a net improvement. As future work, there are two SCEV weaknesses worth investigating. First, sext (or i32 %a, 1) to i64 does not return true for isKnownNonZero. This is because we check only the unsigned range in that query. We could either do query pushdown, or check the signed range as well. I tried the second locally and it has very broad impact - i.e. we have a bunch of missing optimizations here. Second, zext (or i32 %a, 1) to i64 as the increment to the IV in expensive_expand_short_tc causes the addrec to no longer be provably no-self-wrap. I didn't investigate this so it might be necessary, but the loop structure is such that I find this result surprising.	2024-01-11 10:07:17 -08:00
Craig Topper	fdb87640ee	[LSR][TTI][RISCV] Disable terminator folding for RISC-V. This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe. It caused a miscompile in downstream testing. Spoke with Philip offline. We believe the issue is that LSR needs to make sure the Step of the other AddRec is non-zero. Reverting until Philip is back from vacation.	2023-12-27 15:13:32 -08:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Philip Reames	04cbfcc33a	[test][lsr] Add term-folding test cases with estimated trip counts	2023-12-07 10:34:29 -08:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Philip Reames	e947f95337	[LSR][TTI][RISCV] Enable terminator folding for RISC-V If looking for a miscompile revert candidate, look here! The transform being enabled prefers comparing to a loop invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one. On RISC-V which has a large number of scalar registers, this is generally a profitable transform. We lose the ability to use a beqz on what is typically a count down IV, and pay the cost of computing the exit value on the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%. Code size wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for an uncompressible beq. We also add instructions in the preheader. Net result is a slight regression overall, but neutral or better inside the loop. Previous versions of this transform had numerous corner-case correctness bugs. All of the ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky. This patch is a reworked version of D134893 by @eop. That patch has been abandoned since May, so I picked it up, reworked it a bit, and am landing it.	2023-11-29 12:04:06 -08:00
Daniil Suchkov	1344b65c90	[SCEV] Fix incorrect NUW inference (#70521) This patch fixes a miscompile in LSR caused by incorrect inference of NUW flag for AddRec: we shouldn't infer no-wrap flags based on a comparison which doesn't fully control the loop exit.	2023-10-31 11:43:57 -07:00
Danila Malyutin	ba1349fc31	[SCEV] Fix "quick and dirty" difference that could lead to assert (#70688) The old algorithm would remove all operands matching %step SCEV when it intended to only remove a single one. This led to an assert when SCEVAddExpr was of the form %step + %step and potential miscompiles in similar cases. Such SCEVs could be created when construction reached depth thresholds. Fixes #70348	2023-10-31 00:50:57 +03:00
Daniil Suchkov	505e32302c	[Test] NFC. Add missing "REQUIRES: x86-registered-target" to LoopStrengthReduce/scev-incorrect-nuw-inference.ll	2023-10-27 20:48:54 +00:00
Daniil Suchkov	33330966e5	[Test] NFC. Add a test exposing a SCEV bug causing an LSR miscompile	2023-10-27 20:16:44 +00:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is differentiating a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Nikita Popov	1c9b63f103	[LSR] Regenerate test checks (NFC)	2023-09-22 12:40:37 +02:00
Nikita Popov	4de93db447	[LSR] Regenerate test checks (NFC) While there also remove some UB from the test.	2023-09-21 16:34:44 +02:00
Vedant Paranjape	5a9a02f67b	[SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet %x = shl i64 %w, n %y = add i64 %x, c %z = ashr i64 %y, m The above given instruction triplet is seen many times in the generated LLVM IR, but SCEV model is not able to compute the SCEV value of AShr instruction in this case. This patch models the two cases of the above instruction pattern using the following expression: => sext(add(mul(trunc(w), 2^(n-m)), c >> m)) 1) when n = m the expression reduces to sext(add(trunc(w), c >> n)) as n-m=0, and multiplying with 2^0 gives the same result. 2) when n > m the expression works as given above. It also adds several unittest to verify that SCEV is able to compute the value. $ opt sext-add-inreg.ll -passes="print<scalar-evolution>" Comparing the snippets of the result of SCEV analysis: * SCEV of ashr before change ---------------------------- %idxprom = ashr exact i64 %sext, 32 --> %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648) Exits: 8 LoopDispositions: { %for.body: Variant } * SCEV of ashr after change --------------------------- %idxprom = ashr exact i64 %sext, 32 --> {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9) Exits: 8 LoopDispositions: { %for.body: Computable } LoopDisposition of the given SCEV was LoopVariant before, after adding the new way to model the instruction, the LoopDisposition becomes LoopComputable as it is able to compute the SCEV of the instruction. Differential Revision: https://reviews.llvm.org/D152278	2023-08-25 05:42:08 +00:00
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
Florian Hahn	3ba3ea3c06	[IVUsers] Check getExpr result in findAddRecForLoop. This fixes a crash if the SCEV for the use isn't invertible and nullptr is returned. Fixes https://github.com/llvm/llvm-project/issues/63840	2023-07-20 14:56:19 +01:00
Nikita Popov	ddb46abd3c	[LSR] Don't consider users of constant outside loop In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users outside the loop. E.g. if we have an addrec based on %base, and %base is also used outside the loop, then we have to keep it in a register anyway, which may make it more profitable to use %base + %idx style addressing. This reasoning doesn't hold up when the base is a constant, because the constant can be rematerialized. The lsr-memcpy.ll test regressed when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr) now also has a use outside the loop (previously it didn't due to a pointer type difference), and that extra "use" results in worse use of addressing modes in the loop. However, the use outside the loop actually gets rematerialized, so the alleged register saving does not occur. The same reasoning also applies to other types of constants, such as global variable references. Differential Revision: https://reviews.llvm.org/D155073	2023-07-13 12:22:38 +02:00
Nikita Popov	e8a5df7beb	[LSR] Add test variant with global variables (NFC) A variant of the test using globals instead of inttoptr expressions for D155073.	2023-07-13 12:12:48 +02:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Nikita Popov	6c388e06f5	[LSR] Convert test to opaque pointers (NFC) This regresses with opaque pointers. I'll submit a patch to recover the regression.	2023-07-12 14:07:25 +02:00
Nikita Popov	4ec3ea8afa	[LSR] Convert some tests to opaque pointers (NFC) These no longer show codegen regressions.	2023-07-12 11:48:44 +02:00
Nikita Popov	bd0710c221	[LSR] Move test to target specific directory (NFC) Uses an x86 triple.	2023-07-12 11:44:09 +02:00
Nikita Popov	d69033d245	[SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers Instead of checking the pointer type, check the element type of the GEP. Previously we ended up reusing GEP increments that were not in expanded form, thus not respecting LSRs choice of representation. The change in 2011-10-06-ReusePhi.ll recovers a regression that appeared when converting that test to opaque pointers. Changes in various Thumb tests now compute the step outside the loop instead of using add.w inside the loop, which is LSR's preferred representation for this target.	2023-07-12 11:32:13 +02:00
Nikita Popov	7a21efce72	[LSR] Move test to target-specific directory (NFC)	2023-07-12 10:10:49 +02:00
Nikita Popov	cfa9275888	[LSR] Convert some tests to opaque pointers (NFC)	2023-07-12 09:46:08 +02:00
Nikita Popov	7a78756118	[LSR] Regenerate test checks (NFC)	2023-07-12 09:40:10 +02:00
Florian Hahn	69ca5c9d62	[SCEV] Add flag to control invertible check for normalization. When normalizing a SCEV expression during expansion, there should be no need for it to be invertible, as it will only be used for code generation. This fixes a crash after 7f5b15ad150e. Fixes https://github.com/llvm/llvm-project/issues/63678.	2023-07-05 18:11:44 +01:00
Florian Hahn	7f5b15ad15	[LSR] Move normalization check to normalizeForPostIncUse. Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch additional un-invertable cases. This fixes another mis-compile pointed out by @peixin in D153004.	2023-07-04 11:56:51 +01:00
Florian Hahn	02591d26b9	[LSR] Add test for another normalization miscompile. Based on @peixin test case shared in D153004.	2023-07-03 18:57:31 +01:00
Fangrui Song	d39b4ce3ce	[test] Replace aarch64-*-eabi with aarch64 Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid it elsewhere as well. Just use the common "aarch64" without other triple components.	2023-06-27 20:02:52 -07:00
Nikita Popov	b51153792b	[LSR] Convert some tests to opaque pointers (NFC)	2023-06-23 17:13:57 +02:00
Nikita Popov	2c9aba9352	[LSR] Regenerate test checks (NFC)	2023-06-23 17:06:51 +02:00
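
Commit f5dd70c582 explains that term folding is only correct if the secondary IV cannot reach its precomputed exit value before the final iteration; a zero step (trivially no-self-wrap per SCEV) or a step that wraps around both break this. The Python sketch below is a simplified model of that argument, not LLVM's implementation: the do-while loop shape, the 8-bit IV width, and the function names `run_original` and `run_folded` are illustrative assumptions.

```python
def run_original(trip_count: int) -> int:
    """Do-while loop whose exit test uses the primary IV i.

    Returns the number of iterations executed (assumes trip_count >= 1).
    """
    i = iterations = 0
    while True:
        iterations += 1
        i += 1
        if i == trip_count:
            return iterations


def run_folded(trip_count: int, start: int, step: int, bits: int = 8) -> int:
    """The same loop after a hypothetical term fold.

    The primary IV is gone; the loop instead exits when a secondary IV
    p = start + k*step (modulo 2**bits) equals an exit value computed
    once in the preheader as start + trip_count*step (modulo 2**bits).
    """
    mask = (1 << bits) - 1
    exit_value = (start + trip_count * step) & mask
    p = start & mask
    iterations = 0
    while True:
        iterations += 1
        p = (p + step) & mask
        if p == exit_value:
            return iterations


print(run_original(10))       # 10
print(run_folded(10, 5, 1))   # 10: non-zero step, no wrap, fold is correct
print(run_folded(10, 5, 0))   # 1: zero step, p equals the exit value at once
print(run_folded(200, 0, 2))  # 72: 2*k wraps modulo 256 and hits 144 early
```

The zero-step case exits after a single iteration even though the IV never wraps, which matches the commit's point that no-self-wrap alone is insufficient and a provably non-zero step is also required.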
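
Commit 5a9a02f67b models `ashr(add(shl(w, n), c), m)` as `sext(add(mul(trunc(w), 2^(n-m)), c >> m))` when n >= m. The Python sketch below checks that identity on random i64 bit patterns, assuming `trunc` narrows to i(64-m), `sext` widens back to i64, and `c >> m` is an arithmetic shift. It only exercises the bit-level arithmetic; it does not reproduce the conditions under which ScalarEvolution actually applies the rewrite, and the helper names are hypothetical.

```python
import random

BITS = 64
MASK = (1 << BITS) - 1


def to_signed(v: int, bits: int) -> int:
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def original(w: int, n: int, c: int, m: int) -> int:
    """ashr(add(shl(w, n), c), m) on i64 bit patterns, with wrapping."""
    y = ((w << n) + c) & MASK                   # shl then add, modulo 2**64
    return (to_signed(y, BITS) >> m) & MASK     # Python's int >> is arithmetic


def rewritten(w: int, n: int, c: int, m: int) -> int:
    """sext(add(mul(trunc(w), 2^(n-m)), c >> m)), computed in i(64-m)."""
    narrow = BITS - m
    c_shifted = to_signed(c, BITS) >> m         # arithmetic shift of c
    # trunc, mul, and add, all modulo 2**narrow
    t = (w * (1 << (n - m)) + c_shifted) & ((1 << narrow) - 1)
    return to_signed(t, narrow) & MASK          # sext back to i64


rng = random.Random(0)
for _ in range(10_000):
    n = rng.randrange(BITS)
    m = rng.randrange(n + 1)    # the rewrite assumes n >= m
    w = rng.getrandbits(BITS)
    c = rng.getrandbits(BITS)
    assert original(w, n, c, m) == rewritten(w, n, c, m)
```

When n == m the multiplier is 2^0 = 1, which gives the reduced form `sext(add(trunc(w), c >> n))` quoted in the commit message.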


616 Commits