llvm-project

Author	SHA1	Message	Date
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Philip Reames	04cbfcc33a	[test][lsr] Add term-folding test cases with estimated trip counts	2023-12-07 10:34:29 -08:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Philip Reames	e947f95337	[LSR][TTI][RISCV] Enable terminator folding for RISC-V If looking for a miscompile revert candidate, look here! The transform being enabled prefers comparing to a loop invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one. On RISC-V, which has a large number of scalar registers, this is generally a profitable transform. We lose the ability to use a beqz on what is typically a count down IV, and pay the cost of computing the exit value on the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%. Code size wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for an uncompressible beq. We also add instructions in the preheader. Net result is a slight regression overall, but neutral or better inside the loop. Previous versions of this transform had numerous corner-case correctness bugs. All of the ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky. This patch is a reworked version of D134893 by @eop. That patch has been abandoned since May, so I picked it up, reworked it a bit, and am landing it.	2023-11-29 12:04:06 -08:00
Daniil Suchkov	1344b65c90	[SCEV] Fix incorrect NUW inference (#70521) This patch fixes a miscompile in LSR caused by incorrect inference of NUW flag for AddRec: we shouldn't infer no-wrap flags based on a comparison which doesn't fully control the loop exit.	2023-10-31 11:43:57 -07:00
Danila Malyutin	ba1349fc31	[SCEV] Fix "quick and dirty" difference that could lead to assert (#70688) The old algorithm would remove all operands matching %step SCEV when it intended to only remove a single one. This led to an assert when SCEVAddExpr was of the form %step + %step and potential miscompiles in similar cases. Such SCEVs could be created when construction reached depth thresholds. Fixes #70348	2023-10-31 00:50:57 +03:00
Daniil Suchkov	505e32302c	[Test] NFC. Add missing "REQUIRES: x86-registered-target" to LoopStrengthReduce/scev-incorrect-nuw-inference.ll	2023-10-27 20:48:54 +00:00
Daniil Suchkov	33330966e5	[Test] NFC. Add a test exposing a SCEV bug causing an LSR miscompile	2023-10-27 20:16:44 +00:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout, which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One case that currently cannot be differentiated from a missing datalayout is `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Nikita Popov	1c9b63f103	[LSR] Regenerate test checks (NFC)	2023-09-22 12:40:37 +02:00
Nikita Popov	4de93db447	[LSR] Regenerate test checks (NFC) While there also remove some UB from the test.	2023-09-21 16:34:44 +02:00
Vedant Paranjape	5a9a02f67b	[SCEV] Compute SCEV for ashr(add(shl(x, n), c), m) instr triplet %x = shl i64 %w, n %y = add i64 %x, c %z = ashr i64 %y, m The above given instruction triplet is seen many times in the generated LLVM IR, but the SCEV model is not able to compute the SCEV value of the AShr instruction in this case. This patch models the two cases of the above instruction pattern using the following expression: => sext(add(mul(trunc(w), 2^(n-m)), c >> m)) 1) when n = m the expression reduces to sext(add(trunc(w), c >> n)) as n-m=0, and multiplying with 2^0 gives the same result. 2) when n > m the expression works as given above. It also adds several unit tests to verify that SCEV is able to compute the value. $ opt sext-add-inreg.ll -passes="print<scalar-evolution>" Comparing the snippets of the result of SCEV analysis: * SCEV of ashr before change ---------------------------- %idxprom = ashr exact i64 %sext, 32 --> %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648) Exits: 8 LoopDispositions: { %for.body: Variant } * SCEV of ashr after change --------------------------- %idxprom = ashr exact i64 %sext, 32 --> {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9) Exits: 8 LoopDispositions: { %for.body: Computable } LoopDisposition of the given SCEV was LoopVariant before, after adding the new way to model the instruction, the LoopDisposition becomes LoopComputable as it is able to compute the SCEV of the instruction. Differential Revision: https://reviews.llvm.org/D152278	2023-08-25 05:42:08 +00:00
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
Florian Hahn	3ba3ea3c06	[IVUsers] Check getExpr result in findAddRecForLoop. This fixes a crash if the SCEV for the use isn't invertible and nullptr is returned. Fixes https://github.com/llvm/llvm-project/issues/63840	2023-07-20 14:56:19 +01:00
Nikita Popov	ddb46abd3c	[LSR] Don't consider users of constant outside loop In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users outside the loop. E.g. if we have an addrec based on %base, and %base is also used outside the loop, then we have to keep it in a register anyway, which may make it more profitable to use %base + %idx style addressing. This reasoning doesn't hold up when the base is a constant, because the constant can be rematerialized. The lsr-memcpy.ll test regressed when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr) now also has a use outside the loop (previously it didn't due to a pointer type difference), and that extra "use" results in worse use of addressing modes in the loop. However, the use outside the loop actually gets rematerialized, so the alleged register saving does not occur. The same reasoning also applies to other types of constants, such as global variable references. Differential Revision: https://reviews.llvm.org/D155073	2023-07-13 12:22:38 +02:00
Nikita Popov	e8a5df7beb	[LSR] Add test variant with global variables (NFC) A variant of the test using globals instead of inttoptr expressions for D155073.	2023-07-13 12:12:48 +02:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Nikita Popov	6c388e06f5	[LSR] Convert test to opaque pointers (NFC) This regresses with opaque pointers. I'll submit a patch to recover the regression.	2023-07-12 14:07:25 +02:00
Nikita Popov	4ec3ea8afa	[LSR] Convert some tests to opaque pointers (NFC) These no longer show codegen regressions.	2023-07-12 11:48:44 +02:00
Nikita Popov	bd0710c221	[LSR] Move test to target specific directory (NFC) Uses an x86 triple.	2023-07-12 11:44:09 +02:00
Nikita Popov	d69033d245	[SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers Instead of checking the pointer type, check the element type of the GEP. Previously we ended up reusing GEP increments that were not in expanded form, thus not respecting LSRs choice of representation. The change in 2011-10-06-ReusePhi.ll recovers a regression that appeared when converting that test to opaque pointers. Changes in various Thumb tests now compute the step outside the loop instead of using add.w inside the loop, which is LSR's preferred representation for this target.	2023-07-12 11:32:13 +02:00
Nikita Popov	7a21efce72	[LSR] Move test to target-specific directory (NFC)	2023-07-12 10:10:49 +02:00
Nikita Popov	cfa9275888	[LSR] Convert some tests to opaque pointers (NFC)	2023-07-12 09:46:08 +02:00
Nikita Popov	7a78756118	[LSR] Regenerate test checks (NFC)	2023-07-12 09:40:10 +02:00
Florian Hahn	69ca5c9d62	[SCEV] Add flag to control invertible check for normalization. When normalizing a SCEV expression during expansion, there should be no need for it to be invertible, as it will only be used for code generation. This fixes a crash after 7f5b15ad150e. Fixes https://github.com/llvm/llvm-project/issues/63678.	2023-07-05 18:11:44 +01:00
Florian Hahn	7f5b15ad15	[LSR] Move normalization check to normalizeForPostIncUse. Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch additional non-invertible cases. This fixes another mis-compile pointed out by @peixin in D153004.	2023-07-04 11:56:51 +01:00
Florian Hahn	02591d26b9	[LSR] Add test for another normalization miscompile. Based on @peixin test case shared in D153004.	2023-07-03 18:57:31 +01:00
Fangrui Song	d39b4ce3ce	[test] Replace aarch64-*-eabi with aarch64 Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver. We want to avoid it elsewhere as well. Just use the common "aarch64" without other triple components.	2023-06-27 20:02:52 -07:00
Nikita Popov	b51153792b	[LSR] Convert some tests to opaque pointers (NFC)	2023-06-23 17:13:57 +02:00
Nikita Popov	2c9aba9352	[LSR] Regenerate test checks (NFC)	2023-06-23 17:06:51 +02:00
Florian Hahn	3a57152d85	[LSR] Return nullptr from getExpr if the result isn't invertible. getExpr is missing a check to make sure the result is invertible. This can lead to incorrect results, so return nullptr in those cases like in other places in IVUsers. Fixes #62660. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D153202	2023-06-22 19:10:48 +01:00
Florian Hahn	93407f7675	[LSR] Adjust test to make sure it keeps testing for the original issue. Make sure the test keeps testing for the original issue after D153202.	2023-06-22 15:36:32 +01:00
Matt Arsenault	92ee60b66f	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.inc/dec to atomicrmw	2023-06-21 21:20:26 -04:00
Florian Hahn	dae5cd73cb	Recommit "[LSR] Consider post-inc form when creating extends/truncates." This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2. The recommitted version now checks the PostIncLoopSets for all fixups and returns nullptr if the result doesn't match for all fixups.	2023-06-19 17:57:06 +01:00
Florian Hahn	798b6419bc	[LSR] Add test for issue leading to revert of abfeda5af329b5. Add unit test triggering an assertion with abfeda5af329b5.	2023-06-19 15:35:48 +01:00
NAKAMURA Takumi	7400bdc19f	pr62660-normalization-failure.ll REQUIRES: asserts (#62660)	2023-06-18 15:24:53 +09:00
Florian Hahn	8225698212	[LSR] Enable SCEV verification for test from f3a0ad2d and mark as XFAIL The test fails SCEV verification, which causes the expensive check bots to fail. Always run verification and mark as XFAIL until fixed.	2023-06-17 21:06:49 +01:00
Florian Hahn	1797ab36ef	Revert "[LSR] Consider post-inc form when creating extends/truncates." This reverts commit abfeda5af329b5889db709ff74506e20e0b569e9 and fe19036e1266d2a90b44725c82b898134906e4c3. The added assertion triggers during clang bootstrap builds. Revert while I investigate.	2023-06-17 17:58:41 +01:00
Florian Hahn	f3a0ad2d8b	[LSR] Add test for #62660. Add test for LSR miscompile.	2023-06-17 17:37:25 +01:00
Florian Hahn	abfeda5af3	[LSR] Consider post-inc form when creating extends/truncates. GenerateTruncates at the moment creates extends/truncates for post-inc uses of normalized expressions. For example, if an add rec of the form {1,+,-1} is used outside the loop, the normalized form will use {1,+,-1} instead of {0,+,-1}. When naively sign-extending the normalized expression, it will get extended incorrectly to {1,+,-1} for the wider type, if the backedge-taken count of the loop is 1. To address this, the patch updates GenerateTruncates to check if the LSRUse contains any fixups with PostIncLoops. If that's the case, first de-normalize the expression, then perform the extend/truncate, then normalize again. There may be other places where similar checks are needed and the helper can be generalized for those cases. I'd not be surprised if other subtle mis-compiles are caused by this. Fixes #38847. Fixes #58039. Fixes #62852. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153004	2023-06-17 09:58:37 +01:00
Florian Hahn	f63c038af4	[LSR] Add test case for #58039.	2023-06-17 09:57:00 +01:00
Florian Hahn	672b35d554	[LSR] Move new test to X86 subdir. The test added in 1665cb06307 requires the X86 backend, so move it to the X86 subdirectory.	2023-06-15 11:11:06 +01:00
Florian Hahn	1665cb0630	[LSR] Add test cases showing bad handling of extends of post-inc uses. Tests from #38847, #62852.	2023-06-15 10:15:12 +01:00
Dmitry Makogon	0a3dc73e70	[Test] Move LoopStrengthReduce/pr62563.ll to X86 specific test folder (NFC) The test case is X86 specific. Should unblock buildbots after 253e3e2.	2023-05-31 20:24:30 +07:00
Dmitry Makogon	253e3e2619	[Test] Add test showing miscompilation in LoopStrengthReduce on min/max expressions (NFC) This is a test case from https://github.com/llvm/llvm-project/issues/62563.	2023-05-31 18:46:23 +07:00
sgokhale	c4a60c9d34	[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default This is an attempt to reland D42600 and enable this optimisation by default. This also resolves the issue pointed out in the context of PGO build. Differential Revision: https://reviews.llvm.org/D42600	2023-05-25 13:56:29 +05:30
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Alan Zhao	f4999d3535	Revert "[CodeGen][ShrinkWrap] Split restore point" This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4. The original commit causes a Chrome build assertion failure with ThinLTO: https://crbug.com/1443635	2023-05-08 16:27:59 -07:00

598 Commits