169 Commits

Author SHA1 Message Date
Rahul Joshi
bee9664970
[TableGen] Emit OpName as an enum class instead of a namespace (#125313)
- Change InstrInfoEmitter to emit OpName as an enum class
  instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are 
  OpNames vs just operand indices and should help avoid
  bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
  enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
  to conform to the new definition of OpName (mostly
  mechanical changes).
2025-02-12 08:19:30 -08:00
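As a rough illustration of why the enum class matters (the names below are invented stand-ins mirroring the commit description, not the actual TableGen-generated code):

```cpp
#include <cassert>

// Before: OpName values were plain enumerators in a namespace, so they
// implicitly converted to int and could be confused with operand indices.
namespace before {
namespace OpName {
enum { vdst, src0, OPERAND_LAST };
} // namespace OpName
} // namespace before

// After: OpName is an enum class, so there is no implicit conversion and
// an OpName can no longer silently stand in for an operand index.
namespace after {
enum class OpName { vdst, src0, NUM_OPERAND_NAMES };

// getOperandIdx is declared next to the enum; this stand-in just maps
// each name to a fixed index (hypothetical behavior for the sketch).
inline int getOperandIdx(OpName Name) { return static_cast<int>(Name); }
} // namespace after
```

With the enum class, converting an OpName to an index requires going through getOperandIdx (or an explicit cast), which is exactly the distinction the commit wants to enforce.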
Kazu Hirata
975bba6f4b
[AMDGPU] Avoid repeated hash lookups (NFC) (#126001) 2025-02-06 10:34:49 -08:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Jay Foad
7a30b9c0f0
[AMDGPU] Make more use of getWaveMaskRegClass. NFC. (#108186) 2024-09-11 14:55:53 +01:00
Craig Topper
cd3667d1db
[CodeGen] Update a few places that were passing Register to raw_ostream::operator<< (#106877)
These would implicitly cast the register to `unsigned`. Switching most of
them to use printReg gives more readable output. Change some others to use
Register::id() so we can eventually remove the implicit cast to
`unsigned`.
2024-09-02 00:19:19 -07:00
Akshat Oke
da13754103
AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362) 2024-09-02 11:41:56 +05:30
Tim Gymnich
273e0a4c56
[AMDGPU] add missing checks in processBaseWithConstOffset (#102310)
fixes https://github.com/llvm/llvm-project/issues/102231 by inserting
missing checks.
2024-08-12 11:54:02 +04:00
Christudasan Devadasan
37d7b06da0
[AMDGPU][SILoadStoreOptimizer] Include constrained buffer load variants (#101619)
Use the constrained buffer load opcodes while combining under-aligned
loads for XNACK enabled subtargets.
2024-08-06 11:27:04 +05:30
Christudasan Devadasan
a1d7da05d0
[AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (#96162)
Consider the constrained multi-dword loads while merging
individual loads to a single multi-dword load.
2024-07-23 13:50:42 +05:30
Stanislav Mekhanoshin
c771b670ea
[AMDGPU] Promote immediate offset to atomics (#94043) 2024-06-06 12:05:51 -07:00
Stanislav Mekhanoshin
fc21387b65
[AMDGPU] Enable constant offset promotion to immediate FLAT (#93884)
Currently it is only supported for FLAT Global.
2024-05-31 12:23:27 -07:00
Stanislav Mekhanoshin
215f92b979
[AMDGPU] Fix crash in the SILoadStoreOptimizer (#93862)
It did not properly handle the situation when the address calculation uses
V_ADDC_U32 0, 0, carry-in (i.e. with both src0 and src1 being immediates).
2024-05-30 14:27:33 -07:00
Jay Foad
11f76b8511
[AMDGPU] Use some merging/unmerging helpers in SILoadStoreOptimizer (#90866)
Factor out copyToDestRegs and copyFromSrcRegs for merging store sources
and unmerging load results. NFC.
2024-05-02 21:01:51 +01:00
Jay Foad
e020e287c7 [AMDGPU] Modernize some syntax in SILoadStoreOptimizer. NFC.
Use structured bindings and similar.
2024-05-02 15:34:02 +01:00
Jay Foad
0606747c96 [AMDGPU] Remove some pointless fallthrough annotations 2024-05-01 16:04:35 +01:00
David Stuttard
06cfbe3cfd
[AMDGPU] Add support for idxen and bothen buffer load/store merging in SILoadStoreOptimizer (#86285)
Added more buffer instruction merging support
2024-03-25 14:44:22 +00:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
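A toy model of the idea (not LLVM's actual LocationSize class): a size is either a known byte count or explicitly unknown, so the old ~UINT64_C(0) sentinel can no longer be mistaken for a huge concrete size.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical sketch of the LocationSize concept from the commit.
struct LocationSizeSketch {
  std::optional<uint64_t> Size;

  // Mirrors the hasValue()/getValue() guard pattern the patch adds.
  bool hasValue() const { return Size.has_value(); }
  uint64_t getValue() const { return *Size; }

  // An unknown size, replacing the old ~UINT64_C(0) sentinel.
  static LocationSizeSketch beforeOrAfter() { return {std::nullopt}; }
};
```

Callers must check hasValue() before getValue(), which is the "adding a lot of getValue() calls and protecting them with hasValue()" work the commit describes.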
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
Konrad Kusiak
4fa8a5487e [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend
There is a problem with the
SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to
UB.

This boolean function decides if two masks can be combined into 1. The
idea here is that the bits which are "on" in one mask, don't overlap
with the "on" bits of the other. Consider an example (10 bits for
simplicity):

Mask 1: 0101101000
Mask 2: 0000000110

Those can be combined into a single mask: 0101101110.

To check if such an operation is possible, the code takes the mask
which is greater and counts how many 0s there are, starting from the
LSB and stopping at the first 1. Then, it shifts 1u by this number and
compares it with the smaller mask. The problem is that when both masks
are 0, the counter will find 32 zeroes in the first mask and will try
to do a shift by 32 positions which leads to UB.

The fix is a simple sanity check of whether the bigger mask is 0.

https://reviews.llvm.org/D155051
2023-08-11 15:26:35 -04:00
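The guarded check can be sketched as follows (a self-contained stand-in for the real dmasksCanBeCombined, with the trailing-zero count hand-rolled instead of using llvm::countr_zero):

```cpp
#include <cassert>
#include <cstdint>

// Two dmasks can be combined when every "on" bit of the smaller mask sits
// below the lowest set bit of the larger mask.
inline bool dmasksCanBeCombinedSketch(uint32_t MaskA, uint32_t MaskB) {
  uint32_t MaxMask = MaskA > MaskB ? MaskA : MaskB;
  uint32_t MinMask = MaskA > MaskB ? MaskB : MaskA;

  // Sanity check from the fix: if the larger mask is 0, the trailing-zero
  // count would be 32 and `1u << 32` is undefined behavior.
  if (MaxMask == 0)
    return false;

  // Count zeros from the LSB up to the first set bit of the larger mask.
  unsigned TrailingZeros = 0;
  for (uint32_t M = MaxMask; (M & 1u) == 0; M >>= 1)
    ++TrailingZeros;

  // The smaller mask must fit entirely below that first set bit.
  return MinMask < (1u << TrailingZeros);
}
```

On the commit's example, 0b0101101000 and 0b0000000110 combine (the smaller mask 0b110 is below the larger mask's lowest set bit), while two zero masks now return false instead of triggering the undefined shift.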
Jay Foad
c68c6c56fc [AMDGPU] Minor refactoring in SILoadStoreOptimizer::offsetsCanBeCombined 2023-06-21 12:05:47 +01:00
Jay Foad
0c13e0b748 [AMDGPU] Do not handle _SGPR SMEM instructions in SILoadStoreOptimizer
After D147334 we never select _SGPR forms of SMEM instructions on
subtargets that also support the _SGPR_IMM form, so there is no need to
handle them here.

Differential Revision: https://reviews.llvm.org/D149139
2023-04-25 15:40:13 +01:00
mmarjano
f6e70ed1c7 [AMDGPU] Extend tbuffer_load_format merge
Add support for merging _IDXEN and _BOTHEN variants of
TBUFFER_LOAD_FORMAT instruction.
2023-04-10 12:24:21 +02:00
Kazu Hirata
7ada7bbee1 [Target] Use *{Set,Map}::contains (NFC) 2023-03-14 18:06:55 -07:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
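For illustration, the C++17 replacement relies on class template argument deduction (the functions below are toy examples, not code from the patch):

```cpp
#include <tuple>
#include <utility>

// CTAD lets the pair/tuple constructors deduce element types directly,
// so the make_* helper functions are no longer needed.
inline std::pair<int, double> makeEntry() {
  return std::pair(1, 2.5); // was: std::make_pair(1, 2.5)
}

inline std::tuple<int, int, int> makeTriple() {
  return std::tuple(1, 2, 3); // was: std::make_tuple(1, 2, 3)
}
```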
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
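The mechanical replacement looks like this (a toy function, not code from the patch):

```cpp
#include <optional>

// llvm::Optional<T> becomes std::optional<T>, and None becomes
// std::nullopt, with no behavioral change.
inline std::optional<int> findWidth(bool Known) {
  if (!Known)
    return std::nullopt; // was: return None;
  return 4;
}
```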
Ivan Kosarev
1b560e6ab7 [AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D137783
2022-11-14 15:36:18 +00:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` and checking whether the result was equal to -1 or not.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` helper and replaced all the cases I could find, using regexes and manual edits.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
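A minimal sketch of the helper's intent (the signatures and the opcode table below are illustrative, not the real AMDGPUInstrInfo API):

```cpp
#include <string>

// Stand-in for getNamedOperandIdx: returns the operand's index within
// the instruction, or -1 if the instruction has no such operand.
inline int getNamedOperandIdxSketch(unsigned Opcode,
                                    const std::string &Name) {
  // Toy table: pretend opcode 0 has a "vaddr" operand at index 2.
  if (Opcode == 0 && Name == "vaddr")
    return 2;
  return -1;
}

// The new helper makes the common "is the operand present?" query
// explicit instead of comparing the index against -1 at every call site.
inline bool hasNamedOperandSketch(unsigned Opcode,
                                  const std::string &Name) {
  return getNamedOperandIdxSketch(Opcode, Name) != -1;
}
```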
Ivan Kosarev
693f816288 [AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787
2022-09-15 13:48:51 +01:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
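A minimal example of the replacement (the function itself is invented for illustration):

```cpp
// The standard C++17 attribute replaces the LLVM_FALLTHROUGH macro and
// silences implicit-fallthrough warnings portably.
inline int classify(int X) {
  int Score = 0;
  switch (X) {
  case 2:
    ++Score;
    [[fallthrough]]; // was: LLVM_FALLTHROUGH;
  case 1:
    ++Score;
    break;
  default:
    break;
  }
  return Score;
}
```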
Carl Ritson
4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Stanislav Mekhanoshin
33fb23f728 [AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since the address cast is a no-op.
A combined memory operation needs to be promoted to flat.

Differential Revision: https://reviews.llvm.org/D120431
2022-03-09 10:04:37 -08:00
Stanislav Mekhanoshin
517171ce20 [AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120351
2022-02-28 11:27:30 -08:00
Stanislav Mekhanoshin
3279e44063 [AMDGPU] Extend SILoadStoreOptimizer to handle global stores
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120346
2022-02-24 11:09:51 -08:00
Stanislav Mekhanoshin
cefa1c5ca9 [AMDGPU] Fix combined MMO in load-store merge
Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining the MachineMemOperands of two instructions, the operands
are passed to combineKnownAdjacentMMOs in IR order. At the moment it
picks the first operand and just replaces its offset and size. This
essentially loses alignment information and may generally result in an
incorrect base pointer being used.

Instead, use the base pointer that comes first in memory address order
and only adjust the size.

Differential Revision: https://reviews.llvm.org/D120370
2022-02-24 10:47:57 -08:00
Stanislav Mekhanoshin
9e055c0fff [AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.

TODO: merge global stores.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120285
2022-02-22 09:01:43 -08:00
Stanislav Mekhanoshin
ba17bd2674 [AMDGPU] Extend SILoadStoreOptimizer to handle global loads
There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address spaces
differ in the IR but they end up as the same class of instructions after
selection. For example, a divergent load from constant address space
ends up being the same global_load as a load from global address space.

TODO: merge global stores.
TODO: handle SADDR forms.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120279
2022-02-22 08:42:36 -08:00
Stanislav Mekhanoshin
dc0981562e [AMDGPU] Remove redundant check in the SILoadStoreOptimizer
Differential Revision: https://reviews.llvm.org/D120268
2022-02-21 15:04:44 -08:00
Jay Foad
359a792f9b [AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases
Previously when combining two loads this pass would sink the
first one down to the second one, putting the combined load
where the second one was. It would also sink any intervening
instructions which depended on the first load down to just
after the combined load.

For example, if we started with this sequence of
instructions (code flowing from left to right):

  X A B C D E F Y

After combining loads X and Y into XY we might end up with:

  A B C D E F XY

But if B, D and F depended on X, we would get:

  A C E XY B D F

Now if the original code had some short disjoint live ranges
from A to B, C to D and E to F, in the transformed code
these live ranges will be long and overlapping. In this way
a single merge of two loads could cause an unbounded
increase in register pressure.

To fix this, change the way that loads are moved in order to merge
them so that:
- The second load is moved up to the first one. (But when
  merging stores, we still move the first store down to the
  second one.)
- Intervening instructions are never moved.
- Instead, if we find an intervening instruction that would
  need to be moved, give up on the merge. But this case
  should now be pretty rare because normal stores have no
  outputs, and normal loads only have address register
  inputs, but these will be identical for any pair of loads
  that we try to merge.

As well as fixing the unbounded register pressure increase
problem, moving loads up and stores down seems like it
should usually be a win for memory latency reasons.

Differential Revision: https://reviews.llvm.org/D119006
2022-02-21 10:51:14 +00:00
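The "give up instead of moving intervening instructions" policy can be sketched under simplifying assumptions: instructions are modeled as plain def/use register sets, not the pass's real data structures, and only the legality of moving the second load up is checked.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction model for the sketch.
struct InstrSketch {
  std::set<std::string> Defs, Uses;
};

// The second load is moved up to the first one and intervening
// instructions are never moved. Give up on the merge if any intervening
// instruction writes a register the second load reads (its address
// inputs), since hoisting the load past that write would be illegal.
inline bool canMoveSecondLoadUp(const std::vector<InstrSketch> &Code,
                                size_t First, size_t Second) {
  for (size_t I = First + 1; I < Second; ++I)
    for (const std::string &R : Code[Second].Uses)
      if (Code[I].Defs.count(R))
        return false;
  return true;
}
```

As the commit notes, this rarely gives up in practice: two loads being considered for merging read the same address registers, so an intervening instruction that merely consumes the first load's result (like B, D, F above) does not block the merge.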
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Jay Foad
a456ace9c1 [AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.
Separate the function clearly into:
- Checks that can be done on CI and Paired before the loop.
- The loop over all instructions between CI and Paired.
- Checks that must be done on InstsToMove after the loop.

Previously these were mostly done inside the loop in a very
confusing way.

Differential Revision: https://reviews.llvm.org/D118994
2022-02-04 17:17:29 +00:00
Jay Foad
001cb43159 [AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined
Only call offsetsCanBeCombined with Modify = true in cases
where it will really do something. NFC.
2022-02-04 14:52:11 +00:00
Jay Foad
00bbda07ae [AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks
Also add a comment explaining the difference between class
and subclass. NFCI.
2022-02-04 14:08:04 +00:00
Jay Foad
33ef8bdf36 [AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr
Common up all the calls to CI.setMI. NFCI.
2022-02-04 13:25:05 +00:00
Jay Foad
ca05edd927 [AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test
At this point CI represents the combined access (original CI combined
with Paired) so it doesn't make any sense to add in Paired.width again.
NFCI.
2022-02-04 13:02:19 +00:00
Jay Foad
68e3946270 [AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.

Differential Revision: https://reviews.llvm.org/D118384
2022-01-28 18:03:42 +00:00