llvm-project

Author	SHA1	Message	Date
Petar Avramovic	3ad810ea9a	AMDGPU/GlobalISel: Disable LCSSA pass (#124297) Disable LCSSA pass in preparation for implementing temporal divergence lowering in amdgpu divergence lowering. Breaks all cases where sgpr or i1 values are used outside of the cycle with divergent exit. Regenerate regression tests for amdgpu divergence lowering with LCSSA disabled. Update IntrinsicLaneMaskAnalyzer to stop tracking lcssa phis that are lane masks.	2025-03-12 11:09:50 +01:00
Daniil Fukalov	68d90cff58	[AMDGPU][GlobalISel] Fix assert on APInt creation. (#124608) Since 3494ee95902cef62f767489802e469c58a13ea04, APInt no longer implicitly truncates values, so it asserts when a large signed value is (implicitly) converted to an unsigned APInt. The change explicitly marks offset as a signed value.	2025-01-28 15:53:17 +01:00
Petar Avramovic	0ee037b861	AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864) Lower G_ instructions that can't be inst-selected with register bank assignment from AMDGPURegBankSelect based on uniformity analysis. - Lower instruction to perform it on assigned register bank - Put uniform value in vgpr because SALU instruction is not available - Execute divergent instruction in SALU - "waterfall loop" Given LLTs on all operands after legalizer, some register bank assignments require lowering while others do not. Note: cases where all register bank assignments would require lowering are lowered in legalizer. AMDGPURegBankLegalize goals: - Define Rules: when and how to perform lowering - Goal of defining Rules is to provide high level table-like brief overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). - Fast search of Rules, depends on how complicated Rule.Predicate is - For some opcodes there would be too many Rules that are essentially all the same just for different combinations of types and banks. Write custom function that handles all cases. - Rules are made from enum IDs that correspond to each operand. Names of IDs are meant to give brief description what lowering does for each operand or the whole instruction. - AMDGPURegBankLegalizeHelper implements lowering algorithms Since this is the first patch that actually enables -new-reg-bank-select here is the summary of regression tests that were added earlier: - if instruction is uniform always select SALU instruction if available - eliminate back to back vgpr to sgpr to vgpr copies of uniform values - fast rules: small differences for standard and vector instruction - enabling Rule based on target feature - salu_float - how to specify lowering algorithm - vgpr S64 AND to S32 - on G_TRUNC in reg, it is up to user to deal with truncated bits G_TRUNC in reg is treated as no-op. - dealing with truncated high bits - ABS S16 to S32 - sgpr S1 phi lowering - new opcodes for vcc-to-scc and scc-to-vcc copies - lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC) - S1 zext and sext lowering to select - uniform and divergent S1 AND (OR and XOR) lowering - inst-selected into SALU instruction - divergent phi with uniform inputs - divergent instruction with temporal divergent use, source instruction is defined as uniform (AMDGPURegBankSelect) - missing temporal divergence lowering - uniform phi, because of undef incoming, is assigned to vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine uniformity analysis.	2025-01-24 12:12:45 +01:00
Petar Avramovic	f8a56df36e	AMDGPU/GlobalISel: AMDGPURegBankSelect (#112863) Assign register banks to virtual registers. Does not use generic RegBankSelect. After register bank selection all register operands of G_ instructions have LLT and register banks exclusively. If they had register class, reassign appropriate register bank. Assign register banks using machine uniformity analysis: Sgpr - uniform values and some lane masks Vgpr - divergent, non S1, values Vcc - divergent S1 values (lane masks) AMDGPURegBankSelect does not consider available instructions and, in some cases, G_ instructions with some register bank assignment can't be inst-selected. This is solved in RegBankLegalize. Exceptions when uniformity analysis does not work: S32/S64 lane masks: - need to end up with sgpr register class after instruction selection - In most cases Uniformity analysis declares them as uniform (forced by tablegen) resulting in sgpr S32/S64 reg bank - When Uniformity analysis declares them as divergent (some phis), use intrinsic lane mask analyzer to still assign sgpr register bank temporal divergence copy: - COPY to vgpr with implicit use of $exec inside of the cycle - this copy is declared as uniform by uniformity analysis - make sure that assigned bank is vgpr Note: uniformity analysis does not consider that registers with vgpr def are divergent (you can have uniform value in vgpr). - TODO: implicit use of $exec could be implemented as indicator that instruction is divergent	2025-01-24 11:06:02 +01:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Shilei Tian	bfcf7a0707	[AMDGPU] Remove `hasAtomicFaddRtnForTy` as it is not used anywhere (#82841)	2024-02-23 21:14:38 -05:00
Nico Weber	184ca39529	[llvm] Move CodeGenTypes library to its own directory (#79444) Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.	2024-01-25 12:01:31 -05:00
Acim Maravic	01c1c7a19e	[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302) getBaseWithConstantOffset() is used for scalar and non-scalar buffer loads. The difference between the s_load and load instructions is that s_load extends the 32-bit offset to 64 bits, so a 32-bit (address + offset) should not cause unsigned 32-bit integer wraparound, because the addition is performed in 64 bits.	2023-11-14 19:06:45 +01:00
NAKAMURA Takumi	9cfeba5b12	Restore CodeGen/LowLevelType from `Support` This is a rework of: - D30046 (LLT) Since I have introduced `llvm-min-tblgen` as D146352, `llvm-tblgen` may depend on `CodeGen`. `LowLevelType.h` originally belonged to `CodeGen`. Almost all users are still under `CodeGen` or `Target`. I think `CodeGen` is the right place to put `LowLevelType.h`. `MachineValueType.h` may be moved as well. (later, D149024) I have made many modules depend on `CodeGen`. It is consistent but inefficient. It will be split out later, D148769 Besides, I had to isolate MVT and LLT in modmap, since `llvm::PredicateInfo` clashes between `TableGen/CodeGenSchedule.h` and `Transforms/Utils/PredicateInfo.h`. (I think better to introduce namespace llvm::TableGen) Depends on D145937, D146352, and D148768. Differential Revision: https://reviews.llvm.org/D148767	2023-05-03 00:13:19 +09:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Pierre van Houtryve	4d39552abe	[AMDGPU][NFC] Remove isLegalVOP3PShuffleMask Unused function since D134967 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138493	2022-11-22 14:31:59 +00:00
Jay Foad	ea09a426a9	[AMDGPU] Assume getDefIgnoringCopies will succeed. NFC. getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on valid MIR, so don't bother to check for failure. Differential Revision: https://reviews.llvm.org/D136238	2022-10-19 11:10:00 +01:00
Ivan Kosarev	5db8d6fd2b	[AMDGPU][CodeGen] Support (base | offset) SMEM loads. Prevents generation of unnecessary s_or_b32 instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132552	2022-09-05 14:22:06 +01:00
Joe Nash	ae72fee74e	[AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type Reviewed By: #amdgpu, foad, rampitec Differential Revision: https://reviews.llvm.org/D128205	2022-06-23 11:05:32 -04:00
Mirko Brkusanin	4b422708ba	[AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset Look through G_PTRTOINT and G_PTR_ADD nodes when looking for constant offset for buffer stores. This also helps with merging of these instructions later on. Differential Revision: https://reviews.llvm.org/D95242	2021-01-28 11:20:09 +01:00
Jay Foad	0ad4d04002	[AMDGPU] Remove an unused return value. NFC. Differential Revision: https://reviews.llvm.org/D91063	2020-11-10 09:15:14 +00:00
Matt Arsenault	72eef820d5	AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel for VOP3P instructions. Expand it in some other way in case it doesn't fold into the use instructions.	2020-02-21 13:35:40 -05:00
Matt Arsenault	9861a8538c	AMDGPU/GlobalISel: Add new utils file There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.	2020-01-03 15:25:50 -05:00

18 Commits