llvm-project

Author	SHA1	Message	Date
Fabian Ritter	db597084c5	[AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs (#126887 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all documentation occurrences of gfx940/gfx941 except for the gfx940 ISA description, which will be the subject of a separate PR. For SWDEV-512631	2025-02-19 10:31:47 +01:00
Fabian Ritter	8615f9aaff	[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631	2025-02-19 10:20:48 +01:00
Krzysztof Drewniak	f7d03707d1	[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828 ) Attempting to pass a `ptr addrspace(7)` to functions that take `ptr` arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7) to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP operations on buffer resources, which can't be GEP'd. (However, note that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr is legal - it's just an effective address computation.) To resolve this problem, and thus prevent illegal `getelementptr T, ptr addrspace(8) %x, ...`s from being produced, this commit extends amdgcn.make.buffer.rsrc to also be variadic in its result type, auto-upgrading old manglings. The logic for handling a make.buffer.rsrc in instruction selection remains untouched and expects the output type to be a ptr addrspace(8), as does the Clang lowering for its builtin (the pointer-to-pointer version might want a different name in clang). LowerBufferFatPointers has been updated to lower amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p*. This'll also make exposing buffer fat pointers in Clang easier, since you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.	2025-02-18 14:15:28 -06:00
Stanislav Mekhanoshin	7639242155	[AMDGPU] Create new directive .amdhsa_inst_pref_size (#126622 ) The field INST_PREF_SIZE is available since gfx11.	2025-02-11 08:35:45 -08:00
Konstantin Zhuravlyov	fc4210fb6c	AMDGPU/Docs: Fix target properties for gfx9-4-generic (#125593 ) gfx9-4-generic has architected flat scratch, not absolute	2025-02-04 21:47:43 -05:00
Carl Ritson	1f38d38d54	[AMDGPU] Fix documentation table formatting from #118750 (NFC)	2025-01-30 14:27:25 +09:00
Carl Ritson	a3a3e6997b	[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750 ) - Algorithm operates over whole IR to attempt to minimize waits. - Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.	2025-01-30 11:21:11 +09:00
Jeffrey Byrnes	db1ee18eda	NFC: Typo fix Change-Id: I08470bc617490558250136ea35a4964003fa9981	2025-01-24 15:59:13 -08:00
Jun Wang	77c23fd0aa	[AMDGPU] Update AMDGPUUsage.rst to document two intrinsics (#123816 ) The AMDGPUUsage.rst file is updated to document two intrinsics: llvm.amdgcn.mov.dpp and llvm.amdgcn.update.dpp.	2025-01-24 14:12:18 -08:00
Austin Kerbow	2e5c298281	[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167 ) Add a prologue to the kernel entry to handle cases where code designed for kernarg preloading is executed on hardware equipped with incompatible firmware. If hardware has compatible firmware the 256 bytes at the start of the kernel entry will be skipped. This skipping is done automatically by hardware that supports the feature. A pass is added which is intended to be run at the very end of the pipeline to avoid any optimizations that would assume the prologue is a real predecessor block to the actual code start. In reality we have two possible entry points for the function. 1. The optimized path that supports kernarg preloading which begins at an offset of 256 bytes. 2. The backwards compatible entry point which starts at offset 0.	2025-01-10 11:39:02 -08:00
Austin Kerbow	aebd3389a9	[AMDGPU] Fix user SGPR alloc order in docs (#119092 ) NFC. Preload kernarg SGPRs are allocated after the private segment size SGPR. This patch updates AMDGPUUsage.rst to reflect this.	2024-12-07 13:08:35 -08:00
Jeffrey Byrnes	9ac52ce8d6	[AMDGPU] Add iglp_opt(3) for simple mfma / exp interleaving (#117269 ) Adds a minimal iglp_opt to do simple exp / mfma interleaving.	2024-12-06 15:19:07 -08:00
Shilei Tian	17cfd016b4	[AMDGPU][Doc] Add `gfx950` to `gfx9-4-generic` in the document	2024-12-03 11:17:22 -05:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Matt Arsenault	5a556d55fb	AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309 )	2024-11-18 10:48:56 -08:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Diana Picus	2aa6cedfa8	[AMDGPU] Clarify amdgpu.cs.chain + init whole wave. NFC (#115452 ) Add some docs clarifying how inactive lanes are handled in the amdgpu_cs_chain calling convention when the llvm.amdgcn.init.whole.wave intrinsic is used.	2024-11-14 10:10:33 +01:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00
Matt Arsenault	0b40f97929	AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751 ) 0 does not make sense as a value for this to be, much less the default. Also stop emitting each individual field if it is the default, rather than if any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.	2024-11-05 12:50:44 -08:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Jay Foad	e7f1dae412	[AMDGPU] gfx1152 does not have Feature1_5xVGPRs (#113163 )	2024-10-22 11:12:00 +01:00
Petar Avramovic	7b0d56be1d	AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986 ) Both input and output of ballot are lane-masks: result is lane-mask with 'S32/S64 LLT and SGPR bank' input is lane-mask with 'S1 LLT and VCC reg bank'. Ballot copies bits from input lane-mask for all active lanes and puts 0 for inactive lanes. GlobalISel did not set 0 in result for inactive lanes for non-constant input.	2024-10-11 11:40:27 +02:00
Pierre van Houtryve	924a64a348	[AMDGPU] Only emit SCOPE_SYS global_wb (#110636 ) global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic they would be very cheap no-ops but they can actually be quite expensive so let's avoid them.	2024-10-07 07:35:31 +02:00
Austin Kerbow	c4d89203f3	[AMDGPU] Support preloading hidden kernel arguments (#98861 ) Adds hidden kernel arguments to the function signature and marks them inreg if they should be preloaded into user SGPRs. The normal kernarg preloading logic then takes over with some additional checks for the correct implicitarg_ptr alignment. Special care is needed so that metadata for the hidden arguments is not added twice when generating the code object.	2024-10-06 17:44:33 -07:00
Jakub Kuderski	5d45815473	[docs][amdgpu] Update kernarg documentation for gfx90a (#109690 ) Update the docs to mention that kernel argument preloading is not supported on MI210.	2024-09-30 13:51:41 -04:00
Janek van Oirschot	c897c13dde	[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913 ) Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for. Fixes https://github.com/llvm/llvm-project/issues/64863	2024-09-30 11:43:34 +01:00
Scott Egerton	396f677514	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769 )	2024-09-24 10:58:00 +01:00
Jay Foad	8663a75fa2	[AMDGPU] Add link to RDNA 3.5 docs (#108977 )	2024-09-17 16:32:27 +01:00
Pierre van Houtryve	eaac4a2613	[AMDGPU] Document & Finalize GFX12 Memory Model (#98599 ) Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.	2024-09-09 15:35:28 +02:00
Scott Linder	9171881d64	[AMDGPU][Docs] DWARF aspace-aware base types (post-review fixes)	2024-09-04 22:19:25 +00:00
Aarni Koskela	df5840f9f0	[AMDGPU][Docs] Update product names for some targets (#106973 ) Based on https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus.	2024-09-04 16:58:17 +04:00
Scott Linder	22825ddd88	[AMDGPU][Docs] DWARF aspace-aware base types Propose an extension to base type DIEs such that DW_ATE_address-encoded base types can include an architecture specific address space. Use this to implement DW_OP_convert conversions between AMDGPU address space addresses where meaningful.	2024-08-19 19:55:15 +00:00
lancesix	cc78639453	[AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (#104419 ) This patch adds a description of the core file format used for AMDGPU. Reference implementation for creating and loading AMDGPU core dump is available in [ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb)	2024-08-16 12:22:19 +02:00
pvanhout	db27905a0b	[AMDGPU] Remove trailing spaces in AMDGPUUsage.rst	2024-07-12 09:02:46 +02:00
Matt Arsenault	62d949766b	AMDGPU: Add description for new atomicrmw metadata (#85052 ) Add a spec for yet-to-be-implemented metadata to allow the backend to fully handle atomicrmw lowering. This is the base of an alternative to #69229, which inverts the direction to be correct by default, and extends to cover the peer device case.	2024-07-10 17:39:04 +04:00
Vikram Hegde	35f7b60aa6	[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725 ) These are incremental changes over #89217 , with core logic being the same. This patch along with #89217 and #91190 should get us ready to enable 64 bit optimizations in atomic optimizer.	2024-06-26 09:24:09 +05:30
Vikram Hegde	5feb32ba92	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217 ) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com>	2024-06-25 14:35:19 +05:30
Nicolai Hähnle	4a70981d21	AMDGPU/gfx12: Minor documentation update (#96079 )	2024-06-19 16:49:18 +02:00
Pierre van Houtryve	a45080f091	[AMDGPU] Document amdgpu-as in AMDGPUUsage (#94335 ) Add a section about fence & address spaces that covers amdgpu-as.	2024-06-11 14:31:26 +02:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Krzysztof Drewniak	e31bfc040a	[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801 ) The various preloaded registers (workitem IDs, workgroup IDs, and various implicit pointers) always have a finite, invariant, well-defined value throughout a well-defined program. In cases where the compiler infers or the user declares that some implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the behavior of the entire program is undefined, since that misdeclaration can cause arbitrary other preloaded-register intrinsics to access the wrong register. This case is not expected to arise in practice, but could occur when the no implicit argument attributes were not cleared correctly in the presence of external functions, indirect calls, or other means of executing un-analyzable code. Failure to detect that case would be a bug in the attributor. This commit updates the documentation to reflect this long-standing reality. Then, on the basis that all implicit arguments are defined in all correct programs, the intrinsics that return those values are annotated with `noundef`. Some implicit pointer arguments gain a `nonnull`, but the kernel argument segment pointer or implicit argument pointers don't necessarily have this property. This will prevent spurious calls to `freeze` in front-end optimizations that destroy user-provided ranges on built-in IDs. (While I'm here, this commit adds a test for `noundef` on kernel arguments which is currently unimplemented)	2024-06-03 16:37:08 -05:00
Konstantin Zhuravlyov	775f1cd34d	AMDGPU: Add gfx12-generic target (#93875 )	2024-05-31 12:46:44 -04:00
Konstantin Zhuravlyov	949ef57dd2	AMDGPU/NFC: Reserve 0x058 EF_AMDGPU_MACHs (#93696 )	2024-05-29 12:52:34 -04:00
Lu Weining	74014b5a34	Fix typo in AMDGPUUsage. NFC (#93652 ) The vendor name is mesa but not mesa3d.	2024-05-29 17:39:38 +08:00
Konstantin Zhuravlyov	315a83145b	AMDGPU/NFC: Reserve 0x056 and 0x057 EF_AMDGPU_MACHs (#92917 )	2024-05-21 13:35:39 -04:00
Krzysztof Drewniak	ac0d415552	Update documentation for buffer fat pointers (#92034 ) Now that we've got (minus some issues around datatypes and invariant loads) working lowerings for address space 7, update the table in the AMDGPU usage guide to properly indicate the nature of these address spaces.	2024-05-14 10:03:48 -05:00
Matt Arsenault	d654278bde	Reapply "AMDGPU: Implement llvm.set.rounding (#88587 )" series (#91113 ) Revert "Revert 4 last AMDGPU commits to unbreak Windows bots" This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f. MSVC does not like constexpr on the definition after an extern declaration of a global.	2024-05-06 09:09:19 +02:00
Mehdi Amini	0d493ed2c6	Revert 4 last AMDGPU commits to unbreak Windows bots Revert "AMDGPU: Try to fix build error with old gcc" This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb. Revert "AMDGPU: Use umin in set.rounding expansion" This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456. Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)" This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd. Revert "AMDGPU: Implement llvm.set.rounding (#88587)" This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.	2024-05-04 19:57:33 +02:00
Matt Arsenault	9731b77e80	AMDGPU: Implement llvm.set.rounding (#88587 ) Use a shift of a magic constant and some offsetting to convert from flt_rounds values. I don't know why the enum defines Dynamic = 7. The standard suggests -1 is the cannot determine value. If we could start the extended values at 4 we wouldn't need the extra compare sub and select. https://reviews.llvm.org/D153257	2024-05-03 09:41:27 +02:00

383 Commits