llvm-project

Author	SHA1	Message	Date
Matt Arsenault	0b40f97929	AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751 ) 0 does not make sense as a value for this to be, much less the default. Also stop emitting each individual field if it is the default, rather than if any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.	2024-11-05 12:50:44 -08:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Jay Foad	e7f1dae412	[AMDGPU] gfx1152 does not have Feature1_5xVGPRs (#113163 )	2024-10-22 11:12:00 +01:00
Petar Avramovic	7b0d56be1d	AMDGPU/GlobalISel: Fix inst-selection of ballot (#109986 ) Both input and output of ballot are lane-masks: result is lane-mask with 'S32/S64 LLT and SGPR bank' input is lane-mask with 'S1 LLT and VCC reg bank'. Ballot copies bits from input lane-mask for all active lanes and puts 0 for inactive lanes. GlobalISel did not set 0 in result for inactive lanes for non-constant input.	2024-10-11 11:40:27 +02:00
Pierre van Houtryve	924a64a348	[AMDGPU] Only emit SCOPE_SYS global_wb (#110636 ) global_wb with scopes lower than SCOPE_SYS is unnecessary for correctness. I was initially optimistic they would be very cheap no-ops but they can actually be quite expensive so let's avoid them.	2024-10-07 07:35:31 +02:00
Austin Kerbow	c4d89203f3	[AMDGPU] Support preloading hidden kernel arguments (#98861 ) Adds hidden kernel arguments to the function signature and marks them inreg if they should be preloaded into user SGPRs. The normal kernarg preloading logic then takes over with some additional checks for the correct implicitarg_ptr alignment. Special care is needed so that metadata for the hidden arguments is not added twice when generating the code object.	2024-10-06 17:44:33 -07:00
Jakub Kuderski	5d45815473	[docs][amdgpu] Update kernarg documentation for gfx90a (#109690 ) Update the docs to mention that kernel argument preloading is not supported on MI210.	2024-09-30 13:51:41 -04:00
Janek van Oirschot	c897c13dde	[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913 ) Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction pass. Moves function resource info propagation to the MC layer (through helpers in AMDGPUMCResourceInfo) by generating MCExprs for every function resource which the emitters have been prepped for. Fixes https://github.com/llvm/llvm-project/issues/64863	2024-09-30 11:43:34 +01:00
Scott Egerton	396f677514	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769 )	2024-09-24 10:58:00 +01:00
Jay Foad	8663a75fa2	[AMDGPU] Add link to RDNA 3.5 docs (#108977 )	2024-09-17 16:32:27 +01:00
Pierre van Houtryve	eaac4a2613	[AMDGPU] Document & Finalize GFX12 Memory Model (#98599 ) Documents the memory model implemented as of #98591, with some fixes/optimizations to the implementation.	2024-09-09 15:35:28 +02:00
Scott Linder	9171881d64	[AMDGPU][Docs] DWARF aspace-aware base types (post-review fixes)	2024-09-04 22:19:25 +00:00
Aarni Koskela	df5840f9f0	[AMDGPU][Docs] Update product names for some targets (#106973 ) Based on https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus.	2024-09-04 16:58:17 +04:00
Scott Linder	22825ddd88	[AMDGPU][Docs] DWARF aspace-aware base types Propose an extension to base type DIEs such that DW_ATE_address-encoded base types can include an architecture specific address space. Use this to implement DW_OP_convert conversions between AMDGPU address space addresses where meaningful.	2024-08-19 19:55:15 +00:00
lancesix	cc78639453	[AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (#104419 ) This patch adds a description of the core file format used for AMDGPU. Reference implementation for creating and loading AMDGPU core dump is available in [ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb)	2024-08-16 12:22:19 +02:00
pvanhout	db27905a0b	[AMDGPU] Remove trailing spaces in AMDGPUUsage.rst	2024-07-12 09:02:46 +02:00
Matt Arsenault	62d949766b	AMDGPU: Add description for new atomicrmw metadata (#85052 ) Add a spec for yet-to-be-implemented metadata to allow the backend to fully handle atomicrmw lowering. This is the base of an alternative to #69229, which inverts the direction to be correct by default, and extends to cover the peer device case.	2024-07-10 17:39:04 +04:00
Vikram Hegde	35f7b60aa6	[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725 ) These are incremental changes over #89217 , with core logic being the same. This patch along with #89217 and #91190 should get us ready to enable 64 bit optimizations in atomic optimizer.	2024-06-26 09:24:09 +05:30
Vikram Hegde	5feb32ba92	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217 ) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com>	2024-06-25 14:35:19 +05:30
Nicolai Hähnle	4a70981d21	AMDGPU/gfx12: Minor documentation update (#96079 )	2024-06-19 16:49:18 +02:00
Pierre van Houtryve	a45080f091	[AMDGPU] Document amdgpu-as in AMDGPUUsage (#94335 ) Add a section about fence & address spaces that covers amdgpu-as.	2024-06-11 14:31:26 +02:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Krzysztof Drewniak	e31bfc040a	[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801 ) The various preloaded registers (workitem IDs, workgroup IDs, and various implicit pointers) always have a finite, invariant, well-defined value throughout a well-defined program. In cases where the compiler infers or the user declares that some implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the behavior of the entire program is undefined, since that misdeclaration can cause arbitrary other preloaded-register intrinsics to access the wrong register. This case is not expected to arise in practice, but could occur when the no implicit argument attributes were not cleared correctly in the presence of external functions, indirect calls, or other means of executing un-analyzable code. Failure to detect that case would be a bug in the attributor. This commit updates the documentation to reflect this long-standing reality. Then, on the basis that all implicit arguments are defined in all correct programs, the intrinsics that return those values are annotated with `noundef`. Some implicit pointer arguments gain a `nonnull`, but the kernel argument segment pointer or implicit argument pointers don't necessarily have this property. This will prevent spurious calls to `freeze` in front-end optimizations that destroy user-provided ranges on built-in IDs. (While I'm here, this commit adds a test for `noundef` on kernel arguments which is currently unimplemented)	2024-06-03 16:37:08 -05:00
Konstantin Zhuravlyov	775f1cd34d	AMDGPU: Add gfx12-generic target (#93875 )	2024-05-31 12:46:44 -04:00
Konstantin Zhuravlyov	949ef57dd2	AMDGPU/NFC: Reserve 0x058 EF_AMDGPU_MACHs (#93696 )	2024-05-29 12:52:34 -04:00
Lu Weining	74014b5a34	Fix typo in AMDGPUUsage. NFC (#93652 ) The vendor name is mesa but not mesa3d.	2024-05-29 17:39:38 +08:00
Konstantin Zhuravlyov	315a83145b	AMDGPU/NFC: Reserve 0x056 and 0x057 EF_AMDGPU_MACHs (#92917 )	2024-05-21 13:35:39 -04:00
Krzysztof Drewniak	ac0d415552	Update documentation for buffer fat pointers (#92034 ) Now that we've got (minus some issues around datatypes and invariant loads) working lowerings for address space 7, update the table in the AMDGPU usage guide to properly indicate the nature of these address spaces.	2024-05-14 10:03:48 -05:00
Matt Arsenault	d654278bde	Reapply "AMDGPU: Implement llvm.set.rounding (#88587 )" series (#91113 ) Revert "Revert 4 last AMDGPU commits to unbreak Windows bots" This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f. MSVC does not like constexpr on the definition after an extern declaration of a global.	2024-05-06 09:09:19 +02:00
Mehdi Amini	0d493ed2c6	Revert 4 last AMDGPU commits to unbreak Windows bots Revert "AMDGPU: Try to fix build error with old gcc" This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb. Revert "AMDGPU: Use umin in set.rounding expansion" This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456. Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)" This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd. Revert "AMDGPU: Implement llvm.set.rounding (#88587)" This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.	2024-05-04 19:57:33 +02:00
Matt Arsenault	9731b77e80	AMDGPU: Implement llvm.set.rounding (#88587 ) Use a shift of a magic constant and some offsetting to convert from flt_rounds values. I don't know why the enum defines Dynamic = 7. The standard suggests -1 is the cannot determine value. If we could start the extended values at 4 we wouldn't need the extra compare sub and select. https://reviews.llvm.org/D153257	2024-05-03 09:41:27 +02:00
Emma Pilkington	68e814d911	[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400 ) These mostly are checking for various reserved bits being set. The diagnostics for gpu-dependent reserved bits have a bit more context since they seem like the most likely ones to be observed in practice. This commit also improves the error handling mechanism for MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter that was just being ignored by llvm-objdump, now it returns errors using Expected<T>.	2024-04-18 13:44:22 -04:00
Fabian Ritter	7b8625ec16	[AMDGPU][Docs] Fix broken link to HRF memory model reference (#88696 ) The link to the Heterogeneous-race-free Memory Models ASPLOS'14 paper by Hower et al. pointed to a bogus website, probably because the domain ownership has changed. This patch updates it to a version hosted on research.cs.wisc.edu.	2024-04-17 14:54:14 +02:00
Matt Arsenault	c13556c0b0	AMDGPU: Document more backend recognized attributes (#80239 )	2024-03-28 14:27:14 +03:00
Matt Arsenault	b6b703b2df	AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948 ) SIMachineFunctionInfo has a scan of the function body for inline asm which may use AGPRs, or callees in SIMachineFunctionInfo. Move this into the attributor, so it actually works interprocedurally. Could probably avoid most of the test churn if this bothered to avoid adding this on subtargets without AGPRs. We should also probably try to delete the MIR scan in usesAGPRs but it seems to be trickier to eliminate.	2024-03-21 14:24:06 +05:30
Janek van Oirschot	f7bebc1914	Reland [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#84562 ) Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'. Relands #82022 with fixes	2024-03-14 14:31:00 +00:00
Matt Arsenault	5c3d001668	AMDGPU: Don't use table for metadata docs, and fix section headers (#85046 )	2024-03-13 18:34:23 +05:30
Matt Arsenault	cd2f616313	AMDGPU: Use list-table for metadata table (#85024 ) The table syntax for sphinx is really insufferably whitespace dependent. I've been meaning to convert the existing attribute and intrinsic tables to use list-table, which is less painful to merge.	2024-03-13 12:42:15 +05:30
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Pierre van Houtryve	63c77d8475	[AMDGPU] Make generic versioning docs easier to find (#84761 )	2024-03-11 15:56:17 +01:00
Florian Mayer	0083c3eb83	Revert "[AMDGPU] Add AMDGPU specific variadic operation MCExprs" (#84273 ) Reverts llvm/llvm-project#82022 Fails on hwasan build bot: https://lab.llvm.org/buildbot/#/builders/236/builds/9874/steps/10/logs/stdio	2024-03-06 19:37:49 -08:00
Janek van Oirschot	bec2d105c7	[AMDGPU] Add AMDGPU specific variadic operation MCExprs (#82022 ) Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.	2024-03-06 21:01:54 +00:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Pierre van Houtryve	43c7eb5d7b	[AMDGPU] Replace '.' with '-' in generic target names (#81718 ) The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.	2024-02-14 15:19:04 +01:00
Pierre van Houtryve	87d7711934	[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450 ) Fixes SWDEV-443292	2024-02-13 09:07:51 +01:00
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180 ) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Konstantin Zhuravlyov	75a1c4e10b	AMDGPU/NFC: Reserve 0x055 MACH in e_flag for future use (#81501 )	2024-02-12 13:37:25 -05:00
Mariusz Sikora	0c63453714	[AMDGPU][NFC] Docs - remove duplicates (#81465 )	2024-02-12 12:25:54 +01:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things, device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00

414 Commits