llvm-project

Author	SHA1	Message	Date
Matt Arsenault	c13556c0b0	AMDGPU: Document more backend recognized attributes (#80239 )	2024-03-28 14:27:14 +03:00
Matt Arsenault	b6b703b2df	AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948 ) SIMachineFunctionInfo has a scan of the function body for inline asm which may use AGPRs, or callees in SIMachineFunctionInfo. Move this into the attributor, so it actually works interprocedurally. Could probably avoid most of the test churn if this bothered to avoid adding this on subtargets without AGPRs. We should also probably try to delete the MIR scan in usesAGPRs but it seems to be trickier to eliminate.	2024-03-21 14:24:06 +05:30
Janek van Oirschot	f7bebc1914	Reland [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#84562 ) Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'. Relands #82022 with fixes	2024-03-14 14:31:00 +00:00
Matt Arsenault	5c3d001668	AMDGPU: Don't use table for metadata docs, and fix section headers (#85046 )	2024-03-13 18:34:23 +05:30
Matt Arsenault	cd2f616313	AMDGPU: Use list-table for metadata table (#85024 ) The table syntax for sphinx is really insufferably whitespace dependent. I've been meaning to convert the existing attribute and intrinsic tables to use list-table, which is less painful to merge.	2024-03-13 12:42:15 +05:30
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Pierre van Houtryve	63c77d8475	[AMDGPU] Make generic versioning docs easier to find (#84761 )	2024-03-11 15:56:17 +01:00
Florian Mayer	0083c3eb83	Revert "[AMDGPU] Add AMDGPU specific variadic operation MCExprs" (#84273 ) Reverts llvm/llvm-project#82022 Fails on hwasan build bot: https://lab.llvm.org/buildbot/#/builders/236/builds/9874/steps/10/logs/stdio	2024-03-06 19:37:49 -08:00
Janek van Oirschot	bec2d105c7	[AMDGPU] Add AMDGPU specific variadic operation MCExprs (#82022 ) Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.	2024-03-06 21:01:54 +00:00
Mirko Brkušanin	1fd1f4c0e1	[AMDGPU] Handle amdgpu.last.use metadata (#83816 ) Convert !amdgpu.last.use metadata into MachineMemOperand for last use and handle it in SIMemoryLegalizer similar to nontemporal and volatile.	2024-03-06 16:33:52 +01:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Pierre van Houtryve	43c7eb5d7b	[AMDGPU] Replace '.' with '-' in generic target names (#81718 ) The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one.	2024-02-14 15:19:04 +01:00
Pierre van Houtryve	87d7711934	[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450 ) Fixes SWDEV-443292	2024-02-13 09:07:51 +01:00
Austin Kerbow	4bcbeaed63	[AMDGPU] Enable kernel arg preloading with gfx90a (#81180 ) Add a trap instruction to the beginning of the kernel prologue to handle cases where preloading is attempted on HW loaded with incompatible firmware.	2024-02-12 22:33:29 -08:00
Konstantin Zhuravlyov	75a1c4e10b	AMDGPU/NFC: Reserve 0x055 MACH in e_flag for future use (#81501 )	2024-02-12 13:37:25 -05:00
Mariusz Sikora	0c63453714	[AMDGPU][NFC] Docs - remove duplicates (#81465 )	2024-02-12 12:25:54 +01:00
Pierre van Houtryve	f93aa5157a	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955 ) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPUs, at the cost of fewer optimization opportunities. Note that this is just doing the compiler side of things; device libs and runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well.	2024-02-12 10:18:20 +01:00
pvanhout	f5399e89a2	Remove trailing whitespaces in AMDGPUUsage.rst	2024-02-12 09:30:10 +01:00
Corbin Robeck	fcb59203c8	[AMDGPU][DOC] Add MI200 Names to AMDGPUUsage Doc (#81252 )	2024-02-09 10:05:26 -05:00
Jan Patrick Lehr	f661057865	Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234 ) Reverts llvm/llvm-project#79586 This broke the AMDGPU OpenMP Offload buildbot. The typical error message was that the GPU attempted to read beyond the largest legal address. Error message: AMDGPU fatal error 1: Received error in queue 0x7f8363f22000: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.	2024-02-09 09:57:38 +01:00
alex-t	88e52511ca	[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586 ) This change implements synthesizing the private buffer resource descriptor in the kernel prolog instead of using the preloaded kernel argument.	2024-02-08 20:27:36 +01:00
Saiyedul Islam	082f87c9d4	[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Corresponding llvm-objdump AMDGPU lit tests are updated in a follow-up PR.	2024-01-23 17:08:18 +05:30
Konstantin Zhuravlyov	726d940586	AMDGPU/Docs: Add link to MI300 Instruction Set Architecture (#78777 )	2024-01-22 10:32:35 -05:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Jay Foad	9ca36932b5	[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186 )	2024-01-18 10:23:27 +00:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Chaitanya	9803de0e8e	[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273 ) The "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Add an "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify whether a function uses dynamic LDS. The hidden argument will be added in the following cases: - LDS global is used in the kernel. - Kernel calls a function which uses LDS global. - LDS pointer is passed as argument to kernel itself.	2024-01-04 19:05:12 +05:30
Jay Foad	c01e844a7e	[AMDGPU] Update compute program resource registers for GFX12 (#75911 ) Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>	2024-01-02 13:24:42 +00:00
Jeffrey Byrnes	f1156fb622	[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416 ) Makes constructing SchedGroups of this type easier, and provides ability to create them with __builtin_amdgcn_sched_group_barrier	2023-12-19 16:54:18 -08:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Michael Halkenhaeuser	19fa27605c	[NFC][docs] Add AMDGPU documentation for `LIBOMPTARGET_STACK_SIZE` Add documentation w.r.t. changes by #72606, which allows setting the dynamic callstack size.	2023-11-28 14:09:42 -05:00
Jay Foad	a1b3b78c55	[AMDGPU] Clarify description of _HI relocation types (#73663 ) Clarify how the addend is used in _HI relocation types like R_AMDGPU_ABS32_HI based on the current behaviour of the Mesa and AMDPAL ELF loaders. This affects Mesa and AMDPAL because they use REL relocation records, so the addend for these types is the 32-bit literal value from the instruction being relocated. AMDHSA is not affected because it uses RELA relocation records which have a 64-bit addend.	2023-11-28 17:06:11 +00:00
Jay Foad	82a5708c7a	[AMDGPU] Document that PAL uses Elf64_Rel relocation records (#73648 )	2023-11-28 14:37:10 +00:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
bcahoon	f4b54f799a	[AMDGPU] Add documentation for scheduler intrinsics (#69854 ) Adding sched_barrier, sched_group_barrier, and iglp_opt.	2023-11-02 18:47:45 -05:00
Konstantin Zhuravlyov	8b36a19b3f	AMDGPU/Docs: Memory model updates for GFX940, GFX941, GFX942 (#71091 ) - Update memory model sequences for GFX940, GFX941, GFX942 to match implementation - Re-title "Memory Model GFX940" to "Memory Model GFX942" Co-authored with @t-tye Change-Id: I82f1707b7c3e010ce1fe8207fcca18c4570057a3 Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>	2023-11-02 14:48:21 -04:00
Konstantin Zhuravlyov	8b61ef0925	AMDGPU/Docs: Add links to instruction descriptions for gfx941, gfx942 (#70941 ) Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>	2023-11-01 14:14:48 -04:00
Austin Kerbow	d681461098	[AMDGPU] Add doc updates for kernarg preloading (#67516 )	2023-10-19 13:43:35 -07:00
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118 )" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715 ) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Matt Arsenault	17bd80601e	AMDGPU: Implement llvm.get.fpmode Currently s_getreg_b32 is missing the possible mode use. Really we need separate pseudos for mode-only accesses, but leave this as a pre-existing issue. https://reviews.llvm.org/D152710	2023-09-10 10:19:19 +03:00
Matt Arsenault	5f8ee45d5a	AMDGPU: Implement llvm.get.rounding There are really two rounding modes, so only return the standard values if both modes are the same. Otherwise, return a bitmask representing the two modes. Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use a simple integer table we can shift into to convert. https://reviews.llvm.org/D153158	2023-08-30 14:06:13 -04:00
Kazu Hirata	3a14993fa4	Fix typos in documentation	2023-08-27 00:18:14 -07:00
Jeffrey Byrnes	3ba8dabbf3	[AMDGPU] Add sdot4 / sdot8 intrinsics for gfx11 This provides a uniform way to lower into the relevant instructions across all generations. Differential Revision: https://reviews.llvm.org/D158468 Change-Id: I1f7ba4b15ee470738535cf1c7d177a11fc471e43	2023-08-25 11:45:55 -07:00
Diana Picus	71eb8c07dd	[AMDGPU] Update amdgpu_cs_chain_preserve docs. NFC We no longer allow calls to functions with the `amdgpu_gfx` calling convention from functions with the `amdgpu_cs_chain_preserve` calling convention. See D153517. Also mention that we can't have a chain call from amdgpu_cs_chain_preserve using more VGPRs than it has received. Differential Revision: https://reviews.llvm.org/D156408	2023-08-24 10:17:00 +02:00

331 Commits