llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	f9adee2f6b	[AMDGPU] asyncmark support for ASYNC_CNT (#185813) The ASYNC_CNT is used to track the progress of asynchronous copies between global and LDS memories. By including it in asyncmark, the compiler can now assist the programmer in generating waits for ASYNC_CNT. Assisted-By: Claude Sonnet 4.5 This is part of a stack: - #185813 - #185810 Fixes: LCOMPILER-332	2026-04-07 07:23:09 +05:30
Mirko Brkušanin	93d7583f4f	[AMDGPU] Update features for gfx1170 (#186107) - Enable `NoF16PseudoScalarTransInlineConstants` for 11.7. - Add test for `RequiredExportPriority`, one of the differences between 11.5 and 11.7.	2026-03-20 17:04:17 +01:00
Mirko Brkušanin	a5aa136eb3	[AMDGPU] Add GFX11_7Insts feature, eliminate isGFX1170 helpers. NFC (#185878)	2026-03-11 17:05:18 +01:00
michaelselehov	cb3fbe921b	[AMDGPU] Set preferred function alignment based on icache geometry (#183064) Non-entry functions were unconditionally aligned to 4 bytes with no architecture-specific preferred alignment, and setAlignment() was used instead of ensureAlignment(), overwriting any explicit IR attributes. Add instruction cache line size and fetch alignment data to GCNSubtarget for each generation (GFX9: 64B/32B, GFX10: 64B/4B, GFX11+: 128B/4B). Use this to call setPrefFunctionAlignment() in SITargetLowering, aligning non-entry functions to the cache line size by default. Change setAlignment to ensureAlignment in AMDGPUAsmPrinter so explicit IR align attributes are respected. Empirical thread trace analysis on gfx942, gfx1030, gfx1100, and gfx1200 showed that only GFX9 exhibits measurable fetch stalls when functions cross the 32-byte fetch window boundary. GFX10+ showed no alignment sensitivity. A hidden option -amdgpu-align-functions-for-fetch-only is provided to use the fetch granularity instead of cache line size. Assisted-by: Claude Opus	2026-03-11 07:57:37 -04:00
Matt Arsenault	8ec961e1a9	Reapply "AMDGPU: Annotate group size ABI loads with range metadata (#185420)" (#185588) This reverts commit d5685ac6db0ae4cbca1745f18d8f2f7dc7d673a5. Fix off-by-one error. The end of the range is open.	2026-03-10 07:41:26 +00:00
Matt Arsenault	3545e51093	Revert "AMDGPU: Annotate group size ABI loads with range metadata (#185420)" (#185521) This reverts commit 76daf31b4000623d5c9548348a859ea3ed8712e1. Bot failure.	2026-03-10 01:04:02 +00:00
Matt Arsenault	76daf31b40	AMDGPU: Annotate group size ABI loads with range metadata (#185420) We previously did the same for the grid size when annotated. The group size is easier, so it's weird that this wasn't implemented first.	2026-03-09 19:11:59 +01:00
Mirko Brkušanin	d0f50d5574	[AMDGPU] Remove DX10_CLAMP and IEEE bits from gfx1170 (#182107) Add `DX10ClampAndIEEEMode` feature and set it for every subtarget prior to gfx1170.	2026-03-04 12:16:41 +01:00
LU-JOHN	7585ab05d6	[AMDGPU] Enable shift64 hazard recognition for gfx9 (#183839) Enable shift64 hazard recognition for gfx9 cores. --------- Signed-off-by: John Lu <John.Lu@amd.com>	2026-02-28 08:59:55 -06:00
zGoldthorpe	20dba979f7	[AMDGPU] Add target features to guard DPP controls (#182391) This patch adds target features: - `+dpp-wavefront-shifts`, for DPP `wave_shl/rol/shr/ror` - `+dpp-row-bcast`, for DPP `row_bcast15/31` These DPP controls are not available in gfx10+, so these target features enable `AMDGPURemoveIncompatibleFunctions` to remove functions that rely on these controls when compiling for newer GPUs.	2026-02-20 07:59:10 -07:00
michaelselehov	ed0ba3cb45	[AMDGPU] Align loop headers to prevent instruction fetch split on GFX950 (#181999) On GFX9, the instruction sequencer fetches 32 bytes at a time. When an 8-byte instruction at a loop header straddles a 32-byte fetch window boundary, the sequencer must perform two fetches after a backward branch, incurring a delay. On GFX950, this causes additional performance issues. This patch adds 32-byte alignment (.p2align 5, , 4) for loop headers on GFX950 when the first real instruction is 8 bytes. At most one s_nop (4 bytes, 1 quad-cycle before the loop) is used for padding. If more than 4 bytes of padding were needed, the 8-byte instruction would not straddle a 32-byte boundary anyway, so alignment is skipped. Note: the alignment decision is made during block-placement, before si-insert-waitcnts. In loops where a 4-byte S_WAITCNT is later inserted as the first instruction, the alignment becomes redundant but mostly harmless (at most one extra s_nop per affected loop). Assisted-by: Claude (Anthropic)	2026-02-19 14:18:44 -05:00
Mirko Brkušanin	829afc4c91	[AMDGPU] Add WMMA and SWMMAC instructions for gfx1170 (#180731) Introduce two new subtarget features: - WMMA256bInsts for GFX11 WMMA instructions and - WMMA128bInsts for GFX1170 and GFX12 WMMA and SWMMAC instructions Some WMMA instructions have changed from GFX 11.0 to GFX 11.7 so new Real versions were added with "_gfx1170" suffix. For consistency all WMMA and SWMMAC GFX11.7 instructions use this suffix. To resolve decoding issues between different formats for some WMMA instructions between GFX 11 and GFX 11.7, new decoding tables were added.	2026-02-18 19:17:48 +01:00
Domenic Nutile	5c72240617	[AMDGPU] Add DPP16 Row Share optimization for llvm.amdgcn.wave.shuffle (#177470) Adds logic to detect cases where the llvm.amdgcn.wave.shuffle intrinsic is being applied to an index operand that would make the result equivalent to the various Row Share flavors of DPP16 operations, and replaces the intrinsic and the instructions computing the index with an equivalent llvm.amdgcn.update.dpp call.	2026-02-06 15:31:34 -05:00
Carl Ritson	61f272d5cc	[AMDGPU] Pre-GFX10 does not need added latency for workgroup fences (#177157) Wait counts will not typically be introduced for workgroup scope fences in pre-GFX10 ASICs. Hence avoid adding scheduling latency for these.	2026-01-27 10:24:05 +09:00
Shilei Tian	786a20710d	[NFCI][AMDGPU] Use `GET_SUBTARGETINFO_MACRO` in `GCNSubtarget.h` and `R600Subtarget.h` (#177402) We can finally get rid of the manually defined boolean variables, like other targets. Even though most of them are now defined by macros, we still need to add the entries.	2026-01-25 09:38:42 -05:00
Shilei Tian	a7732479c1	[NFCI][AMDGPU] Move more attributes from `AMDGPUSubtarget` to `GCNSubtarget` (#177670) They are simply not used by `AMDGPUSubtarget &` but directly via `GCNSubtarget &`.	2026-01-24 07:31:42 -05:00
Shilei Tian	bc4b2765eb	[NFCI][AMDGPU] Refine `AMDGPUSubtarget.h` (#177473) This PR is to move code around to pave the path for using `GET_SUBTARGETINFO_MACRO` in `GCNSubtarget.h`.	2026-01-22 18:40:07 -05:00
Shilei Tian	4b1cfc5d7c	[NFCI][AMDGPU] Final touch before moving to `GET_SUBTARGETINFO_MACRO` (#177401)	2026-01-22 17:33:17 +00:00
Shilei Tian	9f536c771d	[NFC][AMDGPU] Remove unused `FeatureDisable` (#177288)	2026-01-22 09:07:28 -05:00
Shilei Tian	02d34a76f7	[NFCI][AMDGPU] Remove more redundant code from `GCNSubtarget.h` (#177297) We are getting pretty close to using `GET_SUBTARGETINFO_MACRO` in the header with this cleanup.	2026-01-22 09:07:15 -05:00
Shilei Tian	b857faeda6	[NFC][AMDGPU] Remove stale/dangling comments	2026-01-21 20:16:17 -05:00
Shilei Tian	2692f5ed53	[NFCI][AMDGPU] Convert more `SubtargetFeatures` to use `AMDGPUSubtargetFeature` and X-macros (#177256) Extend the X-macro pattern to eliminate boilerplate for additional subtarget features. This reduces ~50 lines of repetitive member declarations and getter definitions.	2026-01-21 18:03:32 -05:00
Shilei Tian	fa4f7657a2	[AMDGPU] Further improve `AMDGPUSubtargetFeature` multiclass (#177077) This PR extends the multiclass to support two additional parameters: one for specifying whether an `AssemblerPredicate` should be generated, and another for dependent `SubtargetFeatures`. This allows 15 more definitions to be converted to use the multiclass.	2026-01-21 21:05:13 +00:00
Shilei Tian	1843a7fe9f	[NFCI][AMDGPU] Use X-macro to reduce boilerplate in `GCNSubtarget.h` (#176844) `GCNSubtarget.h` contained a large amount of repetitive code following the pattern `bool HasXXX = false;` for member declarations and `bool hasXXX() const { return HasXXX; }` for getters. This boilerplate made the file unnecessarily long and harder to maintain. This patch introduces an X-macro pattern `GCN_SUBTARGET_HAS_FEATURE` that consolidates 135 simple subtarget features into a single list. The macro is expanded twice: once in the protected section to generate member variable declarations, and once in the public section to generate the corresponding getter methods. This reduces the file by approximately 600 lines while preserving the exact same API and functionality. Features with complex getter logic or inconsistent naming conventions are left as manual implementations for future improvement. Ideally, these could be generated by TableGen using `GET_SUBTARGETINFO_MACRO`, similar to the X86 backend. However, `AMDGPU.td` has several issues that prevent direct adoption: duplicate field names (e.g., `DumpCode` is set by both `FeatureDumpCode` and `FeatureDumpCodeLower`), and inconsistent naming conventions where many features don't have the `Has` prefix (e.g., `FlatAddressSpace`, `GFX10Insts`, `FP64`). Fixing these issues would require renaming fields in `AMDGPU.td` and updating all references, which is left for future work.	2026-01-21 15:29:09 -05:00
Shilei Tian	c253b9f9ca	[AMDGPU] Fix inline constant encoding for `v_pk_fmac_f16` (#176659) This PR handles `v_pk_fmac_f16` inline constant encoding/decoding differences between pre-GFX11 and GFX11+ hardware. - Pre-GFX11: fp16 inline constants produce `(f16, 0)` - value in low 16 bits, zero in high. - GFX11+: fp16 inline constants are duplicated to both halves `(f16, f16)`. Fixes #94116.	2026-01-20 19:14:59 -05:00
Stanislav Mekhanoshin	dd947ebcf3	[AMDGPU] Update gfx1250 memory model for global acquire/release (#175865) Inserts required waits around GLOBAL_INV/GLOBAL_WBINV for agent scope and above.	2026-01-15 03:25:03 -08:00
sstipano	cc1e10d50b	[AMDGPU] Disable s_add_pc_i64 instruction (#175644) s_add_pc_i64 instruction is broken on gfx1250. Disable it by default.	2026-01-14 23:01:43 +01:00
Shoreshen	26624d51d1	[AMDGPU] Add specific instruction feature for multicast load (#175503)	2026-01-13 09:10:09 +08:00
Shilei Tian	df3629dc0c	[AMDGPU] Handle `s_setreg_imm32_b32` targeting `MODE` register (#174681) On certain hardware, this instruction clobbers VGPR MSB `bits[12:19]`, so we need to restore the current mode. Fixes SWDEV-571581.	2026-01-09 14:43:41 -05:00
Jay Foad	475f022cb7	[AMDGPU] Add support for GFX12 expert scheduling mode 2 (#170319)	2026-01-09 15:49:10 +00:00
saxlungs	7bbaf2e16b	[AMDGPU] Improve llvm.amdgcn.wave.shuffle handling for pre-GFX8 (#174845) Before, GlobalISel would still return true for lowering the intrinsic for GFX7 and earlier even though the required ds_bpermute_b32 instruction is not supported. After this change, GlobalISel will properly report failure to select in this case. Testing is updated appropriately. Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com>	2026-01-07 21:48:11 +01:00
saxlungs	c262893f4b	Reland "[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic (#167372)" (#174614) This change adds a new intrinsic for AMDGPU that implements a wave shuffle, allowing arbitrary swizzling between lanes using an index. In the initial version of this commit, there was an issue in one of the tests added that returned a signal, causing testing to fail when combined with another recent change to 'not'. For context on the initial commit see #167372 --------- Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com> Co-authored-by: Jay Foad <jay.foad@gmail.com>	2026-01-06 15:02:08 -05:00
Joe Nash	4bca00d56b	Revert "[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic" (#174501) Reverts llvm/llvm-project#167372	2026-01-05 17:52:28 -05:00
saxlungs	b9fbc19017	[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic (#167372) This intrinsic will be useful for implementing the OpGroupNonUniformShuffle operation in the SPIR-V reference. --------- Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com> Co-authored-by: Jay Foad <jay.foad@gmail.com>	2026-01-05 17:15:58 -05:00
Jay Foad	35c2dbd481	[AMDGPU] Remove trivially true predicates from GCNSubtarget. NFC. (#172830)	2025-12-18 11:05:34 +00:00
Mirko Brkušanin	5759a3a779	[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501)	2025-12-10 09:45:13 +01:00
anjenner	740a3ad1f7	AMDGPU: Add codegen for atomicrmw operations usub_cond and usub_sat (#141068) Split off from https://github.com/llvm/llvm-project/pull/105553 as per discussion there.	2025-12-05 12:37:33 +00:00
Stanislav Mekhanoshin	9dd3346589	[AMDGPU] Prevent folding of flat_scr_base_hi into a 64-bit SALU (#170373) Fixes: SWDEV-563886	2025-12-02 16:08:00 -08:00
Pierre van Houtryve	a086fb2fbb	[AMDGPU][gfx1250] Add wait_xcnt before any access that cannot be repeated (#168852) The xcnt wait is actually required before any memory access that can only be done once, so atomic stores and volatile accesses are affected. This patch also ensures buffer instructions are handled.	2025-11-25 10:11:04 +01:00
Shoreshen	52a58a4193	[AMDGPU] Adding instruction specific features (#167809)	2025-11-19 11:06:00 +08:00
Shilei Tian	b4aa3d3ae3	[NFC] Check operand type instead of opcode (#168641) A follow-up to #168458.	2025-11-18 21:37:56 -05:00
Shilei Tian	6665642ce4	[AMDGPU] Don't fold an i64 immediate value if it can't be replicated from its lower 32 bits (#168458) On some targets, a packed f32 instruction can only read 32 bits from a scalar operand (SGPR or literal) and replicates the bits to both channels. In this case, we should not fold an immediate value if it can't be replicated from its lower 32 bits. Fixes SWDEV-567139.	2025-11-18 17:11:10 -05:00
Matt Arsenault	dfdada1b78	CodeGen: Remove target hook for terminal rule (#165962) Enables the terminal rule for remaining targets.	2025-11-12 21:12:19 +00:00
Matt Arsenault	e95f6fa123	RegisterCoalescer: Enable terminal rule by default for AMDGPU (#161621) Introduce a target hook to incrementally flip the behavior of targets with test changes, and start by implementing it for AMDGPU. This appears to be a forgotten switch flip from 2015. This seems to do a nicer job with subregister copies. Most of the test changes are improvements or neutral; only a few are slight regressions. The worst AMDGPU regressions are for true16 in the atomic tests, but I think that's due to existing true16 issues.	2025-11-10 09:37:14 -08:00
Jay Foad	60f20ea465	[AMDGPU] Add target feature for waits before system scope stores. NFC. (#164993)	2025-10-27 10:31:37 +00:00
Stanislav Mekhanoshin	9b5bc98743	[AMDGPU] Add intrinsics for v_[pk]_add_{min|max}_* instructions (#164731)	2025-10-22 17:46:33 -07:00
Matt Arsenault	d4b504ff20	AMDGPU: Remove triple field from subtarget (#164208) This is redundant and already exists in the base class, and is also unused.	2025-10-20 06:58:16 +00:00
Shilei Tian	9e8dda1034	[NFC] Change spelling of cluster feature to "clusters" (#162103)	2025-10-06 15:55:39 +00:00
Shilei Tian	bea0225c30	[AMDGPU] Make cluster a target feature (#162040) This replaces the original arch check.	2025-10-06 05:05:53 +00:00
Matt Arsenault	0a80631142	AMDGPU: Ensure both wavesize features are not set (#159234) Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor.	2025-09-25 09:46:34 +00:00


293 Commits
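Commit cb3fbe921b picks a preferred alignment for non-entry functions from per-generation icache geometry and switches to raise-only `ensureAlignment` semantics. The sketch below models that choice using the numbers quoted in the commit; the `Gen` enum, the `ICacheGeometry` struct and all function names are hypothetical stand-ins, not the actual `GCNSubtarget` or `SITargetLowering` code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical generation enum; the real subtarget tracks generations differently.
enum class Gen { GFX9, GFX10, GFX11Plus };

// Cache line size and fetch granularity in bytes, as quoted in cb3fbe921b.
struct ICacheGeometry {
  std::uint32_t LineBytes;
  std::uint32_t FetchBytes;
};

constexpr ICacheGeometry geometryFor(Gen G) {
  switch (G) {
  case Gen::GFX9:
    return {64, 32};
  case Gen::GFX10:
    return {64, 4};
  case Gen::GFX11Plus:
    return {128, 4};
  }
  return {4, 4}; // Not reached for valid enumerators.
}

// Preferred alignment for a non-entry function. FetchOnly stands in for the
// hidden -amdgpu-align-functions-for-fetch-only option.
constexpr std::uint32_t preferredFunctionAlign(Gen G, bool FetchOnly) {
  return FetchOnly ? geometryFor(G).FetchBytes : geometryFor(G).LineBytes;
}

// ensureAlignment-style update: raise the alignment when it is below the
// preferred value, but never lower an alignment that is already larger
// (for example, one taken from an explicit IR align attribute).
constexpr std::uint32_t ensureAlign(std::uint32_t Current, std::uint32_t Pref) {
  return std::max(Current, Pref);
}
```

The raise-only update is the point of the `setAlignment` to `ensureAlignment` change: a function declared `align 256` keeps 256 bytes, while a default 4-byte function is raised to the preferred value.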
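The padding rule in commit ed0ba3cb45 comes down to modular arithmetic on a 32-byte fetch window. The sketch below assumes instruction addresses are multiples of 4 and models `.p2align 5, , 4` (align to 32 bytes, but skip if that would take more than 4 bytes); the function names are hypothetical, and the GFX950 subtarget check is omitted.

```cpp
#include <cstdint>

// GFX9 instruction fetch window size in bytes.
constexpr std::uint64_t FetchWindowBytes = 32;

// True if an instruction of SizeBytes starting at Addr spans two fetch windows.
constexpr bool straddlesFetchWindow(std::uint64_t Addr, std::uint64_t SizeBytes) {
  return Addr % FetchWindowBytes + SizeBytes > FetchWindowBytes;
}

// Bytes of padding needed to move Addr up to the next 32-byte boundary.
constexpr std::uint64_t paddingToWindow(std::uint64_t Addr) {
  return (FetchWindowBytes - Addr % FetchWindowBytes) % FetchWindowBytes;
}

// Models `.p2align 5, , 4`: pad only when it costs at most 4 bytes (one
// s_nop). Only headers whose first real instruction is 8 bytes are considered.
constexpr std::uint64_t loopHeaderPadding(std::uint64_t HeaderAddr,
                                          std::uint64_t FirstInstBytes) {
  if (FirstInstBytes != 8)
    return 0;
  const std::uint64_t Pad = paddingToWindow(HeaderAddr);
  return Pad <= 4 ? Pad : 0;
}
```

With 4-byte-aligned addresses, an 8-byte instruction straddles a window only when it starts at offset 28, which is exactly the case where 4 bytes of padding suffice. Every other nonzero offset would need 8 or more bytes of padding and does not straddle, which is why the commit can cap padding at a single s_nop.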
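Commit 1843a7fe9f describes an X-macro that is expanded twice, once for `bool HasXXX = false;` members and once for `bool hasXXX() const` getters. The following is a minimal sketch of that pattern under stated assumptions: `EXAMPLE_SUBTARGET_HAS_FEATURE`, `ExampleSubtarget` and the three feature names are illustrative, not the real `GCN_SUBTARGET_HAS_FEATURE` list, and the getters are marked `constexpr` only so the sketch can be checked at compile time.

```cpp
// Hypothetical feature list in the style of GCN_SUBTARGET_HAS_FEATURE; the
// entries are illustrative and the real macro's shape may differ.
#define EXAMPLE_SUBTARGET_HAS_FEATURE(X) \
  X(PackedFP32Ops)                       \
  X(MadMixInsts)                         \
  X(DPPRowBcast)

class ExampleSubtarget {
protected:
  // First expansion: one `bool HasName = false;` member per feature.
#define EXAMPLE_DECLARE_MEMBER(Name) bool Has##Name = false;
  EXAMPLE_SUBTARGET_HAS_FEATURE(EXAMPLE_DECLARE_MEMBER)
#undef EXAMPLE_DECLARE_MEMBER

public:
  // Second expansion: the matching `bool hasName() const` getter.
#define EXAMPLE_DECLARE_GETTER(Name) \
  constexpr bool has##Name() const { return Has##Name; }
  EXAMPLE_SUBTARGET_HAS_FEATURE(EXAMPLE_DECLARE_GETTER)
#undef EXAMPLE_DECLARE_GETTER
};
```

Both spellings are built from one token by pasting (`Has##Name`, `has##Name`), so the pattern only fits features whose field follows the `Has` prefix convention. That is consistent with the commit leaving features with inconsistent naming as manual implementations.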
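Commit c253b9f9ca distinguishes how an fp16 inline constant fills the two halves of a `v_pk_fmac_f16` operand. A small model of that difference follows; `pkFmacF16InlineValue` is a hypothetical helper written for illustration, not LLVM's encoder or decoder API, and it takes the constant as raw IEEE half-precision bits.

```cpp
#include <cstdint>

// Hypothetical helper: the 32-bit packed value an fp16 inline constant
// supplies to v_pk_fmac_f16, per the behavior described in c253b9f9ca.
constexpr std::uint32_t pkFmacF16InlineValue(std::uint16_t HalfBits,
                                             bool IsGFX11Plus) {
  const std::uint32_t Lo = HalfBits;
  // GFX11+: (f16, f16), the constant is duplicated into both 16-bit halves.
  // Pre-GFX11: (f16, 0), the constant is in the low half; the high half is zero.
  return IsGFX11Plus ? (Lo << 16) | Lo : Lo;
}
```

For example, 1.0 in half precision is `0x3C00`, so the same inline constant yields `0x00003C00` before GFX11 and `0x3C003C00` on GFX11+.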
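The folding restriction in commit 6665642ce4 reduces to a bit test: if the operand can only supply 32 bits that are broadcast to both f32 channels, a 64-bit immediate is representable only when its high half equals its low half. The predicate below is a hypothetical sketch of that legality check, not the code in SIFoldOperands.

```cpp
#include <cstdint>

// Hypothetical predicate: true if broadcasting the low 32 bits of Imm to both
// channels reproduces the full 64-bit value.
constexpr bool isReplicatedFromLow32(std::uint64_t Imm) {
  return (Imm >> 32) == (Imm & 0xFFFFFFFFull);
}
```

With 1.0f encoded as `0x3F800000`, the pair (1.0f, 1.0f) packed as `0x3F8000003F800000` passes the check, while `0x000000003F800000`, which puts 1.0f in one channel and 0.0f in the other, does not and must not be folded.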