llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	128437fb6a	[AMDGPU] Introduce asyncmark/wait intrinsics (#180467 ) Asynchronous operations are memory transfers (usually between the global memory and LDS) that are completed independently at an unspecified scope. A thread that requests one or more asynchronous transfers can use async marks to track their completion. The thread waits for each mark to be completed, which indicates that requests initiated in program order before this mark have also completed. For now, we implement asyncmark/wait operations on pre-GFX12 architectures that support "LDS DMA" operations. Future work will extend support to GFX12Plus architectures that support "true" async operations. This is part of a stack split out from #173259 - #180467 - #180466 Co-authored-by: Ryan Mitchell ryan.mitchell@amd.com Fixes: SWDEV-521121	2026-02-11 07:15:51 +00:00
Sameer Sahasrabuddhe	b02b395a1e	[AMDGPU] Asynchronous loads from global/buffer to LDS on pre-GFX12 (#180466 ) The existing "LDS DMA" builtins/intrinsics copy data from global/buffer pointer to LDS. These are now augmented with their ".async" version, where the compiler does not automatically track completion. The completion is now tracked using explicit mark/wait intrinsics, which must be inserted by the user. This makes it possible to write programs with efficient waits in software pipeline loops. The program can now wait for only the oldest outstanding operations to finish, while launching more operations for later use. This change only contains the new names of the builtins/intrinsics, which continue to behave exactly like their non-async counterparts. A later change will implement the actual mark/wait semantics in SIInsertWaitcnts. This is part of a stack split out from #173259: - #180467 - #180466 Fixes: SWDEV-521121	2026-02-11 05:26:58 +00:00
Diana Picus	24405f070f	[AMDGPU] Add intrinsic exposing s_alloc_vgpr (#163951 ) Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge footgun and use for anything other than compiler internal purposes is heavily discouraged. The calling code must make sure that it does not allocate fewer VGPRs than necessary - the intrinsic is NOT a request to the backend to limit the number of VGPRs it uses (in essence it's not so different from what we do with the dynamic VGPR flags of the `amdgcn.cs.chain` intrinsic, it just makes it possible to use this functionality in other scenarios).	2026-02-10 09:28:31 +01:00
Jay Foad	4a6697f393	[AMDGPU] Fix and simplify patterns selecting fsub to v_fma_mix_f32 (#180169 ) Select (fsub x, y) -> (fma y, -1.0, x). Using -1.0 as the constant avoids the need for ComplexPatterns to negate x or y. This also fixes the bad pattern (fsub x, y) -> (fma -x, 1.0, y).	2026-02-06 14:39:13 +00:00
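The rewrite in #180169 rests on the identity x - y = y * (-1.0) + x. The following is a minimal host-side sketch in plain C++, not the TableGen pattern from the patch; `fsubViaFma` and `badFsubViaFma` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: models selecting (fsub x, y) as (fma y, -1.0, x).
// Multiplying by -1.0 is exact, so the single rounding of fma matches x - y.
inline float fsubViaFma(float x, float y) { return std::fma(y, -1.0f, x); }

// Hypothetical helper: models the bad pattern (fma -x, 1.0, y).
// This computes y - x, i.e. the result has the wrong sign.
inline float badFsubViaFma(float x, float y) { return std::fma(-x, 1.0f, y); }
```

Because the constant carries the negation, neither operand needs its own negate modifier, which is the simplification the commit message describes.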
Acim Maravic	b0827f3b36	[LLVM] Select fma_mix for v_cvt_f32_f16 and v_add_f32/v_mul_f32 (#160151 )	2026-02-05 11:51:25 +01:00
Juan Manuel Martinez Caamaño	04c56505f8	[NFC][LLVM] Make `constrainSelectedInstRegOperands` return `void` (#179501 ) `constrainSelectedInstRegOperands` always returns `true`; so it can be safely transformed to return `void` instead. A follow-up patch should update `MachineInstrBuilder::constrainAllUses`.	2026-02-04 08:59:16 +01:00
serge-sans-paille	85919fbfa4	[perf] Replace copy-assign by move-assign in llvm/lib/Target/AMDGPU/ (#179460 )	2026-02-03 14:24:31 +00:00
Sam Elliott	7184229fea	[NFC][MI] Tidy Up RegState enum use (2/2) (#177090 ) This change makes `RegState` into an enum class, with bitwise operators. It also: - Updates declarations of flag variables/arguments/returns from `unsigned` to `RegState`. - Updates empty RegState initializers from 0 to `{}`. If this is causing problems in downstream code: - Adopt the `RegState getXXXRegState(bool)` functions instead of using a ternary operator such as `bool ? RegState::XXX : 0`. - Adopt the `bool hasRegState(RegState, RegState)` function instead of using a bitwise check of the flags.	2026-01-23 00:19:03 -08:00
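The migration advice in #177090 can be sketched with a stand-alone scoped enum. This is an illustration only: `DemoRegState`, its flag values, and the two helper functions are hypothetical stand-ins, not LLVM's actual `RegState` definitions, and the "all requested bits set" meaning of the `has` helper is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a flag enum converted to an enum class (values are illustrative).
enum class DemoRegState : uint32_t {
  None = 0,
  Define = 1u << 0,
  Kill = 1u << 1,
  Dead = 1u << 2,
};

// Scoped enums have no implicit integer conversions, so the bitwise
// operators must be provided explicitly.
constexpr DemoRegState operator|(DemoRegState a, DemoRegState b) {
  return static_cast<DemoRegState>(static_cast<uint32_t>(a) |
                                   static_cast<uint32_t>(b));
}
constexpr DemoRegState operator&(DemoRegState a, DemoRegState b) {
  return static_cast<DemoRegState>(static_cast<uint32_t>(a) &
                                   static_cast<uint32_t>(b));
}

// Replaces the old ternary `isKill ? Kill : 0`, which no longer type-checks
// because 0 is not a DemoRegState. The empty value is spelled `{}`.
constexpr DemoRegState getKillDemoRegState(bool isKill) {
  return isKill ? DemoRegState::Kill : DemoRegState{};
}

// Replaces open-coded `(flags & Kill) != 0` checks (assumed semantics:
// true when every bit in `test` is set in `flags`).
constexpr bool hasDemoRegState(DemoRegState flags, DemoRegState test) {
  return (flags & test) == test;
}
```

Downstream code that mixed `unsigned` and flag values fails to compile after such a change, which is why the commit points to the helper functions rather than casts.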
Shilei Tian	02d34a76f7	[NFCI][AMDGPU] Remove more redundant code from `GCNSubtarget.h` (#177297 ) We are getting pretty close to use `GET_SUBTARGETINFO_MACRO` in the header with this cleanup.	2026-01-22 09:07:15 -05:00
saxlungs	7bbaf2e16b	[AMDGPU] Improve llvm.amdgcn.wave.shuffle handling for pre-GFX8 (#174845 ) Before, GlobalISel would still return true for lowering the intrinsic for GFX7 and earlier even though the required ds_bpermute_b32 instruction is not supported. After this change, GlobalISel will properly report failure to select in this case. Testing is updated appropriately. Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com>	2026-01-07 21:48:11 +01:00
saxlungs	c262893f4b	Reland "[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic (#167372 )" (#174614 ) This change adds a new intrinsic for AMDGPU that implements a wave shuffle, allowing arbitrary swizzling between lanes using an index. In the initial version of this commit, there was an issue in one of the tests added that returned a signal, causing testing to fail when combined with another recent change to 'not'. For context on the initial commit see #167372 --------- Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com> Co-authored-by: Jay Foad <jay.foad@gmail.com>	2026-01-06 15:02:08 -05:00
Joe Nash	4bca00d56b	Revert "[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic" (#174501 ) Reverts llvm/llvm-project#167372	2026-01-05 17:52:28 -05:00
saxlungs	b9fbc19017	[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic (#167372 ) This intrinsic will be useful for implementing the OpGroupNonUniformShuffle operation in the SPIR-V reference --------- Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com> Co-authored-by: Jay Foad <jay.foad@gmail.com>	2026-01-05 17:15:58 -05:00
Mirko Brkušanin	5759a3a779	[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501 )	2025-12-10 09:45:13 +01:00
anjenner	740a3ad1f7	AMDGPU: Add codegen for atomicrmw operations usub_cond and usub_sat (#141068 ) Split off from https://github.com/llvm/llvm-project/pull/105553 as per discussion there.	2025-12-05 12:37:33 +00:00
Matt Arsenault	2ee12f191a	AMDGPU: Use RegClassByHwMode to manage GWS operand special case (#169373 ) On targets that require even aligned 64-bit VGPRs, GWS operands require even alignment of a 32-bit operand. Previously we had a hacky post-processing which added an implicit operand to try to manage the constraint. This would require special casing in other passes to avoid breaking the operand constraint. This moves the handling into the instruction definition, so other passes no longer need to consider this edge case. MC still does need to special case this, to print/parse as a 32-bit register. This also still ends up net less work than introducing even aligned 32-bit register classes. This also should be applied to the image special case.	2025-11-25 18:55:34 +00:00
Jay Foad	72c69aefba	[AMDGPU] Make use of getFunction and getMF. NFC. (#167872 )	2025-11-14 11:00:57 +00:00
LU-JOHN	87b1d3537a	[AMDGPU][NFC] Avoid copying MachineOperands (#166293 ) Avoid copying machine operands. Signed-off-by: John Lu <John.Lu@amd.com>	2025-11-04 23:18:40 -06:00
Abhay Kanhere	d998f92a00	[CodeGen] MachineVerifier to check early-clobber constraint (#151421 ) MachineVerifier currently does not verify the early-clobber operand constraint. The only other machine operand constraint, TiedTo, is already verified.	2025-11-04 18:39:31 -08:00
vangthao95	d1d635083d	[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function (#165797 ) Follow-up patch to address the comments in https://github.com/llvm/llvm-project/pull/165355.	2025-10-31 13:17:44 -07:00
vangthao95	ba5cde79aa	[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 (#165355 ) When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.	2025-10-30 08:19:12 -07:00
Harrison Hao	d604ab6288	[AMDGPU] Support image atomic no return instructions (#150742 ) Add support for no-return variants of image atomic operations (e.g. IMAGE_ATOMIC_ADD_NORTN, IMAGE_ATOMIC_CMPSWAP_NORTN). These variants are generated when the return value of the intrinsic is unused, allowing the backend to select no return type instructions.	2025-10-29 10:42:15 +08:00
Krzysztof Drewniak	d37141776f	[AMDGPU] Enable volatile and non-temporal for loads to LDS (#153244 ) The primary purpose of this commit is to enable marking loads to LDS (global.load.lds, buffer.*.load.lds) volatile (using bit 31 of the aux as with normal buffer loads) and to ensure that their !nontemporal annotations translate to appropriate settings of the cache control bits. However, in the process of implementing this feature, we also fixed - Incorrect handling of buffer loads to LDS in GlobalISel - Updating the handling of volatile on buffers in SIMemoryLegalizer: previously, the mapping of address spaces would cause volatile on buffer loads to be silently dropped on at least gfx10. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-10-20 12:42:22 -05:00
Petar Avramovic	98d43ef2d8	AMDGPU: Use srcvalue and delete Ignore complex pattern (#161359 )	2025-09-30 16:18:51 +02:00
Petar Avramovic	1553b3de71	AMDGPU: Fix gcc build break (#161354 )	2025-09-30 14:01:08 +02:00
Petar Avramovic	709a74dfb3	AMDGPU: Fix s_barrier_leave to write to scc (#161221 ) s_barrier_leave implicitly defines $scc and does not use imm that represents type of barrier, isel pattern ignores imm operand from llvm intrinsic. Test if SIInsertWaitcnts tracks this scc write.	2025-09-30 12:55:35 +02:00
Changpeng Fang	7753f61f61	[AMDGPU] Support cluster_load_async_to_lds instructions on gfx1250 (#156595 )	2025-09-03 11:22:10 -07:00
Changpeng Fang	d3d1d8ff21	[AMDGPU] Support cluster load instructions for gfx1250 (#156548 )	2025-09-02 16:34:20 -07:00
Nicolai Hähnle	353b5e43c6	AMDGPU: Refactor lowering of s_barrier to split barriers (#154648 ) Let's do the lowering of non-split into split barriers in a new IR pass, AMDGPULowerIntrinsics. That way, there is no code duplication between SelectionDAG and GlobalISel. This simplifies some upcoming extensions to the code.	2025-08-28 07:01:20 -07:00
Stanislav Mekhanoshin	d0ee82040c	[AMDGPU] Add s_barrier_init\|join\|leave instructions (#153296 )	2025-08-12 15:07:07 -07:00
Fabian Ritter	e9ece175f9	[AMDGPU][GISel] Only fold flat offsets if they are inbounds (#153001 ) For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, ISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from address computations with the inbounds flag: If the address computation does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Relevant tests are in fold-gep-offset.ll. Analogous to #132353 for SDAG (which is not yet in a mergeable state, its progress is currently blocked by #146076). Fixes SWDEV-516125 for GISel.	2025-08-12 10:14:20 +02:00
Changpeng Fang	1e815ced81	[AMDGPU] Use SDNodeXForm to select a few VOP3P modifiers, NFC (#151907 ) It is not necessary to use ComplexPattern to select VOP3PModsNeg, VOP3PModsNegs and VOP3PModsNegAbs. We can use SDNodeXForm instead.	2025-08-04 12:51:48 -07:00
Changpeng Fang	7d2332391f	[AMDGPU] Fix destination op_sel for v_cvt_scale32_* and v_cvt_sr_* (#151411 ) GFX950 uses OP_SEL[MSB:LSB] for both src reads and dest writes. So this patch essentially reverts the work from https://github.com/llvm/llvm-project/pull/151286 regarding dest writes.	2025-07-30 16:15:50 -07:00
Changpeng Fang	180281b8ec	[AMDGPU] Fix op_sel settings for v_cvt_scale32_* and v_cvt_sr_* (#151286 ) For OPF_OPSEL_SRCBYTE: Vector instruction uses OPSEL[1:0] to specify a byte select for the first source operand. So op_sel [0, 0], [1, 0], [0, 1] and [1, 1] should map to byte 0, 1, 2 and 3, respectively. For OPF_OPSEL_DSTBYTE: OPSEL is used as a destination byte select. OPSEL[2:3] specify which byte of the destination to write to. Note that the order of the bits is different from that of OPF_OPSEL_SRCBYTE. So the mapping should be: op_sel [0, 0], [0, 1], [1, 0] and [1, 1] map to byte 0, 1, 2 and 3, respectively. Fixes: SWDEV-544901	2025-07-30 12:24:51 -07:00
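The two op_sel orderings listed in #151286 differ only in which list element is the low bit of the byte index. A small sketch of both mappings as described in that message; the function names are hypothetical, and per #151411 above the destination ordering was subsequently reverted for GFX950, so this shows the message's description rather than current backend behaviour.

```cpp
#include <cassert>

// op_sel is written as [first, second], matching the lists in the message.

// Source byte select: [0,0], [1,0], [0,1], [1,1] -> bytes 0, 1, 2, 3,
// so the first element is the low bit of the byte index.
constexpr unsigned srcByteFromOpSel(bool first, bool second) {
  return (first ? 1u : 0u) | (second ? 2u : 0u);
}

// Destination byte select as described in #151286:
// [0,0], [0,1], [1,0], [1,1] -> bytes 0, 1, 2, 3,
// so the first element is the high bit of the byte index.
constexpr unsigned dstByteFromOpSel(bool first, bool second) {
  return (first ? 2u : 0u) | (second ? 1u : 0u);
}

static_assert(srcByteFromOpSel(true, false) == 1, "src [1,0] selects byte 1");
static_assert(dstByteFromOpSel(true, false) == 2, "dst [1,0] selects byte 2");
```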
Stanislav Mekhanoshin	7eaf1f2b2d	[AMDGPU] Bitop3 opcodes for gfx1250 (#151235 )	2025-07-29 15:36:56 -07:00
Stanislav Mekhanoshin	d99238263c	[AMDGPU] Implement v_mad_u32/v_mad_nc_u\|i64_u32 on gfx1250 (#151226 )	2025-07-29 15:06:35 -07:00
Changpeng Fang	3b66d4a987	[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058 )	2025-07-29 08:20:05 -07:00
Changpeng Fang	d7a38a94cd	[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540 )	2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin	96e5eed92a	[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493 ) We have a choice to use a scalar or vector prefetch for a uniform pointer. Since we do not have scalar stores, our scalar cache is practically read-only. The rw argument of the prefetch intrinsic is used to force vector operation even for a uniform case. On GFX12 scalar prefetch will be used anyway; it is still useful, but it will only bring data to L2.	2025-07-24 13:22:50 -07:00
Stanislav Mekhanoshin	c6e560a25b	[AMDGPU] Select scale_offset for scratch instructions on gfx1250 (#150111 )	2025-07-22 15:24:55 -07:00
Stanislav Mekhanoshin	a0973de745	[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107 ) Also switches immediate offset to signed for the subtarget.	2025-07-22 15:04:52 -07:00
Stanislav Mekhanoshin	a0aebb1935	[AMDGPU] Select scale_offset with SMEM instructions (#150078 )	2025-07-22 13:26:28 -07:00
Diana Picus	20d8398825	[AMDGPU] ISel & PEI for whole wave functions (#145858 ) Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics. Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs. At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison. This patch contains the following work: * 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN. * SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter. * Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic. Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls). 
--------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-07-21 10:39:09 +02:00
Stanislav Mekhanoshin	cfa918bec1	[AMDGPU] Select flat GVS atomics on gfx1250 (#149554 )	2025-07-18 12:31:29 -07:00
Changpeng Fang	868793fa8e	AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>	2025-07-15 15:25:05 -07:00
Fabian Ritter	a1edb1dbc6	[AMDGPU] Fix broken uses of isLegalFLATOffset and splitFlatOffset (#147469 ) The last parameter of these functions used to be `Signed`, and it looks like a few calls weren't updated when that was changed to `FlatVariant`. Effectively, the functions were called with `FlatVariant=SALU` due to integer promotions, which doesn't make any sense.	2025-07-08 11:18:36 +02:00
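The failure mode described in #147469 is a stale boolean argument silently converting to an integer flag word. A minimal sketch of that mechanism, under stated assumptions: the flag constants, their bit positions, and `variantSeenByCallee` are hypothetical and do not reproduce the real `isLegalFLATOffset` signature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the lowest bit is assumed to be the SALU flag so
// that `true` coincides with it, as the commit message describes.
constexpr uint64_t kDemoSALU = 1ull << 0;
constexpr uint64_t kDemoFlatGlobal = 1ull << 1;

// Hypothetical callee whose last parameter used to be `bool Signed` and is
// now an integer flag word. It returns what it received, for inspection.
constexpr uint64_t variantSeenByCallee(uint64_t flatVariant) {
  return flatVariant;
}

// A call site that was not updated still compiles: `true` converts to 1.
constexpr uint64_t staleCall() { return variantSeenByCallee(true); }

// An updated call site passes the intended flag explicitly.
constexpr uint64_t fixedCall() { return variantSeenByCallee(kDemoFlatGlobal); }
```

Nothing in the language rejects the stale call, which is one reason a scoped enum parameter is often preferred for flags of this kind.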
Shoreshen	99df642168	[AMDGPU] Re-Re-apply: Implement vop3p complex pattern optmization for gisel (#146984 ) Reverts llvm/llvm-project#146982 Fix up reported building error for https://github.com/llvm/llvm-project/pull/136262 with: ``` FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] 4566 \| } \| ^ 1 error generated. ninja: build stopped: subcommand failed. ``` --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-08 15:10:44 +08:00
Shoreshen	5b8304d6b9	Revert "[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for gisel" (#146982 ) Reverts llvm/llvm-project#136262 Due to building error: ``` FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value 
in all control paths [-Werror,-Wreturn-type] 4566 \| } \| ^ 1 error generated. ninja: build stopped: subcommand failed. ```	2025-07-04 09:43:44 +08:00
Shoreshen	db03c27763	[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for gisel (#136262 ) This is a fix up for patch https://github.com/llvm/llvm-project/pull/130234, which is reverted in https://github.com/llvm/llvm-project/pull/136249 The main reason of building failure are: 1. ``` /home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::MachineOperand, SrcStatus> > getSrcStats(const llvm::MachineOperand, const llvm::MachineRegisterInfo&, searchOptions, int)’: /home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4669: error: could not convert ‘Statlist’ from ‘SmallVector<[...],4>’ to ‘SmallVector<[...],3>’ 4669 \| return Statlist; ``` 2. ``` /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4554:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] 4554 \| } \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4644:39: error: overlapping comparisons always evaluate to true [-Werror,-Wtautological-overlap-compare] 4644 \| (Stat >= SrcStatus::NEG_START \|\| Stat <= SrcStatus::NEG_END)) { \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4893:66: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4893 \| [=](MachineInstrBuilder &MIB) { MIB.addImm(getAllKindImm(Op)); }, \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9: note: 'Op' declared here 4890 \| auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); \| ^ 
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4894:52: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4894 \| [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13: note: 'Mods' declared here 4890 \| auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4899:50: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4899 \| [=](MachineInstrBuilder &MIB) { MIB.addReg(Op->getReg()); }, \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9: note: 'Op' declared here 4890 \| auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4900:50: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4900 \| [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods \| ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13: note: 'Mods' declared here 4890 \| auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); \| ^ 6 errors generated. ``` Both error cannot be reproduced at my local machine, the fix applied are: 1. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function `getSrcStats` replace ``` SmallVector<std::pair<const MachineOperand , SrcStatus>, 4> Statlist; ``` with ``` SmallVector<std::pair<const MachineOperand , SrcStatus>> Statlist; ``` 2. 
In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function `AMDGPUInstructionSelector::selectVOP3PRetHelper` replace ``` auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); ``` with ``` auto Results = selectVOP3PModsImpl(&Root, MRI, IsDOT); const MachineOperand *Op = Results.first; unsigned Mods = Results.second; ``` These changes have not been tested, since neither error can be reproduced locally.	2025-07-04 09:23:59 +08:00
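The second fix in #136262 works around the C++17 rule that structured bindings cannot be captured by a lambda (the `-Wc++20-extensions` errors quoted above). A self-contained sketch of the same workaround; `demoSelectMods` and `makeSumGetter` are hypothetical stand-ins for the selector code, not functions from the patch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a function returning (operand, modifiers).
inline std::pair<int, unsigned> demoSelectMods() { return {7, 3u}; }

inline std::function<unsigned()> makeSumGetter() {
  // Writing `auto [op, mods] = demoSelectMods();` and capturing op/mods in
  // the lambda below is only valid from C++20 onwards.
  // C++17-friendly form: copy the pair members into ordinary locals first.
  auto results = demoSelectMods();
  int op = results.first;
  unsigned mods = results.second;

  // The lambda now captures plain variables by copy, which is always allowed.
  return [=]() { return static_cast<unsigned>(op) + mods; };
}
```

The returned callable owns copies of both values, so it stays valid after `makeSumGetter` returns.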
Matt Arsenault	ed155ff9f2	AMDGPU: Avoid report_fatal_error on ds ordered intrinsics (#145202 )	2025-06-23 13:09:09 +09:00


555 Commits