For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, ISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.
This patch only selects flat instructions with immediate offsets for address
computations that carry the inbounds flag: if the address computation does not
leave the bounds of the allocated object, it cannot leave the bounds of the
memory aperture and is therefore safe to handle with an immediate offset.
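As an illustration, here is a minimal IR sketch (the function names and the
offset value are made up) of the pattern this change is about:
```
; With 'inbounds', the offset cannot move the address out of the allocated
; object, so folding it into the flat instruction's immediate field is safe.
define i32 @load_inbounds(ptr %base) {
  %addr = getelementptr inbounds i32, ptr %base, i64 4
  %val = load i32, ptr %addr   ; may fold the offset into the flat instruction
  ret i32 %val
}

; Without 'inbounds', the addition has to stay in the address computation,
; because the aperture test ignores the instruction's immediate offset.
define i32 @load_no_inbounds(ptr %base) {
  %addr = getelementptr i32, ptr %base, i64 4
  %val = load i32, ptr %addr
  ret i32 %val
}
```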
Relevant tests are in fold-gep-offset.ll.
Analogous to #132353 for SDAG (which is not yet in a mergeable state; its
progress is currently blocked by #146076).
Fixes SWDEV-516125 for GISel.
For OPF_OPSEL_SRCBYTE: the vector instruction uses OPSEL[1:0] to specify a
byte select for the first source operand, so op_sel [0, 0], [1, 0], [0, 1]
and [1, 1] map to bytes 0, 1, 2 and 3, respectively.
For OPF_OPSEL_DSTBYTE: OPSEL is used as a destination byte select.
OPSEL[2:3] specify which byte of the destination to write to. Note that the
order of the bits is different from that of OPF_OPSEL_SRCBYTE, so the
mapping is: op_sel [0, 0], [0, 1], [1, 0] and [1, 1] map to bytes 0, 1, 2
and 3, respectively.
Fixes: SWDEV-544901
We have a choice between a scalar and a vector prefetch for a uniform
pointer. Since we do not have scalar stores, our scalar cache is
practically read-only. The rw argument of the prefetch intrinsic is
used to force a vector operation even for the uniform case. On GFX12 a
scalar prefetch will be used anyway; it is still useful, but it will
only bring data into L2.
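A minimal IR sketch of the difference (the locality and cache-type
arguments are arbitrary illustration values):
```
declare void @llvm.prefetch.p0(ptr, i32 immarg, i32 immarg, i32 immarg)

define void @prefetch_uniform(ptr %p) {
  ; rw = 0 (read): a uniform address may be served by the scalar, read-only cache.
  call void @llvm.prefetch.p0(ptr %p, i32 0, i32 3, i32 1)
  ; rw = 1 (write): forces the vector prefetch even for a uniform address.
  call void @llvm.prefetch.p0(ptr %p, i32 1, i32 3, i32 1)
  ret void
}
```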
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers that are used; the active
lanes only need to be preserved for the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
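A rough IR-level sketch of what such a function could look like (the name,
body and use of amdgpu_gfx are illustrative assumptions; only the `i1 %active`
first argument is prescribed above):
```
; The first argument is mapped to EXEC: true in the lanes that were active
; at the point where the whole wave function gets launched.
define amdgpu_gfx float @whole_wave_helper(i1 %active, float %x) {
entry:
  ; Inactive lanes of %x contain poison, so mask them off before use.
  %x.safe = select i1 %active, float %x, float 0.0
  %y = fadd float %x.safe, 1.0
  ; The return value is poison in the inactive lanes as well.
  ret float %y
}
```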
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN,
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
an SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of %active. Since the return may be in a different
basic block, it's difficult to add the virtual reg for %active to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
marks any used VGPRs as WWM registers, which are then spilled and
restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
The last parameter of these functions used to be `Signed`, and it looks
like a few calls weren't updated when that was changed to `FlatVariant`.
Effectively, the functions were called with `FlatVariant=SALU` due to
integer promotions, which doesn't make any sense.
This is a fix-up for patch
https://github.com/llvm/llvm-project/pull/130234, which was reverted in
https://github.com/llvm/llvm-project/pull/136249
The main reasons for the build failure are:
1.
```
/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:
In function ‘llvm::SmallVector<std::pair<const llvm::MachineOperand*,
SrcStatus> > getSrcStats(const llvm::MachineOperand*, const
llvm::MachineRegisterInfo&, searchOptions, int)’:
/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4669:
error: could not convert ‘Statlist’ from ‘SmallVector<[...],4>’ to
‘SmallVector<[...],3>’
4669 | return Statlist;
```
2.
```
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4554:1:
error: non-void function does not return a value in all control paths
[-Werror,-Wreturn-type]
4554 | }
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4644:39:
error: overlapping comparisons always evaluate to true
[-Werror,-Wtautological-overlap-compare]
4644 | (Stat >= SrcStatus::NEG_START || Stat <= SrcStatus::NEG_END)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4893:66:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4893 | [=](MachineInstrBuilder &MIB) { MIB.addImm(getAllKindImm(Op)); },
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9:
note: 'Op' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4894:52:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4894 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13:
note: 'Mods' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4899:50:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4899 | [=](MachineInstrBuilder &MIB) { MIB.addReg(Op->getReg()); },
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9:
note: 'Op' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4900:50:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4900 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods
| ^
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13:
note: 'Mods' declared here
4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
| ^
6 errors generated.
```
Neither error can be reproduced on my local machine. The fixes applied
are:
1. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function
`getSrcStats` replace
```
SmallVector<std::pair<const MachineOperand *, SrcStatus>, 4> Statlist;
```
with
```
SmallVector<std::pair<const MachineOperand *, SrcStatus>> Statlist;
```
2. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function
`AMDGPUInstructionSelector::selectVOP3PRetHelper` replace
```
auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
```
with
```
auto Results = selectVOP3PModsImpl(&Root, MRI, IsDOT);
const MachineOperand *Op = Results.first;
unsigned Mods = Results.second;
```
These changes haven't been verified, since neither error can be reproduced
locally.
Barrier instructions are no-ops in single-wave workgroups. This includes
s_barrier_signal_isfirst, which will leave SCC unmodified.
Model this correctly (via an implicit use of SCC) and ensure SCC==1
before the barrier instruction (if the wave is the only one in the
workgroup, then it is the first).
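For context, a small IR sketch of the affected pattern (the intrinsic
signature and the -1 workgroup barrier id shown here are assumptions on my
part):
```
declare i1 @llvm.amdgcn.s.barrier.signal.isfirst(i32 immarg)

define amdgpu_kernel void @single_wave_isfirst(ptr addrspace(1) %out) {
  ; In a single-wave workgroup the barrier is a no-op, so SCC must already
  ; hold 1 for %isfirst to read as true after instruction selection.
  %isfirst = call i1 @llvm.amdgcn.s.barrier.signal.isfirst(i32 -1)
  %ext = zext i1 %isfirst to i32
  store i32 %ext, ptr addrspace(1) %out
  ret void
}
```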
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
This PR adds an `llvm.amdgcn.load.to.lds` intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.
This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).
It also plumbs the intrinsic through to clang.
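A rough sketch of a call for the global (address space 1) case; the exact
mangled name and argument order here are assumptions modeled on the existing
`llvm.amdgcn.global.load.lds` intrinsic:
```
declare void @llvm.amdgcn.load.to.lds.p1(ptr addrspace(1), ptr addrspace(3),
                                         i32 immarg, i32 immarg, i32 immarg)

define amdgpu_kernel void @gather_to_lds(ptr addrspace(1) %src,
                                         ptr addrspace(3) %dst) {
  ; Assumed operands: source pointer, LDS destination, size in bytes,
  ; immediate offset, auxiliary/cache-policy bits.
  call void @llvm.amdgcn.load.to.lds.p1(ptr addrspace(1) %src,
                                        ptr addrspace(3) %dst,
                                        i32 4, i32 0, i32 0)
  ret void
}
```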
Legalize the amdgcn.dead intrinsic to work with types other than i32. It
still generates IMPLICIT_DEFs.
Remove some of the previous code for selecting/reg bank mapping it for
32-bit types, since everything is done in the legalizer now.
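For illustration, assuming the usual overloaded-intrinsic name mangling,
both of the calls below should now legalize to IMPLICIT_DEF:
```
declare i32 @llvm.amdgcn.dead.i32()
declare <4 x i32> @llvm.amdgcn.dead.v4i32()

define <4 x i32> @dead_values() {
  ; 32-bit (original) and a wider vector type, both handled by the legalizer.
  %dead.scalar = call i32 @llvm.amdgcn.dead.i32()
  %dead.vector = call <4 x i32> @llvm.amdgcn.dead.v4i32()
  ret <4 x i32> %dead.vector
}
```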
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
`__builtin_amdgcn_raw_ptr_buffer_load_lds` could otherwise have been made
available on gfx6/7/8 as well.
Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to
explicitly preserve the inactive lanes of VGPRs of interest by adding
them as dummy arguments. The code usually looks something like this:
```
define amdgpu_cs_chain void @f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) {
entry:
%c = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %c, label %shader, label %tail
shader:
[...]
tail:
%inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader]
[...]
; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call
```
Unfortunately, this kind of phi node will get optimized away and the
backend won't be able to figure out that it's ok to use the active lanes
of `%inactive.vgpr*` inside `shader`.
This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic,
whose result can be used as a PHI operand instead of the poison. This
will be selected to an IMPLICIT_DEF, which the backend can work with.
At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values.
Support for other types can be added later if needed.
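With the intrinsic, the phi from the earlier example can instead be fed a
value that the backend understands (a sketch reusing the names from that
snippet; the `.i32` suffix assumes standard overload mangling):
```
shader:
  [...]
  %dead = call i32 @llvm.amdgcn.dead.i32()
  br label %tail
tail:
  %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry ], [ %dead, %shader ]
  [...]
```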
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.
Documentation changes will follow.
For SWDEV-512631
Lower G_ instructions that can't be inst-selected with the register bank
assignment from AMDGPURegBankSelect, which is based on uniformity analysis:
- Lower the instruction so it can be performed on the assigned register bank
- Put a uniform value in a vgpr when no SALU instruction is available
- Execute a divergent instruction on the SALU via a "waterfall loop"
Given the LLTs on all operands after the legalizer, some register bank
assignments require lowering while others do not.
Note: cases where all register bank assignments would require lowering
are lowered in the legalizer.
AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- The goal of defining Rules is to provide a high-level, table-like brief
overview of how to lower generic instructions based on available
target features and uniformity info (uniform vs divergent).
- Fast search of Rules; depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
all the same, just for different combinations of types and banks.
For these, write a custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
Names of IDs are meant to give a brief description of what lowering does
for each operand or for the whole instruction.
- AMDGPURegBankLegalizeHelper implements the lowering algorithms
Since this is the first patch that actually enables -new-reg-bank-select,
here is a summary of the regression tests that were added earlier:
- if an instruction is uniform, always select a SALU instruction if available
- eliminate back-to-back vgpr-to-sgpr-to-vgpr copies of uniform values
- fast rules: small differences for standard and vector instructions
- enabling a Rule based on a target feature - salu_float
- how to specify a lowering algorithm - vgpr S64 AND to S32
- for G_TRUNC in reg, it is up to the user to deal with the truncated bits;
G_TRUNC in reg is treated as a no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for a vgpr-S1-to-vcc copy (formally this is a vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND (OR and XOR) lowering - inst-selected into a
SALU instruction
- divergent phi with uniform inputs
- divergent instruction with a temporal divergent use, where the source
instruction is defined as uniform (AMDGPURegBankSelect) - missing temporal
divergence lowering
- a uniform phi that, because of an undef incoming value, is assigned to
vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine
uniformity analysis.
The current VOPC t16 instructions are not implemented with the correct
t16 pseudo. Thus the current t16/fake16 instructions are all in fake16
format.
The plan is to remove the incorrect t16 instructions and refactor them.
The first step is to remove them in this patch. The next step will be
updating the t16/fake16 pseudos to the correct format and adding back the
true16 instructions one by one in the upcoming patches.
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4: opsel:[x,y,z],
where x and y, i.e. OPSEL[1:0], select which src_byte to read.
OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32: opsel:[a,b,c,d],
where c and d, i.e. OPSEL[3:2], select which dst_byte to write.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>