526 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
d0ee82040c
[AMDGPU] Add s_barrier_init|join|leave instructions (#153296) 2025-08-12 15:07:07 -07:00
Fabian Ritter
e9ece175f9
[AMDGPU][GISel] Only fold flat offsets if they are inbounds (#153001)
For flat memory instructions where the address is supplied as a base address
register with an immediate offset, the memory aperture test ignores the
immediate offset. Currently, ISel does not respect that, which leads to
miscompilations where valid input programs crash when the address computation
relies on the immediate offset to get the base address in the proper memory
aperture. Global or scratch instructions are not affected.

This patch only selects flat instructions with immediate offsets from address
computations with the inbounds flag: If the address computation does not leave
the bounds of the allocated object, it cannot leave the bounds of the memory
aperture and is therefore safe to handle with an immediate offset.
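
The folding rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions: `AddrComputation`, `canFoldIntoFlatImm`, and the immediate range constants are hypothetical names and values, not taken from the patch or from the LLVM sources.

```cpp
#include <cstdint>

// Hypothetical, simplified view of an address computation "Base + ConstOffset".
struct AddrComputation {
  std::int64_t ConstOffset; // constant byte offset added to the base pointer
  bool InBounds;            // whether the computation carries the inbounds flag
};

// Assumed encoding limits for the immediate field. Real limits depend on the
// subtarget; these values are placeholders for the sketch.
constexpr std::int64_t MinImmOffset = -4096;
constexpr std::int64_t MaxImmOffset = 4095;

// Fold the constant into the instruction's immediate only when the address
// computation is inbounds and the constant fits the assumed encoding range.
constexpr bool canFoldIntoFlatImm(const AddrComputation &A) {
  return A.InBounds && A.ConstOffset >= MinImmOffset &&
         A.ConstOffset <= MaxImmOffset;
}

static_assert(canFoldIntoFlatImm(AddrComputation{16, true}),
              "inbounds offset in range may be folded");
static_assert(!canFoldIntoFlatImm(AddrComputation{16, false}),
              "non-inbounds offset must stay in the base register");
```

When the predicate is false, the sketch assumes the full address is materialized in the base register, so the aperture test sees the final address.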

Relevant tests are in fold-gep-offset.ll.

Analogous to #132353 for SDAG (which is not yet in a mergeable state; its
progress is currently blocked by #146076).

Fixes SWDEV-516125 for GISel.
2025-08-12 10:14:20 +02:00
Changpeng Fang
1e815ced81
[AMDGPU] Use SDNodeXForm to select a few VOP3P modifiers, NFC (#151907)
It is not necessary to use ComplexPattern to select VOP3PModsNeg, VOP3PModsNegs
and VOP3PModsNegAbs. We can use SDNodeXForm instead.
2025-08-04 12:51:48 -07:00
Changpeng Fang
7d2332391f
[AMDGPU] Fix destination op_sel for v_cvt_scale32_* and v_cvt_sr_* (#151411)
GFX950 uses OP_SEL[MSB:LSB] for both src reads and dest writes. So this
patch essentially reverts the work from
https://github.com/llvm/llvm-project/pull/151286 regarding dest writes.
2025-07-30 16:15:50 -07:00
Changpeng Fang
180281b8ec
[AMDGPU] Fix op_sel settings for v_cvt_scale32_* and v_cvt_sr_* (#151286)
For OPF_OPSEL_SRCBYTE: the vector instruction uses OPSEL[1:0] to specify a
byte select for the first source operand. So op_sel [0, 0], [1, 0], [0, 1]
and [1, 1] should map to byte 0, 1, 2 and 3, respectively.

For OPF_OPSEL_DSTBYTE: OPSEL is used as a destination byte select.
OPSEL[2:3] specifies which byte of the destination to write to. Note that
the order of the bits is different from that of OPF_OPSEL_SRCBYTE. So the
mapping should be: op_sel [0, 0], [0, 1], [1, 0] and [1, 1] map to byte
0, 1, 2 and 3, respectively.
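
The two mappings in this commit message can be written out as follows. This is a sketch only: the function names are hypothetical, and it assumes each op_sel pair is listed with the lower OPSEL index first (OPSEL[0], OPSEL[1] for the source; OPSEL[2], OPSEL[3] for the destination).

```cpp
// Source byte select: OPSEL[0] is the low bit and OPSEL[1] the high bit, so
// [0,0], [1,0], [0,1], [1,1] select bytes 0, 1, 2, 3.
constexpr unsigned srcByteFromOpSel(bool OpSel0, bool OpSel1) {
  return (OpSel1 ? 2u : 0u) | (OpSel0 ? 1u : 0u);
}

// Destination byte select as described in this message: the bit order is
// swapped, so OPSEL[2] is the high bit and OPSEL[3] the low bit, and
// [0,0], [0,1], [1,0], [1,1] select bytes 0, 1, 2, 3.
constexpr unsigned dstByteFromOpSel(bool OpSel2, bool OpSel3) {
  return (OpSel2 ? 2u : 0u) | (OpSel3 ? 1u : 0u);
}

static_assert(srcByteFromOpSel(true, false) == 1, "op_sel [1, 0] reads byte 1");
static_assert(dstByteFromOpSel(true, false) == 2, "op_sel [1, 0] writes byte 2");
```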

Fixes: SWDEV-544901
2025-07-30 12:24:51 -07:00
Stanislav Mekhanoshin
7eaf1f2b2d
[AMDGPU] Bitop3 opcodes for gfx1250 (#151235) 2025-07-29 15:36:56 -07:00
Stanislav Mekhanoshin
d99238263c
[AMDGPU] Implement v_mad_u32/v_mad_nc_u|i64_u32 on gfx1250 (#151226) 2025-07-29 15:06:35 -07:00
Changpeng Fang
3b66d4a987
[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058) 2025-07-29 08:20:05 -07:00
Changpeng Fang
d7a38a94cd
[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540) 2025-07-24 16:23:33 -07:00
Stanislav Mekhanoshin
96e5eed92a
[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493)
We have a choice to use a scalar or vector prefetch for a uniform
pointer. Since we do not have scalar stores, our scalar cache is
practically read-only. The rw argument of the prefetch intrinsic is
used to force a vector operation even in the uniform case. On GFX12
a scalar prefetch will be used anyway; it is still useful, but it will
only bring data to L2.
2025-07-24 13:22:50 -07:00
Stanislav Mekhanoshin
c6e560a25b
[AMDGPU] Select scale_offset for scratch instructions on gfx1250 (#150111) 2025-07-22 15:24:55 -07:00
Stanislav Mekhanoshin
a0973de745
[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107)
Also switches immediate offset to signed for the subtarget.
2025-07-22 15:04:52 -07:00
Stanislav Mekhanoshin
a0aebb1935
[AMDGPU] Select scale_offset with SMEM instructions (#150078) 2025-07-22 13:26:28 -07:00
Diana Picus
20d8398825
[AMDGPU] ISel & PEI for whole wave functions (#145858)
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-07-21 10:39:09 +02:00
Stanislav Mekhanoshin
cfa918bec1
[AMDGPU] Select flat GVS atomics on gfx1250 (#149554) 2025-07-18 12:31:29 -07:00
Changpeng Fang
868793fa8e
AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 15:25:05 -07:00
Fabian Ritter
a1edb1dbc6
[AMDGPU] Fix broken uses of isLegalFLATOffset and splitFlatOffset (#147469)
The last parameter of these functions used to be `Signed`, and it looks
like a few calls weren't updated when that was changed to `FlatVariant`.
Effectively, the functions were called with `FlatVariant=SALU` due to
integer promotions, which doesn't make any sense.
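
A minimal sketch of how such a stale call can compile silently. The flag constants and `variantSeenByCallee` are hypothetical stand-ins, not the LLVM declarations; the only property assumed is that the SALU flag has the value 1.

```cpp
#include <cstdint>

// Hypothetical stand-ins for instruction-kind flag bits.
constexpr std::uint64_t SALUFlag = std::uint64_t{1} << 0;
constexpr std::uint64_t FlatFlag = std::uint64_t{1} << 1;

// Hypothetical callee whose last parameter used to be `bool Signed` and is
// now an integer flag value. It returns what it received so the effect of the
// implicit conversion is visible.
constexpr std::uint64_t variantSeenByCallee(std::int64_t /*Offset*/,
                                            std::uint64_t FlatVariant) {
  return FlatVariant;
}

// A call site that still passes `true` compiles without a diagnostic, because
// bool converts implicitly to the integer 1, which equals the SALU flag here.
static_assert(variantSeenByCallee(8, true) == SALUFlag,
              "stale bool argument is seen as the SALU flag");
static_assert(variantSeenByCallee(8, FlatFlag) == FlatFlag,
              "an updated call site passes the intended variant");
```

Declaring the parameter as a scoped enumeration (`enum class`) would turn the stale call into a compile error, since `bool` does not convert implicitly to a scoped enumeration type.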
2025-07-08 11:18:36 +02:00
Shoreshen
99df642168
[AMDGPU] Re-Re-apply: Implement vop3p complex pattern optmization for gisel (#146984)
Reverts llvm/llvm-project#146982

Fixes the build error reported for
https://github.com/llvm/llvm-project/pull/136262:
```
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
 4566 | }
      | ^
1 error generated.
ninja: build stopped: subcommand failed.
```

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-08 15:10:44 +08:00
Shoreshen
5b8304d6b9
Revert "[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for gisel" (#146982)
Reverts llvm/llvm-project#136262

Due to a build error:
```
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
 4566 | }
      | ^
1 error generated.
ninja: build stopped: subcommand failed.
```
2025-07-04 09:43:44 +08:00
Shoreshen
db03c27763
[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for gisel (#136262)
This is a fix-up for patch
https://github.com/llvm/llvm-project/pull/130234, which was reverted in
https://github.com/llvm/llvm-project/pull/136249

The main reasons for the build failure are:

1. 
   ```

/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:
In function ‘llvm::SmallVector<std::pair<const llvm::MachineOperand*,
SrcStatus> > getSrcStats(const llvm::MachineOperand*, const
llvm::MachineRegisterInfo&, searchOptions, int)’:

/home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4669:
error: could not convert ‘Statlist’ from ‘SmallVector<[...],4>’ to
‘SmallVector<[...],3>’
    4669 |   return Statlist;
   ```
2. 
   ```

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4554:1:
error: non-void function does not return a value in all control paths
[-Werror,-Wreturn-type]
    4554 | }
        | ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4644:39:
error: overlapping comparisons always evaluate to true
[-Werror,-Wtautological-overlap-compare]
4644 | (Stat >= SrcStatus::NEG_START || Stat <= SrcStatus::NEG_END)) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4893:66:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4893 | [=](MachineInstrBuilder &MIB) { MIB.addImm(getAllKindImm(Op)); },
| ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9:
note: 'Op' declared here
    4890 |   auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
        |         ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4894:52:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4894 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods
        |                                                    ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13:
note: 'Mods' declared here
    4890 |   auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
        |             ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4899:50:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4899 | [=](MachineInstrBuilder &MIB) { MIB.addReg(Op->getReg()); },
        |                                                  ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9:
note: 'Op' declared here
    4890 |   auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
        |         ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4900:50:
error: captured structured bindings are a C++20 extension
[-Werror,-Wc++20-extensions]
4900 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods
        |                                                  ^

/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13:
note: 'Mods' declared here
    4890 |   auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
        |             ^
   6 errors generated.
   ```

Neither error can be reproduced on my local machine. The fixes applied
are:
1. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function
`getSrcStats` replace
   ```
SmallVector<std::pair<const MachineOperand *, SrcStatus>, 4> Statlist;
   ```
   with
   ```
   SmallVector<std::pair<const MachineOperand *, SrcStatus>> Statlist;
   ```
2. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function
`AMDGPUInstructionSelector::selectVOP3PRetHelper` replace
   ```
   auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT);
   ```
   with
   ```
   auto Results = selectVOP3PModsImpl(&Root, MRI, IsDOT);
   const MachineOperand *Op = Results.first;
   unsigned Mods = Results.second;
   ```
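
The second fix can be illustrated in isolation. Capturing a structured binding in a lambda is only valid from C++20 onward, so under `-std=c++17` the pair is unpacked into ordinary local variables first. The names below (`computeOpAndMods`, `sumViaLambda`) are hypothetical and only mirror the shape of the change.

```cpp
#include <utility>

// Hypothetical helper returning a pair, standing in for a function that
// yields an operand and its modifier bits.
inline std::pair<int, unsigned> computeOpAndMods() { return {7, 3u}; }

inline unsigned sumViaLambda() {
  // C++17-compatible form: name the pair, then copy its members into plain
  // locals. Plain locals can be captured by value without any extension.
  auto Results = computeOpAndMods();
  int Op = Results.first;
  unsigned Mods = Results.second;

  auto AddBoth = [=]() { return static_cast<unsigned>(Op) + Mods; };
  return AddBoth();
}
```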

These changes have not been verified, since neither error can be
reproduced locally.
2025-07-04 09:23:59 +08:00
Matt Arsenault
ed155ff9f2
AMDGPU: Avoid report_fatal_error on ds ordered intrinsics (#145202) 2025-06-23 13:09:09 +09:00
Nicolai Hähnle
3bee9ba015
AMDGPU/GFX12: Fix s_barrier_signal_isfirst for single-wave workgroups (#143634)
Barrier instructions are no-ops in single-wave workgroups. This includes
s_barrier_signal_isfirst, which will leave SCC unmodified.

Model this correctly (via an implicit use of SCC) and ensure SCC==1
before the barrier instruction (if the wave is the only one of the
workgroup, then it is the first).

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-19 11:22:49 -07:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
2025-05-28 12:26:39 -07:00
Krzysztof Drewniak
4bdd116b80
[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)
This PR adds an amdgcn.load.to.lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).

It also plumbs the intrinsic through to clang.
2025-05-19 07:15:04 -07:00
Matt Arsenault
1e353fa5c3
AMDGPU: Fix -Wextra (#138539)
Another stupid gcc warning. Ideally we would directly use the enum type,
but subregister indexes are emitted as an anonymous enum.

Fixes #125548
2025-05-05 20:02:39 +02:00
Diana Picus
45d96df797
[AMDGPU] Support arbitrary types in amdgcn.dead (#134841)
Legalize the amdgcn.dead intrinsic to work with types other than i32. It
still generates IMPLICIT_DEFs.

Remove some of the previous code for selecting/reg bank mapping it for
32-bit types, since everything is done in the legalizer now.
2025-05-05 14:08:00 +02:00
Jay Foad
886f1199f0
[AMDGPU] Use variadic isa<>. NFC. (#137016) 2025-04-24 08:19:09 +01:00
Shoreshen
a3f38f27cd
Revert "[AMDGPU] Implement vop3p complex pattern optmization for gisel" (#136249)
Reverts llvm/llvm-project#130234
2025-04-17 23:45:30 -04:00
Shoreshen
a04580f71b
[AMDGPU] Implement vop3p complex pattern optmization for gisel (#130234)
Seeking opportunities to optimize VOP3P instructions by altering opsel,
opsel_hi, neg, neg_hi bits

Tests differences:
1. fix op_sel_hi bit for inline constant:
   1. `CodeGen/AMDGPU/packed-fp32.ll`
2. use neg bit to remove xor with 0x80008000
   1. `CodeGen/AMDGPU/strict_fsub.f16.ll`
   2. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll`
   3. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll`
   4. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll`
   5. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll`
   6. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll`
   7. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll`
3. Remove xor 0x80008000, and use opsel, opsel_hi to remove alignbit
   1. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll`
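
As background for items 2 and 3: an `xor` with 0x80008000 flips the sign bit of both 16-bit halves of a 32-bit register, which for packed f16 values is a negation of both lanes. That is why it can be replaced by the neg/neg_hi modifier bits. A small sketch (the function name is hypothetical):

```cpp
#include <cstdint>

// Negate both f16 lanes of a packed 32-bit value by flipping bit 15 of each
// 16-bit half. IEEE half precision keeps the sign in the top bit.
constexpr std::uint32_t negatePackedHalf(std::uint32_t Packed) {
  return Packed ^ 0x80008000u;
}

// 0x3C00 is 1.0 and 0xBC00 is -1.0 in IEEE half precision.
static_assert(negatePackedHalf(0x3C003C00u) == 0xBC00BC00u,
              "<1.0, 1.0> becomes <-1.0, -1.0>");
static_assert(negatePackedHalf(negatePackedHalf(0x12345678u)) == 0x12345678u,
              "negating twice restores the original bits");
```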
2025-04-18 10:56:20 +08:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Tim Gymnich
1d0005a69a
[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
2025-03-29 11:51:29 +01:00
Mariusz Sikora
4f5ccf22fa
[AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041)
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-19 16:08:08 +01:00
Mariusz Sikora
575fde0995
[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038)
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and
image_bvh_dual_intersect_ray machine instruction.
- Add llvm_v10i32_ty and llvm_v10f32_ty

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-19 07:35:09 +01:00
Mariusz Sikora
6b47bba440
[AMDGPU] Add intrinsics and MIs for ds_bvh_stack_* (#130007)
New intrinsics / instructions :
int_amdgcn_ds_bvh_stack_push4_pop1_rtn / ds_bvh_stack_push4_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop1_rtn / ds_bvh_stack_push8_pop1_rtn_b32
int_amdgcn_ds_bvh_stack_push8_pop2_rtn / ds_bvh_stack_push8_pop2_rtn_b64

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-17 09:13:20 +01:00
Brox Chen
15a5b3a192
[AMDGPU][True16][CodeGen] gisel true16 for ICMP (#128913)
GlobalISel true16 selection for ICMP
2025-03-13 12:03:17 -04:00
Mariusz Sikora
bbabf4e2b8
[AMDGPU][NFC] Update name for BVH Intersect Ray (#130036)
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-06 14:26:11 +01:00
Mariusz Sikora
cd3acd1bff
[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548) 2025-03-04 17:52:43 +01:00
Jay Foad
44607666b3
[AMDGPU] Simplify conditional expressions. NFC. (#129228)
Simplify `cond ? val : false` to `cond && val` and similar.
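
For boolean operands the two forms are equivalent, including evaluation order: in both, `val` is evaluated only when `cond` is true. A minimal sketch with hypothetical function names:

```cpp
// Original form: conditional expression with a constant false branch.
constexpr bool viaConditional(bool Cond, bool Val) {
  return Cond ? Val : false;
}

// Simplified form: logical AND, which short-circuits in the same way.
constexpr bool viaLogicalAnd(bool Cond, bool Val) { return Cond && Val; }

static_assert(viaConditional(true, true) == viaLogicalAnd(true, true), "TT");
static_assert(viaConditional(true, false) == viaLogicalAnd(true, false), "TF");
static_assert(viaConditional(false, true) == viaLogicalAnd(false, true), "FT");
static_assert(viaConditional(false, false) == viaLogicalAnd(false, false), "FF");
```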
2025-03-03 10:40:49 +00:00
Brox Chen
c896f7bdaa
[AMDGPU][True16][CodeGen] build_vector pattern in true16 (#118904)
build_vector pattern in true16 SDAG
2025-02-21 14:02:12 -05:00
Diana Picus
611a648327
[AMDGPU] Add llvm.amdgcn.dead intrinsic (#123190)
Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to
explicitly preserve the inactive lanes of VGPRs of interest by adding
them as dummy arguments. The code usually looks something like this:

```
define amdgcn_cs_chain void f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) {
entry:
  %c = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %c, label %shader, label %tail

shader:
  [...]

tail:
  %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader]
  [...]
  ; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call
```

Unfortunately, this kind of phi node will get optimized away and the
backend won't be able to figure out that it's ok to use the active lanes
of `%inactive.vgpr*` inside `shader`.

This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic,
whose result can be used as a PHI operand instead of the poison. This
will be selected to an IMPLICIT_DEF, which the backend can work with.

At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values.
Support for other types can be added later if needed.
2025-02-20 09:25:48 +01:00
Fabian Ritter
8615f9aaff
[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.

Documentation changes will follow.

For SWDEV-512631
2025-02-19 10:20:48 +01:00
Brox Chen
33d401fb15
[AMDGPU][True16][CodeGen] true16 codegen for icmp and is_fpclass (#124757)
True16 codegen pattern for icmp patterns and is_fpclass
2025-01-30 12:18:00 -05:00
Petar Avramovic
0ee037b861
AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864)
Lower G_ instructions that can't be inst-selected with register bank
assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"

Given LLTs on all operands after the legalizer, some register bank
assignments require lowering while others do not.
Note: cases where all register bank assignments would require lowering
are lowered in legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- The goal of defining Rules is to provide a high-level, table-like, brief
  overview of how to lower generic instructions based on available
  target features and uniformity info (uniform vs divergent).
- Fast search of Rules, depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
  all the same just for different combinations of types and banks.
  Write custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
  Names of IDs are meant to give a brief description of what the lowering
  does for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select
here is the summary of regression tests that were added earlier:
- if instruction is uniform always select SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits
  G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into
  SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction
  is defined as uniform(AMDGPURegBankSelect) - missing temporal
  divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be
  fixed in AMDGPURegBankSelect via another fix in machine uniformity
  analysis.
2025-01-24 12:12:45 +01:00
Mirko Brkušanin
3def49cb64
[AMDGPU] Remove s_wakeup_barrier instruction (#122277) 2025-01-10 11:30:22 +01:00
Jakub Chlanda
01a7d4e26b
[AMDGPU] Allow selection of BITOP3 for some 2 opcodes and B32 cases (#122267)
This came up in downstream static analysis as dead code.

Admittedly, it depends on what the intention was when checking for [`if
(NumOpcodes == 2 &&
IsB32)`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp#L3792C3-L3792C32)
and I took a guess that for certain cases the selection should take
place.

If that's incorrect, that whole if statement can be removed, as it is
after a check for: [`if (NumOpcodes <
4)`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp#L3788)
2025-01-10 07:49:11 +01:00
Brox Chen
c744ed53a8
[AMDGPU][True16][MC] disable incorrect VOPC t16 instruction (#120271)
The current VOPC t16 instructions are not implemented with the correct
t16 pseudo. Thus the current t16/fake16 instructions are all in fake16
format.

The plan is to remove the incorrect t16 instructions and refactor them.
The first step is to remove them in this patch. The next step will be
updating the t16/fake16 pseudos to the correct format and adding back true16
instructions one by one in the upcoming patches.
2025-01-03 11:58:04 -05:00
Matt Arsenault
92ba7e3973
AMDGPU/GlobalISel: Do not try to form v_bitop3_b32 for SGPR results (#117940) 2024-11-30 20:21:20 -05:00
Matt Arsenault
b4a16a78c2
AMDGPU: Match and Select BITOP3 on gfx950 (#117843)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-11-27 01:31:19 -05:00
Matt Arsenault
2b9e947d43
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM syntax for v_cvt_scalef32_pk_f32_fp4: opsel:[x,y,z]
where x & y, i.e. OPSEL[1:0], select which src_byte to read.

OPSEL ASM syntax for v_cvt_scalef32_pk_fp4_f32: opsel:[a,b,c,d]
where c & d, i.e. OPSEL[3:2], select which dst_byte to write.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:20:09 -05:00
Matt Arsenault
62584f32eb
AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8|bf8} for gfx950 (#117741)
OPSEL[0] determines low/high 16 bits of src0 to read.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-11-26 19:12:18 -05:00