758 Commits

Author SHA1 Message Date
Mirko Brkušanin
fe5f49942e
[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122)
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update AMDGPU legalization rules: pre-GFX12 targets now use the new lowering
method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal to match SDag.
2025-10-24 14:48:27 +02:00
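
As a rough illustration of the semantics that a G_FMINIMUM/G_FMAXIMUM lowering such as fe5f49942e has to preserve, the sketch below models the behaviour of `llvm.minimum`/`llvm.maximum` on doubles: a NaN in either input produces NaN, and -0.0 is ordered before +0.0. The function names are hypothetical, and this is not the expansion used by the legalizer or by expandFMINIMUM_FMAXIMUM, only a scalar reference model.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical reference model of llvm.minimum: NaN-propagating,
// and -0.0 compares as smaller than +0.0.
inline double fminimumSketch(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == 0.0 && b == 0.0)       // both are zeros: prefer the negative one
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}

// Hypothetical reference model of llvm.maximum, mirroring the above.
inline double fmaximumSketch(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (a == 0.0 && b == 0.0)       // both are zeros: prefer the positive one
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}
```

A min/max primitive that returns the non-NaN operand (the G_FMINNUM_IEEE style) would differ from this model exactly in the NaN case, which is why an extra NaN check is needed when building one from the other.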
paperchalice
14a42e64cf
[AMDGPU] Remove NoInfsFPMath uses (#163028)
Only `ninf` should be used.
2025-10-13 19:15:49 +08:00
Shilei Tian
2195fe7e01
[AMDGPU] Add the support for 45-bit buffer resource (#159702)
On new targets like `gfx1250`, the buffer resource (V#) now uses this
format:

```
base (57-bit): resource[56:0]
num_records (45-bit): resource[101:57]
reserved (6-bit): resource[107:102]
stride (14-bit): resource[121:108]
```

This PR changes the type of `num_records` from `i32` to `i64` in both
builtin and intrinsic, and also adds the support for lowering the new
format.

Fixes SWDEV-554034.

---------

Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
2025-09-24 11:12:02 -04:00
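
The field layout quoted in 2195fe7e01 can be sketched as plain bit packing into two 64-bit words. The struct and function names below are hypothetical, bits 127:122 are not described in the commit message and are left as zero here, and the reserved field is always written as zero. The example mainly shows why a 45-bit `num_records` no longer fits in an `i32` and straddles the 64-bit word boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container: lo holds resource[63:0], hi holds resource[127:64].
struct BufferRsrcWords {
  uint64_t lo;
  uint64_t hi;
};

// Pack base (57 bits), num_records (45 bits) and stride (14 bits)
// following the layout listed in the commit message.
inline BufferRsrcWords packBufferRsrc(uint64_t base, uint64_t numRecords,
                                      uint32_t stride) {
  assert(base < (1ULL << 57));
  assert(numRecords < (1ULL << 45));
  assert(stride < (1u << 14));
  BufferRsrcWords w;
  // resource[56:0] = base; resource[63:57] = low 7 bits of num_records.
  w.lo = base | (numRecords << 57);
  // resource[101:64] = high 38 bits of num_records; resource[121:108] = stride.
  w.hi = (numRecords >> 7) | (static_cast<uint64_t>(stride) << 44);
  return w;
}

inline uint64_t unpackBase(BufferRsrcWords w) {
  return w.lo & ((1ULL << 57) - 1);
}

inline uint64_t unpackNumRecords(BufferRsrcWords w) {
  return (w.lo >> 57) | ((w.hi & ((1ULL << 38) - 1)) << 7);
}

inline uint32_t unpackStride(BufferRsrcWords w) {
  return static_cast<uint32_t>((w.hi >> 44) & 0x3FFF);
}
```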
Stanislav Mekhanoshin
76efbc068a
[AMDGPU] Fix codegen to emit COPY instead of S_MOV_B64 for aperture regs (#158754)
2025-09-16 02:26:32 -07:00
Shilei Tian
1180c2ced0
[AMDGPU] Support lowering of cluster related intrinsics (#157978)
Since much of the code is interconnected, this also changes how the workgroup ID is lowered.

Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-09-12 21:11:17 -04:00
Anshil Gandhi
c6899193ed
[AMDGPU][Legalizer] Avoid pack/unpack for G_FSHR (#156796)
Scalarize G_FSHR only if the subtarget does not support V2S16 type.
2025-09-04 17:12:57 -06:00
Pierre van Houtryve
e2bd10cf16
[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-04 09:19:25 +00:00
paperchalice
595573d1ed
[AMDGPU] Remove ApproxFuncFPMath uses (#155578)
`ApproxFuncFPMath` is one of the options in `resetTargetOptions`; this removes
its uses in the AMDGPU part.
2025-08-28 11:09:01 +08:00
Tiger Ding
4ab14685a0
[AMDGPU] Narrow only on store to pow of 2 mem location (#150093)
Lowering in GlobalISel for AMDGPU previously always narrowed to i32 on a
truncating store, regardless of memory size or scalar size. This caused issues
with types like i65, which is first extended to i128 and then stored as i64 +
i8 to i128 locations. Narrowing only on stores to power-of-2 memory locations
ensures that narrowing to the memory size happens only near the end of
legalization.

This LLVM defect was identified via the AMD Fuzzing project.
2025-08-19 00:04:27 +09:00
Stanislav Mekhanoshin
ea14834966
[AMDGPU] Per-subtarget DPP instruction classification (#153096)
This is NFCI at this point.
2025-08-11 15:41:02 -07:00
Stanislav Mekhanoshin
abc22f771e
[AMDGPU] Fix buffer addressing mode matching (#152584)
Starting in gfx1250, voffset and immoffset are zero-extended from 32 bits
to 45 bits before being added together.
2025-08-07 14:23:41 -07:00
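
The arithmetic described in abc22f771e can be sketched as follows, with hypothetical function names. Once both operands are zero-extended before the add, the sum no longer wraps at 32 bits, which suggests that a 32-bit add that might overflow is not equivalent to the hardware's address computation. This is an illustration of that difference only, not the matching logic from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Offset as a plain 32-bit add would compute it: wraps modulo 2^32.
inline uint32_t offsetWith32BitAdd(uint32_t voffset, uint32_t immoffset) {
  return voffset + immoffset;
}

// Offset when both operands are zero-extended first. The sum of two 32-bit
// values needs at most 33 bits, so it always fits in the 45-bit range.
inline uint64_t offsetWithZeroExtend(uint32_t voffset, uint32_t immoffset) {
  uint64_t sum = static_cast<uint64_t>(voffset) +
                 static_cast<uint64_t>(immoffset);
  assert(sum < (1ULL << 45));
  return sum;
}
```

For example, `0xFFFFFFFF + 1` gives 0 with the 32-bit add but `0x100000000` with the zero-extended add.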
Stanislav Mekhanoshin
b8eb61adc9
[AMDGPU] Implement addrspacecast from flat <-> private on gfx1250 (#152218)
2025-08-05 16:25:23 -07:00
paperchalice
8bacfb2538
[AMDGPU] Remove UnsafeFPMath uses (#151079)
Remove `UnsafeFPMath` uses in the AMDGPU part. The option blocks some bugfixes
related to clang, and the ultimate goal is to remove the `resetTargetOptions`
method in `TargetMachine`; see the FIXME in `resetTargetOptions`.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast

https://discourse.llvm.org/t/allowfpopfusion-vs-sdnodeflags-hasallowcontract

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-07-31 17:36:57 +08:00
Fabian Ritter
957ae8ad46
[AMDGPU][GISel] Use buildObjectPtrOffset instead of buildPtrAdd (#150899)
This concerns offset computations for kernargs and
RegBankLegalizeHelper::splitLoad, which should all be within the bounds of a
memory object. See #150392 for the motivation for introducing the
buildObjectPtrOffset function.

For SWDEV-516125.
2025-07-30 08:30:27 +02:00
Stanislav Mekhanoshin
3dfd939a16
[AMDGPU] gfx1250 V_{MIN|MAX}_{I|U}64 opcodes (#151256)
2025-07-29 19:13:51 -07:00
Changpeng Fang
6184ef1c2f
[AMDGPU] Support f64 atomics on gfx1250 (#151172)
- BUF/FLAT/GLOBAL_ADD/MIN/MAX_F64
- DS_ADD_F64

Co-authored-by: Konstantin Zhuravlyov <Konstantin Zhuravlyov@amd.com>
2025-07-29 09:41:00 -07:00
Stanislav Mekhanoshin
2346968807
[AMDGPU] Add V_ADD|SUB|MUL_U64 gfx1250 opcodes (#150291)
2025-07-23 13:17:56 -07:00
Stanislav Mekhanoshin
2d6534b7da
[AMDGPU] gfx1250 64-bit relocations and fixups (#148951)
2025-07-15 17:13:42 -07:00
Changpeng Fang
868793fa8e
AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-15 15:25:05 -07:00
Stanislav Mekhanoshin
d0a4af725e
[AMDGPU] Add FeatureIEEEMinimumMaximumInsts. NFCI. (#147594)
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2025-07-08 14:32:44 -07:00
Matt Arsenault
c80282d333
AMDGPU: Directly select minimumnum/maximumnum with ieee_mode=0 (#141903)
The hardware min/max follow the IR rules with IEEE mode disabled,
so we can avoid canonicalizing the inputs. We lose the quieting
of a signaling NaN if both inputs are NaNs, but we only require that
with strictfp.
2025-06-18 00:27:41 +09:00
Kazu Hirata
03f616eb3a
[llvm] Compare std::optional<T> to values directly (NFC) (#143340)
This patch transforms:

  X && *X == Y

to:

  X == Y

where X is of std::optional<T>, and Y is of T or similar.
2025-06-08 22:37:59 -07:00
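
The rewrite in 03f616eb3a relies on the standard comparison between `std::optional<T>` and a plain value: an empty optional compares unequal to any value, so the explicit engagement check is redundant. A minimal sketch with hypothetical function names:

```cpp
#include <cassert>
#include <optional>

// Verbose form: check that X holds a value, then compare the contained value.
inline bool equalsVerbose(const std::optional<int> &X, int Y) {
  return X && *X == Y;
}

// Direct form: std::optional's operator== yields false for an empty optional.
inline bool equalsDirect(const std::optional<int> &X, int Y) {
  return X == Y;
}
```

Both functions return the same result for every input, including `std::nullopt`.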
Changpeng Fang
70e78be7dc
AMDGPU: Custom lower fptrunc vectors for f32 -> f16 (#141883)
The latest ASICs support the v_cvt_pk_f16_f32 instruction. However, the current
implementation of vector fptrunc lowering fully scalarizes the vectors,
and the scalar conversions may not always be combined to generate the
packed one.
We made v2f32 -> v2f16 legal in
https://github.com/llvm/llvm-project/pull/139956. This work is an
extension to handle wider vectors. Instead of fully scalarizing, we
split the vector into packs (v2f32 -> v2f16) to ensure the packed
conversion can always be generated.
2025-06-06 15:15:24 -07:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
2025-05-28 12:26:39 -07:00
zGoldthorpe
bb7e559740
[AMDGPU] Correct bitshift legality transformation for small vectors (#140940)
Fix for a bug found by the AMD fuzzing project.

The legaliser would originally try to widen a small vector such as `<4 x
i1>` to a single `i16` during the legalisation of bitshifts, as it was
not originally written with consideration for vector operands. This
patch simply adds a guard to prohibit this transformation and allow
other legalisation transformations to step in.
2025-05-23 10:56:21 +02:00
Matt Arsenault
2e2bbcacf8
AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900)
This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow along closer with the existing
G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case
better.

Just re-use the existing lowering for the old semantics for
G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment,
nor try to handle the general expansion without an underlying min/max
variant (or with G_FMINIMUM/G_FMAXIMUM).
2025-05-21 17:00:45 +02:00
Chinmay Deshpande
3a5af231fd
[GlobalISel][AMDGPU] Fix handling of v2i128 type for AND, OR, XOR (#138574)
Current behavior crashes the compiler.

This bug was found using the AMDGPU Fuzzing project.

Fixes SWDEV-508816.
2025-05-08 19:31:28 +02:00
Pierre van Houtryve
0d0eed419f
[AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (#131308)
It's better to widen them so that they are not lowered into a G_ASHR + G_SHL pair. With this change we just extend to i32 and then truncate the result.
2025-05-07 10:22:15 +02:00
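
For readers unfamiliar with G_SEXT_INREG, the sketch below shows the two strategies mentioned in 0d0eed419f on a 16-bit value: the shift-left/arithmetic-shift-right pair at 16 bits, and the widened form that extends to 32 bits, sign-extends there, and truncates. Function names are hypothetical, and the 32-bit operation is modelled with shifts purely for illustration; the code assumes arithmetic right shift of negative values, which C++20 guarantees and mainstream compilers provide earlier.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of x using a 16-bit shl + ashr pair.
inline int16_t sextInReg16ViaShifts(int16_t x, unsigned bits) {
  assert(bits >= 1 && bits <= 16);
  unsigned sh = 16 - bits;
  uint16_t shifted = static_cast<uint16_t>(static_cast<uint16_t>(x) << sh);
  return static_cast<int16_t>(static_cast<int16_t>(shifted) >> sh);
}

// Same result computed by widening to 32 bits first and truncating at the end.
inline int16_t sextInReg16ViaWiden(int16_t x, unsigned bits) {
  assert(bits >= 1 && bits <= 16);
  unsigned sh = 32 - bits;
  uint32_t wide = static_cast<uint32_t>(static_cast<int32_t>(x)); // extend
  int32_t extended = static_cast<int32_t>(wide << sh) >> sh;      // sext in 32 bits
  return static_cast<int16_t>(extended);                          // truncate
}
```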
Diana Picus
45d96df797
[AMDGPU] Support arbitrary types in amdgcn.dead (#134841)
Legalize the amdgcn.dead intrinsic to work with types other than i32. It
still generates IMPLICIT_DEFs.

Remove some of the previous code for selecting/reg bank mapping it for
32-bit types, since everything is done in the legalizer now.
2025-05-05 14:08:00 +02:00
Changpeng Fang
8b46b98b91
AMDGPU: Fix the double rounding issue in v2f64 -> v2f16 conversion (#135659)
On targets that support v_cvt_pk_f16_f32 instruction, if we make v2f64
-> v2f16 Legal, we will generate the following sequence of instructions:
  v_cvt_f32_f64_e32 v1, s[6:7]
  v_cvt_f32_f64_e32 v2, s[4:5]
  v_cvt_pk_f16_f32 v1, v2, v1
It possibly returns imprecise results due to double rounding. This patch
fixes the issue by not setting the conversion Legal. While we may still
expect the above sequence of code when unsafe fpmath is set, I hope
https://github.com/llvm/llvm-project/pull/134738 can address that
performance concern.

Fixes: SWDEV-523856
2025-04-17 11:15:49 -07:00
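
The double rounding problem that 8b46b98b91 refers to can be reproduced numerically. The sketch below uses a hypothetical helper that rounds a double to a given number of significand bits with round-to-nearest-even (11 bits for f16, 24 for f32) and ignores exponent range limits, so it only models the rounding step and not a full f16 conversion. It assumes the default floating-point rounding mode.

```cpp
#include <cassert>
#include <cmath>

// Round x to `precision` significand bits, ties to even.
// Exponent range (overflow, subnormals) is deliberately ignored.
inline double roundToPrecision(double x, int precision) {
  assert(precision >= 1 && precision <= 53);
  if (x == 0.0 || !std::isfinite(x))
    return x;
  int exp = 0;
  double mant = std::frexp(x, &exp);           // x = mant * 2^exp, 0.5 <= |mant| < 1
  double scaled = std::ldexp(mant, precision); // integer part holds `precision` bits
  double rounded = std::nearbyint(scaled);     // ties-to-even in the default mode
  return std::ldexp(rounded, exp - precision);
}

// f64 -> f16 in a single rounding step.
inline double f64ToF16Direct(double x) { return roundToPrecision(x, 11); }

// f64 -> f32 -> f16, i.e. two rounding steps as in the quoted sequence.
inline double f64ToF16ViaF32(double x) {
  float f = static_cast<float>(x);
  return roundToPrecision(static_cast<double>(f), 11);
}
```

For x = 1 + 2^-11 + 2^-30, the direct conversion rounds up to 1 + 2^-10 because x lies just above the f16 halfway point. Going through f32 first drops the 2^-30 term, leaving an exact tie that then rounds to even, giving 1.0.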
Vikram Hegde
123b0e2a1e
Reapply "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)" (#135758)
Reapply https://github.com/llvm/llvm-project/pull/132358 with tests updated.
2025-04-16 11:28:28 +05:30
Kazu Hirata
f46cea5b42
Revert "[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)"
This reverts commit 62ef10a0f62c668e1fa7e357f56052f3364544c5.

Multiple buildbot failures have been reported:
https://github.com/llvm/llvm-project/pull/132358
2025-04-14 23:03:55 -07:00
Vikram Hegde
62ef10a0f6
[AMDGPU][GlobalISel] Properly handle lane op lowering for larger vector types (#132358)
Fixes https://github.com/llvm/llvm-project/issues/128650

Also adds a few previously existing permlane64 tests that were somehow
removed in the meantime.
2025-04-15 10:51:58 +05:30
Tim Gymnich
1d0005a69a
[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
2025-03-29 11:51:29 +01:00
Mariusz Sikora
4f5ccf22fa
[AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041)
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-19 16:08:08 +01:00
Mariusz Sikora
575fde0995
[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038)
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and
image_bvh_dual_intersect_ray machine instruction.
- Add llvm_v10i32_ty and llvm_v10f32_ty

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-19 07:35:09 +01:00
Tim Gymnich
a5107be031
[NFC][AMDGPU][GlobalISel] Make LLTs constexpr (#131673)
- static const -> constexpr
2025-03-18 08:30:17 +07:00
Tim Gymnich
887cf1f8ce
[AMDGPU][GlobalISel] Enable vector reductions (#131413)
- Enable llvm vector reductions for AMDGPU.

fixes https://github.com/llvm/llvm-project/issues/114816
2025-03-17 14:25:30 -07:00
Mariusz Sikora
bbabf4e2b8
[AMDGPU][NFC] Update name for BVH Intersect Ray (#130036)
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-06 14:26:11 +01:00
Craig Topper
77cf6ecf78
[AMDGPU] Don't store an immediate in a Register. NFC
2025-03-04 22:17:17 -08:00
Brox Chen
e6f6a1e863
[AMDGPU][True16][CodeGen] uaddsat/usubsat true16 selection in gisel (#128233)
Enable gisel selection for uaddsat and usubsat in true16 flow

This patch includes:

1. Added VGPR_16_Lo128/VGPR_16 to register bank and update register info
for recognizing 16bit regclass id and bit width
2. uaddsat/usubsat test update
2025-02-25 17:09:34 -05:00
Matt Arsenault
1affadb7c6
AMDGPU: Drop legacy r600.read.global.size intrinsics from amdgcn (#128700)
These ancient intrinsics were still consumed by the backend for libclc,
which no longer uses them.
2025-02-25 22:21:03 +07:00
Matt Arsenault
37c341df28
Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)"
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.

This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
2025-02-20 10:19:14 +07:00
Changpeng Fang
36eaf0daf5
AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions already
canonicalize their inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.
2025-02-19 11:16:43 -08:00
Matt Arsenault
18ea6c9280
AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487)
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
    __builtin_assume(is_shared(x));
    *x = 4;
  }

  void bar() {
    private int y;
    foo(&y); // violation, wrong address space
  }
```

This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d0370872f28ec9965448f33db1b105addaf64ae.
2025-02-17 21:03:50 +07:00
Ivan Kosarev
b7188f6313
[AMDGPU][NFC] Remove an unneeded return value. (#126739)
And rename the function to disassociate it from the one where generating
loading of the input value may actually fail.
2025-02-11 16:10:49 +00:00
Craig Topper
4a486e773e
[CodeGen] Use Register/MCRegister::isPhysical. NFC
2025-01-18 23:37:03 -08:00
Tim Gymnich
2db2dc8ab9
[GlobalISel][NFC] Fix LLT Propagation (#119587)
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.

This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
2024-12-12 09:47:46 -08:00
Jon Chesterfield
4e0ba801ea
Revert "[amdgpu][lds] Simplify error diag path - lds variable names are no longer special"
Test case didn't run locally, investigating

This reverts commit 7bad469182ff2f6423ea209d5a1e81acca600568.
2024-12-08 12:00:13 +00:00
Jon Chesterfield
7bad469182
[amdgpu][lds] Simplify error diag path - lds variable names are no longer special
2024-12-08 11:26:33 +00:00