702 Commits

Author SHA1 Message Date
Mikhail Goncharov
8a849a2a56 Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)"
This reverts commit 4b4a0d419c81b8b12a7dbb33dae1f7e9be91a88f.

New test fails on buildbots https://lab.llvm.org/buildbot/#/builders/63/builds/2039 https://lab.llvm.org/buildbot/#/builders/127/builds/1055
2024-10-10 13:37:44 +02:00
Krzysztof Drewniak
4b4a0d419c
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)
This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds

The first attempted reapply was #111059

This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.
2024-10-09 17:11:41 -05:00
Shilei Tian
88a239d292
[AMDGPU] Adopt new lowering sequence for fdiv16 (#109295)
The current lowering of `fdiv16` can generate incorrectly rounded result
in some cases. The new sequence was provided by the HW team, as shown
below written in C++.


```
half fdiv(half a, half b) {
  float a32 = float(a);
  float b32 = float(b);
  float r32 = 1.0f / b32;
  float q32 = a32 * r32;
  float e32 = -b32 * q32 + a32;
  q32 = e32 * r32 + q32;
  e32 = -b32 * q32 + a32;
  float tmp = e32 * r32;
  uin32_t tmp32 = std::bit_cast<uint32_t>(tmp);
  tmp32 = tmp32 & 0xff800000;
  tmp = std::bit_cast<float>(tmp32);
  q32 = tmp + q32;
  half q16 = half(q32);
  q16 = div_fixup_f16(q16);
  return q16;
}
```

Fixes SWDEV-477608.
2024-10-08 09:49:20 -04:00
NAKAMURA Takumi
e075dcf7d2 Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" (#111059)"
This reverts commit 98a15c7b0c6ec129d371f0c121dbe9396c4f5609.
(llvmorg-20-init-8051-g98a15c7b0c6e)
2024-10-06 10:50:51 +09:00
Krzysztof Drewniak
98a15c7b0c
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" (#111059)
This reverts commit 650c41aad2eb43c634a05b2b5799a0c13a73b92f.

The test failures appear to be from conflicts with other PRs that landed around this time.
2024-10-04 12:33:26 -05:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
NAKAMURA Takumi
650c41aad2 Revert "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)"
Some builders has been failing tests.
```
Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-global-old-legalization.mir
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir
```

This reverts commit ae5bd2a9f292037c605b2ec0ee31200581bd8701.
(llvmorg-20-init-7805-gae5bd2a9f292)
2024-10-03 15:38:34 +09:00
Krzysztof Drewniak
ae5bd2a9f2
[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)
Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.

1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in the
list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types

The need to bitcast vectors of pointers was also revealed to affect
ordinary `G_LOAD` and `G_STORE` in some cases, so
`shouldBitcastLoadStore()` has been fixed to handle it properly.
2024-10-02 13:46:56 -05:00
sstipano
4f951503b9
Reland "[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and G_IMPLICIT_DEF (#101331)" (#109958)
S192 type was missing from AllScalarTypes.
2024-09-25 13:02:29 +02:00
NAKAMURA Takumi
4fc08b6cd5 Revert "[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and G_IMPLICIT_DEF (#101331)"
This reverts commit 63b2595846b86b4e4eb9afba5e97dd64e8135c10.
(llvmorg-20-init-6782-g63b2595846b8)
A few bots have been failing on `inst-select-unmerge-values.mir`
2024-09-25 10:02:49 +09:00
sstipano
63b2595846
[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and G_IMPLICIT_DEF (#101331)
G_FREEZE was legal for <13 x S32> which caused an infinite loop in the
combiner
2024-09-24 07:49:40 +02:00
Jay Foad
e55d6f5ea2
[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)
Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
2024-09-11 17:16:06 +01:00
Stanislav Mekhanoshin
0745219d4a
[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293) 2024-09-06 11:41:21 -07:00
Changpeng Fang
26b0bef192
AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761)
Use GCNPat instead of Custom Lowering to select instructions for
intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as
a predicate to select only when the rounding mode is supported.
"as_hw_round_mode : SDNodeXForm" is developed to translate the round
modes to the corresponding ones that hardware recognizes.
2024-08-29 11:43:58 -07:00
Jay Foad
564bd20658
[AMDGPU][GlobalISel] Save a copy in one case of addrspacecast (#104789)
Refactor legalization of addrspacecast local/private -> flat to avoid
building a copy in the nonnull case.
2024-08-19 18:22:29 +01:00
Changpeng Fang
16929219b0
AMDGPU: Add tonearest and towardzero roundings for intrinsic llvm.fptrunc.round (#104486)
This work simplifies and generalizes the instruction definition for
intrinsic llvm.fptrunc.round. We no longer name the instruction with the
rounding mode. Instead, we introduce an immediate operand for the
rounding mode for the pseudo instruction. This immediate will be used to
set up the hardware mode register at the time the real instruction is
generated. We name the pseudo instruction as FPTRUNC_ROUND_F16_F32 (for
f32 -> f16), which is easy to generalize for other types.

"round.towardzero" and "round.tonearest" are added for f32 -> f16
truncating, in addition to the existing "round.upward" and
"round.downward". Other rounding modes are not supported by hardware at
this moment.
2024-08-17 11:22:47 -07:00
Sumanth Gundapaneni
0ee32c4573
[AMDGPU] Implement llvm.lrint intrinsic lowering (#98931)
This patch enabled the target-independent lowering of llvm.lrint via
GlobalISel.
For SelectionDAG, the instrinsic is custom lowered for AMDGPU.
2024-07-24 23:34:31 +04:00
Jessica Del
6a1b119035
[AMDGPU] Add intrinsics for atomic struct buffer loads (#100140)
Mark these intrinsics as atomic loads within LLVM to prevent hoisting
out of loops in cases where
the load is considered invariant.

Similar to https://github.com/llvm/llvm-project/pull/97707, but for
struct buffer loads.
2024-07-24 11:05:28 +02:00
Sumanth Gundapaneni
fc832d5349
[AMDGPU] Implement llvm.lround intrinsic lowering. (#98970)
This patch enables the target-independent lowering of llvm.lround via
GlobalISel. For SelectionDAG, the instrinsic is custom lowered for
AMDGPU. In order to support vector floating point input for llvm.lround,
this patch extends the target independent APIs and provide support for
scalarizing. pr98950 is needed to let verifier allow vector floating
point types
2024-07-23 20:34:34 +04:00
Jessica Del
ec7f8e1113
[AMDGPU] Add intrinsic for raw atomic buffer loads (#97707)
Upstream the intrinsics `llvm.amdgcn.raw.atomic.buffer.load`
and `llvm.amdgcn.raw.atomic.ptr.buffer.load`.

These additional intrinsics mark atomic buffer loads
as atomic to LLVM by removing the `IntrReadMem`
attribute. Otherwise, it could hoist these
intrinsics out of loops in cases where LLVM marks
them as invariant. That can cause issues such as
infinite loops.

Continuation of https://reviews.llvm.org/D138786
with the additional use in the fat buffer lowering,
more test cases and the additional ptr versions 
of these intrinsics.

---------

Co-authored-by: rtayl <>
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Mariusz Sikora <mariusz.sikora@amd.com>
2024-07-22 18:04:49 +02:00
Carl Ritson
62aa596ba1
[AMDGPU] Add no return image_sample intrinsics and instructions (#97542)
An appropriately configured image resource descriptor can trigger
image_sample instructions to store outputs directly to a linked memory
location instead of returning to VGPRs.

This is opaque to the backend as instruction encoding is unchanged;
however, a mechanism is require to allow frontends to communicate that
these instructions do not require destination VGPRs and store to memory.
Flagging these as stores means they will not be optimized away.
2024-07-20 17:26:58 +09:00
Matt Arsenault
611212fc9a
AMDGPU/GlobalISel: Legalize atomicrmw fmin/fmax (#97048)
We only handled the easy LDS case before. Handle the other address
spaces
with the more complicated legality logic.
2024-07-03 23:30:05 +02:00
Matt Arsenault
28d142a485 AMDGPU/GlobalISel: Make pk f16 atomicrmw fadd legal for gfx908
The subtarget features for these are a bit of a mess; the no return
version should probably be implied by the with-return feature.
2024-06-28 11:33:42 +02:00
Matt Arsenault
4477ff6836
AMDGPU: Remove ds_fmin/ds_fmax intrinsics (#96739)
These have been replaced with atomicrmw.
2024-06-27 15:35:24 +02:00
Vikram Hegde
35f7b60aa6
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
2024-06-26 09:24:09 +05:30
Vikram Hegde
5feb32ba92
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-25 14:35:19 +05:30
Matt Arsenault
70c8b9c24a
AMDGPU: Remove ds atomic fadd intrinsics (#95396)
These have been replaced with atomicrmw fadd
2024-06-23 10:30:20 +02:00
Matt Arsenault
8520061281
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2
copies of clang builtins for this case, and the newer one
erroneously requires gfx8-insts.
2024-06-18 18:34:34 +02:00
Matt Arsenault
3b997294d6
AMDGPU: Remove .v2bf16 buffer atomic fadd intrinsics (#95783)
These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.

The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.

The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.
2024-06-17 21:44:52 +02:00
Matt Arsenault
4cf1a19b7e Reapply "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)"
This reverts commit 95b77d90aae10725ea692e120aac083ef1c1297d.
2024-06-17 16:34:35 +02:00
Matt Arsenault
405882db94 AMDGPU: Fix legalization for llvm.amdgcn.struct.buffer.atomic.fadd.v2bf16 2024-06-17 15:20:23 +02:00
Nico Weber
95b77d90aa Revert "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)"
This reverts commit 5021e6dd548323e1169be3d466d440009e6d1f8e.
Breaks tests, see https://github.com/llvm/llvm-project/pull/95394#issuecomment-2169394503
2024-06-15 12:33:13 -04:00
Christudasan Devadasan
5e9fcb9572
[AMDGPU][GISel] Use datalayout alignment for buffer-load legalization (#95578)
It matches the legalization of buffer loads similar to the SelectionDAG.
2024-06-15 18:19:52 +05:30
Matt Arsenault
5021e6dd54
AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (#95394)
Unlike the existing fadd cases, choose to ignore the requirement for
amdgpu-unsafe-fp-atomics in case of fine-grained memory access. This
is to minimize migration pain to the new atomic control metadata. This
should not break any users, as the atomic intrinsics are still
directly consumed, and clang does not yet produce vector FP atomicrmw.
2024-06-15 09:58:12 +02:00
Matt Arsenault
0a9a5f989f
AMDGPU: Legalize atomicrmw fadd for v2f16/v2bf16 for local memory (#95393)
Make this legal for gfx940 and gfx12
2024-06-15 09:55:04 +02:00
Mirko Brkušanin
1e6a82b8ef
[AMDGPU] Legalize and select raw/struct_buffer_load with tfe (#93310) 2024-05-27 14:09:17 +02:00
Leon Clark
e1c06c380c
[AMDGPU] Fix error in #88512. (#92770)
Fixes error in GlobalISel CTLZ lowering caused by
[#88512](https://github.com/llvm/llvm-project/pull/88512).

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
2024-05-20 20:32:53 +01:00
Leon Clark
fb2c6597e3
[AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (#88512)
Use LSH to lower ctlz_zero_undef instead of subtracting leading zeros
for i8 and i16.

Related to [77615](https://github.com/llvm/llvm-project/pull/77615).

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
2024-05-19 21:45:24 +01:00
Kazu Hirata
c18bcd0a57
[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  38 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-05 13:43:10 -07:00
Emma Pilkington
a04714701f
[AMDGPU] Add a trap lowering workaround for gfx11 (#85854)
On gfx11 shaders run with PRIV=1, which causes `s_trap 2` to be treated
as a nop, which means it isn't a correct lowering for the trap
intrinsic. As a workaround, this commit instead lowers the trap
intrinsic to instructions that simulate the behavior of s_trap 2.

Fixes: SWDEV-438421
2024-04-24 09:43:54 -04:00
Jay Foad
46163688e1
[AMDGPU] Allow WorkgroupID intrinsics in amdgpu_gfx functions (#89773)
With GFX12 architected SGPRs the workgroup ids are trivially available
in any function called from a compute entrypoint.
2024-04-24 09:35:40 +01:00
Matt Arsenault
5b6db43f29
AMDGPU: Simplify DS atomicrmw fadd handling (#89468)
DS atomic fadd F32 does respect the denormal mode, so we do not need to
consider the expected FP mode or unsafe-fp-atomics attribute. They don't
respect the rounding mode, but we don't care outside of strictfp. This
also reveals the fp-mode-is-flush check has been missing in the cases
that should be considering it alongside amdgpu-unsafe-fp-atomics.

This also stops considering the case where flushing is enabled for f64,
as flushing isn't mandated and we barely handle this case.
2024-04-22 12:22:54 +02:00
Evgenii Kudriashov
d365a45cb3
[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941)
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.

These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However AMDGPU is an exception. It
selects traps during legalization regardless SelectionDAG or GlobalISel.

Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
2024-03-23 13:12:44 +01:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Joseph Huber
1fc5e50ceb
[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906)
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
2024-03-06 08:11:54 -06:00
Pierre van Houtryve
756166e342
[AMDGPU] Improve detection of non-null addrspacecast operands (#82311)
Use IR analysis to infer when an addrspacecast operand is nonnull, then
lower it to an intrinsic that the DAG can use to skip the null check.

I did this using an intrinsic as it's non-intrusive. An alternative
would have been to allow something like `!nonnull` on `addrspacecast`
then lower that to a custom opcode (or add an operand to the
addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just
one target's use case IMO.

I'm hoping that when we switch to GISel that we can move all this logic
to the MIR level without losing info, but currently the DAG doesn't see
enough so we need to act in CGP.

Fixes: SWDEV-316445
2024-03-01 14:01:10 +01:00
Ivan Kosarev
dfa1d9b027
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.
2024-02-23 17:34:55 +00:00
Joseph Huber
11fcae69db
[LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks (#81331)
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed frequency clock that can be used to determine elapsed
time, this is different compared to the cycle counter which often has
variable frequency.

This patch only adds support for the NVPTX and AMDGPU targets.

This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
2024-02-13 10:06:25 -06:00
Jay Foad
5b015229b7
[AMDGPU] Use LLT::isPointerOrPointerVector in legalizer (#81582) 2024-02-13 09:01:45 +00:00
Jay Foad
d57515bd10
[LLT] Add and use isPointerVector and isPointerOrPointerVector. NFC. (#81283) 2024-02-13 08:21:35 +00:00