56 Commits

Author SHA1 Message Date
Krzysztof Drewniak
e2d2affc70
[AMDGPU][LowerBufferFatPointers] Fix crash with select false (#166471)
If the input to LowerBufferFatPointers is such that the resource- and
offset-specific `select` instructions generated for a `select` on `ptr
addrspace(7)` fold away, the pass would crash when trying to replace an
instruction with itself. This commit resolves the issue.

Fixes https://github.com/iree-org/iree/issues/22551
2025-11-05 19:21:52 +00:00
Krzysztof Drewniak
01a7c880d2
[AMDGPU][LowerBufferFatPointers] Erase dead ptr(7) intrinsics (#160798)
Fix a crash that would arise when intrinsics like llvm.masked.load.T.p7
were left in the module when AMDGPULowerBufferFatPointers was applied
and so a captures(none) annotation would be applied to a non-pointer
value, triggering a verifier failure.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-09-29 10:46:45 -05:00
Krzysztof Drewniak
96ce9f9d64
[AMDGPU] Prevent re-visits in LowerBufferFatPointers (#159168)
Fixes https://github.com/iree-org/iree/issues/22001

The visitor in SplitPtrStructs would re-visit instructions if an
instruction earlier in program order caused a recursive visit() call via
getPtrParts(). This would cause instructions to be processed multiple
times.

As a consequence of this, PHI nodes could be added to the Conditionals
array multiple times, which would lead to a conditional that was already
simplified being processed multiple times. After the code moved to
InstSimplifyFolder, this re-processing, combined with more aggressive
simplifications, would lead to an attempt to replace an instruction with
itself, causing an assertion failure and crash.

This commit resolves the issue and adds the reduced form of the crashing
input as a test.
2025-09-16 18:02:18 -07:00
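The re-visit problem described in #159168 can be illustrated outside LLVM. What follows is a minimal sketch, not the pass's actual code: `Node`, `Splitter`, `visit`, `getParts`, and `countProcessed` are hypothetical stand-ins for instructions, SplitPtrStructs, and its visitor. A visited set lets the in-order walk skip anything a recursive call has already handled, so each node is processed exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an instruction: it only records which other
// nodes it uses as operands.
struct Node {
  std::vector<std::size_t> operands;
};

// Hypothetical model of a visitor that may recurse into operands.
struct Splitter {
  const std::vector<Node> &nodes;
  std::set<std::size_t> visited;      // nodes that were already handled
  std::vector<std::size_t> processed; // processing order, for inspection

  explicit Splitter(const std::vector<Node> &n) : nodes(n) {}

  // Analogue of getPtrParts(): make sure the operand has been processed.
  void getParts(std::size_t id) { visit(id); }

  void visit(std::size_t id) {
    assert(id < nodes.size());
    // The guard: skip anything a recursive call has already handled.
    if (!visited.insert(id).second)
      return;
    for (std::size_t op : nodes[id].operands)
      getParts(op);
    processed.push_back(id);
  }

  // Walk all nodes in program order.
  void run() {
    for (std::size_t id = 0; id < nodes.size(); ++id)
      visit(id);
  }
};

// Convenience wrapper: returns how many processing steps took place.
std::size_t countProcessed(const std::vector<Node> &nodes) {
  Splitter s(nodes);
  s.run();
  return s.processed.size();
}
```

Without the guard in `visit`, a node that an earlier node reaches through `getParts` would be processed a second time when the program-order walk arrives at it.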
Ivan Kosarev
faca8c9ed4
[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
2025-08-22 10:05:06 +01:00
Krzysztof Drewniak
7f27482a32
[AMDGPU][LowerBufferFatPointers] Fix lack of rewrite when loading/storing null (#154128)
Fixes #154056.

The fat buffer lowering pass was erroneously detecting that it did not
need to run on functions that only load/store to the null constant (or
other such constants). We thought this would be covered by specializing
constants out to instructions, but that doesn't account for trivial
constants like null. Therefore, we check the operands of instructions
for buffer fat pointers in order to find such constants and ensure the
pass runs.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-08-18 12:32:54 -05:00
Orlando Cazalet-Hyams
54f92c7806
[RemoveDIs][AMDGPU] Replace defunct getAssignmentMarkers call (#153212)
Not quite NFC as it looks like the original intrinsic-handling code
never got updated to use records. This was never caught because that
code wasn't tested. I've adjusted an existing test so the behaviour is
now covered.
2025-08-12 17:20:38 +01:00
Alexander Richardson
87ad9122e5
[AMDGPULowerBufferFatPointers] Handle ptrtoaddr by extending the offset
Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139413
2025-08-09 16:28:12 -07:00
Jeremy Morse
c9ceb9b75f
[DebugInfo] Remove intrinsic-flavours of findDbgUsers (#149816)
This is one of the final remaining debug-intrinsic specific codepaths
out there, and pieces of cross-LLVM infrastructure to do with debug
intrinsics.
2025-07-21 17:49:25 +01:00
Jeremy Morse
040bffc633
[DebugInfo][AMDGPU] Convert a debug-intrinsic method to debug records (#149505)
It appears this wasn't handled in the initial migration a year ago,
seemingly because it didn't lead to any test failures. Find and interpret
debug records in the same way the original code handled intrinsics. Note
that we drop a call to copyMetadata: debug records can't carry additional
metadata like instructions, nothing relies on this in AMDGPU AFAIUI.
2025-07-21 10:07:14 +01:00
Matt Arsenault
0fa0c3c233
AMDGPU: Use reportFatalUsageError in AMDGPULowerBufferFatPointers (#145132) 2025-06-21 14:24:30 +09:00
Jeremy Morse
97ac6483aa
[DebugInfo][RemoveDIs] Delete debug-info-format flag (#143746)
This flag was used to let us incrementally introduce debug records
into LLVM, however everything is now using records. It serves no
purpose now, so delete it.
2025-06-12 11:51:58 +01:00
Matt Arsenault
6b81483e28
AMDGPU: Start using LLVMContext errors in buffer fat pointer lowering (#142014)
Avoid using report_fatal_error. Many more uses that should be converted
in the pass remain.
2025-05-30 07:52:45 +02:00
Devon Loehr
63de20c0de
Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091)
This fixes #139614 on non-clang compilers by moving `__has_warning`
completely inside the `#if defined(__clang__)` block. This prevents a
parse failure from compilers which don't recognize `__has_warning`.

Original description:
Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
2025-05-28 12:15:22 +02:00
Kazu Hirata
1e8e662174
[AMDGPU] Remove unused includes (NFC) (#141376)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 14:48:46 -07:00
Philip Reames
e4e7a7e64e Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614)"
This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936.

It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).
2025-05-21 11:31:26 -07:00
Devon Loehr
0954c9d487
Add macro to suppress -Wunnecessary-virtual-specifier (#139614)
Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
2025-05-21 10:54:36 -07:00
Krzysztof Drewniak
6b9da28b2b
[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 (#140775)
Some application code operating on generic pointers (that then get
initialized to buffer fat pointers) may perform tests against nullptr.
After address space inference, this results in comparisons against
`addrspacecast (ptr null to ptr addrspace(7))`, which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers
aren't supported, it is possible to cast null pointers to the all-zeros
buffer resource and 0 offset, which this patch adds.

It also adds a TODO for casting _out_ of buffer resources, which isn't
implemented here but could be.
2025-05-20 16:13:01 -07:00
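The null-cast rule from #140775 can be modelled in a few lines. This is only a sketch under stated assumptions: `FatPtr`, `castGenericToFatPtr`, and `isNull` are hypothetical names, and the 128-bit resource is represented as four 32-bit words purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a buffer fat pointer: a 128-bit resource
// (four 32-bit words) plus a 32-bit offset.
struct FatPtr {
  std::array<std::uint32_t, 4> rsrc;
  std::uint32_t offset;
};

// Model of casting a generic pointer to a fat pointer. Only the null
// pointer has a defined result; any other value is reported as unsupported.
std::optional<FatPtr> castGenericToFatPtr(std::uint64_t generic) {
  if (generic != 0)
    return std::nullopt; // general casts are not supported
  return FatPtr{{0, 0, 0, 0}, 0}; // all-zeros resource and 0 offset
}

// In this model, a fat pointer is null when every part is zero.
bool isNull(const FatPtr &p) {
  for (std::uint32_t word : p.rsrc)
    if (word != 0)
      return false;
  return p.offset == 0;
}
```

Under this model, a null test on the result of `castGenericToFatPtr(0)` succeeds, which is the comparison pattern the commit message describes.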
Krzysztof Drewniak
4bdd116b80
[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to
LDS from global (address space 1) pointers and buffer fat pointers
(address space 7), since they use the same API and "gather from a
pointer to LDS" is something of an abstract operation.

This commit adds the intrinsic and its lowerings for addrspaces 1 and 7,
and updates the MLIR wrappers to use it (loosening up the restrictions
on loads to LDS along the way to match the ground truth from target
features).

It also plumbs the intrinsic through to clang.
2025-05-19 07:15:04 -07:00
Jonathan Thackray
6e49f73825
Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-30 22:06:37 +01:00
Krzysztof Drewniak
94dc0a0e7b
[NFC][AMDGPU] Drop recursive types in LowerBufferFatPointers (#137735)
Now that IRMover and the rest of LLVM don't allow recursive types, drop
support for them from the clone of the IRMover code used when lowering
buffer fat pointer operations.
2025-04-29 07:23:40 -07:00
Jonathan Thackray
7ee0097b48
Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657)
Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4
2025-04-28 16:53:36 +01:00
Jonathan Thackray
ba420d8122
[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759)
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.

These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
     https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.

Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
2025-04-28 15:31:44 +01:00
Jay Foad
886f1199f0
[AMDGPU] Use variadic isa<>. NFC. (#137016) 2025-04-24 08:19:09 +01:00
Kazu Hirata
e7c07a0210
[AMDGPU] Construct SmallVector with iterator ranges (NFC) (#136415) 2025-04-19 09:09:41 -07:00
Krzysztof Drewniak
4a7b34d03c
Revert "[AMDGPU] Add buffer.fat.ptr.load.lds intrinsic wrapping raw rsrc version (#133015)" (#134871)
This reverts commit d1a05721172272f7aab685b56d99e86814a15bff.

There was further discussion on the PR about whether the intrinsics
should exist in this form.
2025-04-08 11:00:41 -05:00
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Krzysztof Drewniak
d1a0572117
[AMDGPU] Add buffer.fat.ptr.load.lds intrinsic wrapping raw rsrc version (#133015)
Add a buffer_fat_ptr_load_lds intrinsic, by analogy with
global_load_lds, which enables using `ptr addrspace(7)` to set the rsrc
and offset arguments to raw_ptr_buffer_load_lds.
2025-04-07 15:42:22 -05:00
Krzysztof Drewniak
f23bb530cf
[AMDGPULowerBufferFatPointers] Use InstSimplifyFolder during rewrites (#134137)
This PR updates AMDGPULowerBufferFatPointers to use the
InstSimplifyFolder
when creating IR during buffer fat pointer lowering.

This shouldn't cause any large functional changes and might improve the
quality of the generated code.
2025-04-03 10:12:18 -05:00
Pedro Lobo
73e23f899f
[AMDGPU] Change placeholder from undef to poison (#130858)
Replace `undef` debug info with `poison`.
2025-03-12 12:53:27 +00:00
Krzysztof Drewniak
f8cc509b69
Reapply "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)" (#129078)
This reverts commit 1559a65efaf327f9c72e14d4bb1834f076e7fc20.

Fixed test (I suspect broken by unrelated change in the merge)
2025-02-27 11:26:13 -06:00
Kazu Hirata
1559a65efa Revert "[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)"
This reverts commit 469757efafebdd5772d993fca4dc0dfa7cbda17c.

Multiple buildbot failures have been reported:
https://github.com/llvm/llvm-project/pull/126621
2025-02-26 14:35:07 -08:00
Krzysztof Drewniak
469757efaf
[AMDGPU] Handle memcpy()-like ops in LowerBufferFatPointers (#126621)
Since LowerBufferFatPointers runs before PreISelIntrinsicLowering, which
normally handles unsupported memcpy()s, and since you can't have a
`noalias {ptr addrspace(8), i32}` because it crashes later passes,
manually expand memcpy()s involving buffer fat pointers to loops.

Additionally, though they're unlikely to be used, this commit adds
support for memset().

This commit doesn't implement writing direct-to-LDS loads as the
intrinsics, but leaves the option in the future.
2025-02-26 16:03:32 -06:00
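As a rough picture of what "expand memcpy()s to loops" in #126621 means, here is a byte-at-a-time sketch. It is not the pass's expansion: `FatPtr`, `copyLoop`, and `setLoop` are hypothetical, a buffer is modelled as a byte vector, and the real lowering emits IR loops that may use wider accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a buffer is a byte vector and a fat pointer is a
// (buffer, offset) pair.
struct FatPtr {
  std::vector<std::uint8_t> *buffer;
  std::uint32_t offset;
};

// memcpy() as an explicit loop. The regions must not overlap, as with
// memcpy() itself, so a simple forward copy is sufficient.
void copyLoop(FatPtr dst, FatPtr src, std::uint32_t len) {
  assert(static_cast<std::size_t>(dst.offset) + len <= dst.buffer->size());
  assert(static_cast<std::size_t>(src.offset) + len <= src.buffer->size());
  for (std::uint32_t i = 0; i < len; ++i)
    (*dst.buffer)[dst.offset + i] = (*src.buffer)[src.offset + i];
}

// memset() as an explicit loop.
void setLoop(FatPtr dst, std::uint8_t value, std::uint32_t len) {
  assert(static_cast<std::size_t>(dst.offset) + len <= dst.buffer->size());
  for (std::uint32_t i = 0; i < len; ++i)
    (*dst.buffer)[dst.offset + i] = value;
}
```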
Krzysztof Drewniak
f7d03707d1
[AMDGPU] Generalize amdgcn.make.buffer.rsrc to fat pointers (#126828)
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, addrspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation)

To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` s from being produced, this
commit extends amdgcn.make.buffer.rsrc to also be variadic in its result
type, auto-upgrading old manglings.

The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p* .

This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
2025-02-18 14:15:28 -06:00
Krzysztof Drewniak
934c97dd16
[LowerBufferFatPointers] Fix support for GEP T, p7, <N x T> idxs (#126126)
The lowering for GEP didn't properly support the case where the pointer
argument was being implicitly broadcast by a vector of indices. Fix
that.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-02-11 18:22:50 -06:00
Krzysztof Drewniak
697c1883f1
Reapply "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123660)
(#123657)

This reverts commit 64749fb01538fba2b56d9850497d5f3a626cabc2.

Adds a constructor to VecSlice to address the failure
2025-01-20 16:12:17 -06:00
Krzysztof Drewniak
64749fb015
Revert "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123657)
Reverts llvm/llvm-project#110572

Seem to have broken a buildbot, not sure why
https://lab.llvm.org/buildbot/#/builders/108/builds/8346
2025-01-20 13:14:04 -05:00
Krzysztof Drewniak
3805355ef6
[AMDGPU] Handle natively unsupported types in addrspace(7) lowering (#110572)
The current lowering for ptr addrspace(7) assumed that the instruction
selector can handle arbitrary LLVM types, which is not the case. Code
generation can't deal with
- Values that aren't 8, 16, 32, 64, 96, or 128 bits long
- Aggregates (this commit only handles arrays of scalars, more may come)
- Vectors of more than one byte
- 3-word values that aren't a vector of 3 32-bit values (for example, a
<6 x half>)

This commit adds a buffer contents type legalizer that adds the needed
bitcasts, zero-extensions, and splits into subcomponents needed to
convert a load or store operation into one that can be successfully
lowered through code generation.

In the long run, some of the involved bitcasts (though potentially not
the buffer operation splitting) ought to be handled by the instruction
legalizer, but SelectionDAG makes this difficult.

It also takes advantage of the new `nuw` flag on `getelementptr` when
lowering GEPs to offset additions.

We don't currently plumb through `nsw` on GEPs since that should likely
be a separate change and would require declaring what we mean by "the
address" in the context of the GEP guarantees.
2025-01-20 11:33:35 -06:00
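The size constraint listed in #110572 (only 8, 16, 32, 64, 96, or 128 bits) suggests how an oversized or odd-sized access might be split. The sketch below is an assumption-laden illustration: `bytesForBits` and `splitIntoLegalBytes` are hypothetical, and the greedy largest-first decomposition shown here may not match the slicing the legalizer actually chooses.

```cpp
#include <cassert>
#include <vector>

// Round a bit width up to whole bytes, modelling zero-extension of values
// whose width is not a multiple of 8 bits.
unsigned bytesForBits(unsigned bits) {
  assert(bits > 0);
  return (bits + 7) / 8;
}

// Greedily split a byte count into access sizes the commit message lists
// as supported: 128, 96, 64, 32, 16, and 8 bits (16, 12, 8, 4, 2, 1 bytes).
std::vector<unsigned> splitIntoLegalBytes(unsigned bytes) {
  static const unsigned legal[] = {16, 12, 8, 4, 2, 1};
  std::vector<unsigned> parts;
  for (unsigned width : legal) {
    while (bytes >= width) {
      parts.push_back(width);
      bytes -= width;
    }
  }
  return parts;
}
```

For instance, a 28-byte value comes out as one 16-byte and one 12-byte access in this model, and a 7-byte value as 4, 2, and 1 bytes.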
Nikita Popov
4f614a8f7c
[AMDGPULowerBufferFatPointers] Use typeIncompatible() (#122902)
Use typeIncompatible() to drop attributes incompatible with the new
argument/return type, instead of keeping a custom list.
2025-01-14 16:55:49 +01:00
Acim Maravic
cc3aab580b
[AMDGPU] Handle nontemporal and amdgpu.last.use metadata in amdgpu-lower-buffer-fat-pointers (#120139) 2025-01-14 11:22:20 +01:00
Krzysztof Drewniak
3b0f506c87
[AMDGPU] Support nuw and nusw in buffer fat pointer lowering (#115039)
This commit uses the `nuw` flag on `getelementptr` to set the `nuw` flag
on buffer offset additions, and also moves from `inbounds` to the looser
`nusw` for the existing case.
2024-11-06 11:42:47 -06:00
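For readers unfamiliar with what `nuw` promises on the offset addition in #115039, a small model may help. This is a sketch, not code from the pass: `addOffset` is a hypothetical helper, and an empty `std::optional` stands in for a poison result.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Model of a 32-bit buffer offset addition. With nuw set, unsigned
// wraparound yields no usable value (modelled as std::nullopt, standing in
// for poison); without it, the sum wraps modulo 2^32.
std::optional<std::uint32_t> addOffset(std::uint32_t base,
                                       std::uint32_t delta, bool nuw) {
  std::uint32_t sum = base + delta; // unsigned arithmetic wraps
  if (nuw && sum < base)
    return std::nullopt; // the nuw promise was violated
  return sum;
}
```

Carrying the flag from the `getelementptr` to the addition preserves the information that the offset computation does not wrap, which later optimizations can rely on.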
Kazu Hirata
e1fdaaafc5 [AMDGPU] Work around a warning
This patch works around:

  llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:1101:13:
  error: enumeration values 'USubCond' and 'USubSat' not handled in
  switch [-Werror,-Wswitch]

I've notified the author in #105568.
2024-09-06 09:35:13 -07:00
Jessica Del
ec7f8e1113
[AMDGPU] Add intrinsic for raw atomic buffer loads (#97707)
Upstream the intrinsics `llvm.amdgcn.raw.atomic.buffer.load`
and `llvm.amdgcn.raw.atomic.ptr.buffer.load`.

These additional intrinsics mark atomic buffer loads
as atomic to LLVM by removing the `IntrReadMem`
attribute. Otherwise, it could hoist these
intrinsics out of loops in cases where LLVM marks
them as invariant. That can cause issues such as
infinite loops.

Continuation of https://reviews.llvm.org/D138786
with the additional use in the fat buffer lowering,
more test cases and the additional ptr versions 
of these intrinsics.

---------

Co-authored-by: rtayl <>
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Mariusz Sikora <mariusz.sikora@amd.com>
2024-07-22 18:04:49 +02:00
Jay Foad
6bba44e8dc [AMDGPU] Use member initializers. NFC. 2024-07-16 15:29:10 +01:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Nikita Popov
5ef768d22b
[AMDGPULowerBufferFatPointers] Expand const exprs using fat pointers (#95558)
Expand all constant expressions that use fat pointers upfront, so that
the rewriting logic only has to deal with instructions and not the
constant expression variants as well.

My primary motivation is to remove the creation of illegal constant
expressions (mul and shl) from this pass, but this also cuts down quite
a bit on the amount of duplicate logic.
2024-06-17 09:28:09 +02:00
Nikita Popov
0774000e32
[AMDGPULowerBufferFatPointers] Fix offset-only ptrtoint (#95543)
For ptrtoint that truncates to the offset only, the expansion generated
a shift by the bit width, which is poison. Instead, we should return the
offset directly.

(The same problem exists for the constant expression case, but I plan to
address that separately, and more comprehensively.)
2024-06-14 16:38:57 +02:00
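The offset-only special case from #95543 can be sketched with plain integers. The model is simplified and hypothetical: `ptrToInt` is not a function in the pass, only the low 64 bits of the resource are represented, and the integer value of a fat pointer is assumed to be the resource shifted left by 32 bits combined with the 32-bit offset. In the IR, shifting a value of the result type by 32 is poison when that type is only 32 bits wide, so narrow results must come from the offset alone.

```cpp
#include <cassert>
#include <cstdint>

// Model of ptrtoint on a fat pointer, truncated to resultBits (1..64).
// rsrcLow holds the low bits of the resource part; offset is the 32-bit
// offset part.
std::uint64_t ptrToInt(std::uint64_t rsrcLow, std::uint32_t offset,
                       unsigned resultBits) {
  assert(resultBits >= 1 && resultBits <= 64);
  const std::uint64_t mask =
      resultBits == 64 ? ~std::uint64_t{0}
                       : ((std::uint64_t{1} << resultBits) - 1);
  // Offset-only case: the result cannot hold any resource bits, so return
  // the offset directly instead of shifting by the full result width.
  if (resultBits <= 32)
    return offset & mask;
  // General case: resource bits sit above the 32-bit offset.
  return ((rsrcLow << 32) | offset) & mask;
}
```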
Nikita Popov
1ceede3318 [AMDGPULowerBufferFatPointers] Don't try to preserve flags for constant expressions
We expect all of these ConstantExpr ctors to fold away, don't try
to preserve flags, especially as the flags are not correct.
2024-06-14 12:26:29 +02:00
Nikita Popov
cb3a6bded7 [AMDGPULowerBufferFatPointers] Restore zero offset special case
OffAccum will never be nullptr now, instead check for a zero
constant.
2024-06-12 10:30:23 +02:00
Nikita Popov
6fc63ab77d
[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission (#95115)
Use emitGEPOffset() to emit the GEP offset, which already has all the
necessary logic.

This also fixes the nuw flag incorrectly being set on the offset
calculation, while only nsw is implied by inbounds.
2024-06-12 09:51:18 +02:00
Nikita Popov
8cdecd4d3a
[IR] Add getelementptr nusw and nuw flags (#90824)
This implements the `nusw` and `nuw` flags for `getelementptr` as
proposed at
https://discourse.llvm.org/t/rfc-add-nusw-and-nuw-flags-for-getelementptr/78672.

The three possible flags are encapsulated in the new `GEPNoWrapFlags`
class. Currently this class has a ctor from bool, interpreted as the
InBounds flag. This ctor should be removed in the future, as code gets
migrated to handle all flags.

There are a few places annotated with `TODO(gep_nowrap)`, where I've had
to touch code but opted to not infer or precisely preserve the new
flags, so as to keep this as NFC as possible and make sure any changes
of that kind get test coverage when they are made.
2024-05-27 16:05:17 +02:00