117 Commits

Author SHA1 Message Date
Nikita Popov
5933294bb1
[AMDGPULowerLDS] Avoid unnecessary ptrtoint/inttoptr roundtrip (#181671)
Store pointers instead of integers in the table, and load them as
pointers.
2026-02-17 09:28:01 +01:00
Nikita Popov
a17ffaf9d0 [AMDGPU] Avoid unnecessary zero-index GEPs
These are no-ops.
2026-02-13 15:10:42 +01:00
Jameson Nash
d10b2b566a
[NFCI] replace getValueType with new getGlobalSize query (#177186)
Returns uint64_t to simplify callers. The goal is to eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during create.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
2026-01-22 13:55:53 -05:00
Jay Foad
1c3b10f2e2
[AMDGPU] Remove isKernelLDS, add isKernel(const Function &). NFC. (#167300)
Since #142598 isKernelLDS has been a pointless wrapper around isKernel.
2025-11-25 15:43:18 +00:00
Chaitanya
49d5bb0ad0
[AMDGPU] Add amdgpu-lower-exec-sync pass to lower named-barrier globals (#165692)
This PR introduces `amdgpu-lower-exec-sync` pass which specifically
lowers named-barrier LDS globals introduced by #114550.

Changes include:

- Moving the logic of lowering named-barrier LDS globals from
`amdgpu-lower-module-lds` pass to this new pass.

- Adding the pass to the pipeline and removing the existing lowering logic
for named-barrier LDS from `amdgpu-lower-module-lds`.

See #161827 for discussion on this topic.
2025-11-17 10:08:40 +05:30
Juan Manuel Martinez Caamaño
35530f4b65
[NFC][AMDGPU] Replace size & set_is_subset by operator== (#161813) 2025-10-03 12:44:51 +02:00
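The equivalence behind this change can be sketched in standalone C++. This is only an illustration using `std::set`; the helper names `sameBySizeAndSubset` and `sameByEquality` are hypothetical and are not the LLVM code touched by the commit. For two sets of equal size, "A is a subset of B" holds exactly when the sets are equal, so a single equality comparison can stand in for both checks.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical helper modelling the older pattern: equal size plus subset test.
template <typename T>
bool sameBySizeAndSubset(const std::set<T> &A, const std::set<T> &B) {
  // std::includes needs sorted ranges; std::set iterates in sorted order.
  return A.size() == B.size() &&
         std::includes(B.begin(), B.end(), A.begin(), A.end());
}

// Hypothetical helper modelling the replacement: one equality comparison.
template <typename T>
bool sameByEquality(const std::set<T> &A, const std::set<T> &B) {
  return A == B;
}
```

Both helpers should agree on every pair of sets, which is what makes the rewrite a non-functional change.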
Jon Chesterfield
0ebd433402
[AMDGPU] Be less optimistic when allocating module scope lds (#161464)
Make the test for when additional variables can be added to the struct
allocated at address zero more stringent. Previously, variables could be
added to it (for faster access) even when that increases the lds
requested by a kernel. This corrects that oversight.

Test case diff shows the change from all variables being allocated into
the module lds to only some being, in particular the introduction of
uses of the offset table and that some kernels now use less lds than
before.

Alternative to PR 160181
2025-10-02 16:15:48 -04:00
Ivan Kosarev
faca8c9ed4
[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
2025-08-22 10:05:06 +01:00
Gang Chen
575fad2892
[AMDGPU] Upstream the Support for array of named barriers (#154604) 2025-08-20 14:53:03 -07:00
Austin
c7bacc9f26
[llvm] using wrapper llvm::sort(nfc) (#151000)
using wrapper llvm::sort(nfc)
2025-08-04 09:27:01 +08:00
Matt Arsenault
f466131055
AMDGPU: Use reportFatalUsageError in AMDGPULowerModuleLDS (#145130) 2025-06-21 12:18:25 +09:00
Kazu Hirata
1e8e662174
[AMDGPU] Remove unused includes (NFC) (#141376)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-24 14:48:46 -07:00
Shilei Tian
f2165b9d58 Revert "[AMDGPU] Add flag to prevent reruns of LowerModuleLDS (#129520)"
This reverts commit aa9f8596b01fef013ab62c20e61fc96d165f60f7 because it made
some assumptions that may not be valid.
2025-05-17 21:41:59 -04:00
Pierre van Houtryve
aa9f8596b0
[AMDGPU] Add flag to prevent reruns of LowerModuleLDS (#129520)
FullLTO has to run this early, before module splitting occurs; otherwise
module splitting won't work as expected. There was a targeted fix for
Fortran on another branch that disables the LTO run, but that would break
full LTO module splitting entirely.

Test changes are due to metadata indexes shifting.

See #122891
2025-05-15 09:54:21 +02:00
Kazu Hirata
2d287f51ef
[llvm] Use *(Set|Map)::contains (NFC) (#138431) 2025-05-03 21:55:36 -07:00
Sirish Pande
abec9ff47d
[AMDGPU] Correctly merge noalias scopes during lowering of LDS data. (#131664)
Currently, if there is already noalias metadata present on loads and
stores, lower module lds pass is generating a more conservative aliasing
set. This results in inhibiting scheduling intrinsics that would have
otherwise generated a better pipelined instruction.

The fix is to stop always intersecting the existing noalias metadata
with the noalias metadata created for lowering of LDS. Instead, intersect
only if the noalias scopes are from the same domain; otherwise, concatenate
the existing noalias sets with the LDS noalias.

There have been a few patches for scoped AA in the past. The following
three should be enough background information:
https://reviews.llvm.org/D91576
https://reviews.llvm.org/D108315
https://reviews.llvm.org/D110049

Essentially, after a pass that might change aliasing info, one should
check whether that pass changes the number of MayAlias or ModRef results
using the following:
`opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval
-evaluate-aa-metadata -print-all-alias-modref-info -disable-output`
2025-04-28 14:02:18 -05:00
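The merge rule described in this commit message can be modelled in standalone C++. This is a deliberately simplified sketch, not the pass's implementation: the `Scope` struct, the string-based domain names, and `mergeNoAliasScopes` are all hypothetical stand-ins for LLVM's metadata nodes. It only shows the decision between intersecting and concatenating.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a noalias scope: a scope name inside a domain.
struct Scope {
  std::string Domain;
  std::string Name;
};

inline bool operator==(const Scope &A, const Scope &B) {
  return A.Domain == B.Domain && A.Name == B.Name;
}

using ScopeList = std::vector<Scope>;

// True if any scope in A lives in the same domain as any scope in B.
inline bool sharesDomain(const ScopeList &A, const ScopeList &B) {
  for (const Scope &X : A)
    for (const Scope &Y : B)
      if (X.Domain == Y.Domain)
        return true;
  return false;
}

// Intersect when the lists share a domain; otherwise concatenate them.
inline ScopeList mergeNoAliasScopes(const ScopeList &Existing,
                                    const ScopeList &LDS) {
  ScopeList Result;
  if (sharesDomain(Existing, LDS)) {
    // Same domain: keep only the scopes present in both lists.
    for (const Scope &S : Existing)
      if (std::find(LDS.begin(), LDS.end(), S) != LDS.end())
        Result.push_back(S);
    return Result;
  }
  // Different domains: keep every scope from both lists.
  Result = Existing;
  Result.insert(Result.end(), LDS.begin(), LDS.end());
  return Result;
}
```

In this model, scopes from unrelated domains survive the merge instead of being discarded by an unconditional intersection, which is the less conservative outcome the commit describes.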
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Kazu Hirata
ccf5d624f9 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:1031:17: error:
  unused variable 'F' [-Werror,-Wunused-variable]
2024-11-06 12:08:27 -08:00
Gang Chen
8c752900dd
[AMDGPU] modify named barrier builtins and intrinsics (#114550)
Use a local pointer type to represent the named barrier in the builtin and
intrinsic. This makes the definitions more user friendly
because users do not need to worry about the hardware ID assignment. This
approach is also closer to that of other popular GPU programming languages.
Named barriers should be represented as global variables of addrspace(3)
in LLVM IR. The compiler assigns special LDS offsets to those variables
during the AMDGPULowerModuleLDS pass. Those addresses are converted to
hardware barrier IDs during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
2024-11-06 10:37:22 -08:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
2024-10-11 05:26:03 -07:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Juan Manuel Martinez Caamaño
2d7339ad24
[AMDGPU][LDS] Fix dynamic LDS interaction with "amdgpu-no-lds-kernel-id" (#107092)
Dynamic LDS and table LDS both use the amdgpu_lds_kernel_id intrinsic.
Kernels and functions that make an indirect use of it should not have
the "amdgpu-no-lds-kernel-id" attribute.

For the latter, this was done. For the dynamic LDS case, this was
missing. This patch fixes it.
2024-09-04 16:41:43 +02:00
Jon Chesterfield
1bde8e0b80
[AMDGPU] Don't realign already allocated LDS. Point fix for 106412 (#106421)
Fixes 106412. The logic that skips the pass on already-lowered variables
doesn't cover the path that increases alignment of variables. If a
variable is allocated at 24 and then given 16 byte alignment, the
backend notices and fatal-errors on the inconsistency.
2024-08-28 18:30:48 +01:00
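The arithmetic behind the example in this message (a variable at address 24 given 16-byte alignment) can be checked with a small standalone C++ sketch. The function names `isAddressAligned` and `alignmentAfterPass` are hypothetical and only model the idea of leaving already allocated variables alone; they are not the pass's code.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if Address is a multiple of Alignment.
// Alignment must be a non-zero power of two.
inline bool isAddressAligned(std::uint64_t Address, std::uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (Address & (Alignment - 1)) == 0;
}

// Hypothetical guard: only raise the alignment of variables that have not
// been allocated yet, so an existing address never becomes inconsistent.
inline std::uint64_t alignmentAfterPass(bool AlreadyAllocated,
                                        std::uint64_t Current,
                                        std::uint64_t Preferred) {
  if (AlreadyAllocated)
    return Current;
  return Current > Preferred ? Current : Preferred;
}
```

Address 24 is a multiple of 8 but not of 16, so raising the alignment of a variable already placed there produces exactly the mismatch the backend rejects.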
Jay Foad
55d744eea3
[AMDGPU] Move AMDGPUMemoryUtils out of Utils. NFC. (#104930)
It is only used by CodeGen so does not need to be shared with the
assembler/disassembler.
2024-08-20 16:15:46 +01:00
Jay Foad
c7309dadbf
[AMDGPU] Use range-based for loops. NFC. (#99047) 2024-07-17 10:18:03 +01:00
Akshat Oke
fb2b5cd1ad
[NFC] Fix typos (#98454)
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
2024-07-16 11:03:42 +05:30
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Chaitanya
7573d5e4b1
[AMDGPU] Update removeFnAttrFromReachable to accept array of Fn Attrs. (#94188)
This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept
an array of function attributes as an argument.
This helps remove multiple attributes in one CallGraph walk.
2024-06-06 21:20:29 +05:30
Chaitanya
ebbbc73667
[AMDGPU] Use removeFnAttrFromReachable in lower-module-lds pass. (#92686) 2024-05-20 10:24:40 +05:30
Chaitanya
2c5f470da6
[AMDGPU] Move LDS utilities from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils (#88002)
This moves some of the utility methods from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils.
2024-05-10 10:49:48 +05:30
mmoadeli
1c63a3e0cd
Resolve static analyser report on pointer dereferencing after null check (#88278)
- Resolve Static Analyzer Check Failure: Pointer Dereferencing After
Null Check.
- Minor naming and style improvement
2024-04-15 18:05:40 +02:00
Pierre van Houtryve
ccb3a8feaa
[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793)
Refactor the logic that checks if a module contains mixed
absolute/non-lowered LDS GVs.

The check now happens later, when the "worklists" are formed. This is
because in some cases (OpenMP) we can have non-lowered GVs in a lowered
module, and this is normal because those GVs are just unused and removed
from the list at some point before the end of `getUsesOfLDSByFunction`.

Doing the check later ensures that if a mixed module is spotted, then
it's a _real_ mixed module that needs rejection, not a module containing
an intentionally ignored GV.
2024-03-21 11:28:35 +01:00
Pierre van Houtryve
d4569d42b5
[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729)
If all variables in the module are absolute, this means we're running
the pass again on an already lowered module, and that works.
If none of them are absolute, lowering can proceed as usual.
Only diagnose cases where we have a mix of absolute/non-absolute GVs,
which means we added LDS GVs after lowering, which is broken.

See #81491
Split from #75333
2024-03-11 09:20:01 +01:00
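The three cases listed in this message can be sketched as a small classification in standalone C++. The `LDSVariable` struct, the `LoweringState` enum, and `classifyModule` are hypothetical names used for illustration only; the real pass inspects LLVM global variables rather than a plain struct.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an LDS global variable: all that matters here is
// whether it already carries an absolute address.
struct LDSVariable {
  bool HasAbsoluteAddress;
};

enum class LoweringState { NotLowered, AlreadyLowered, Mixed };

// Classifies a module by looking at every LDS variable once.
inline LoweringState classifyModule(const std::vector<LDSVariable> &Vars) {
  bool AnyAbsolute = false;
  bool AnyNonAbsolute = false;
  for (const LDSVariable &V : Vars) {
    if (V.HasAbsoluteAddress)
      AnyAbsolute = true;
    else
      AnyNonAbsolute = true;
  }
  if (AnyAbsolute && AnyNonAbsolute)
    return LoweringState::Mixed; // variables were added after lowering
  return AnyAbsolute ? LoweringState::AlreadyLowered
                     : LoweringState::NotLowered;
}
```

In this sketch only the `Mixed` result would be diagnosed; the other two states let the pass either proceed normally or treat the module as already lowered. A module with no LDS variables is reported as `NotLowered`, which is an assumption of the sketch.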
Matt Arsenault
888a20c466
AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (#71481)
This is in preparation for moving the run of AMDGPUAttributor earlier.
Currently it infers the lack of the corresponding intrinsic calls,
so if we introduce new ones we need to remove the attribute from any
possible transitive callers. This is more conservative than necessary;
we could try to identify specific subgraphs where LDS globals are not
used.

Other options include teaching the attributor to avoid adding it in
cases where the lowering may choose the table, but this seems more complex.
Alternatively, we could add a second run, which doesn't seem worth it.

Depends #71349
2024-01-10 00:12:40 +07:00
Kazu Hirata
3406a2bc5f [llvm] Stop including tuple (NFC)
Identified with clangd.
2023-12-03 23:01:26 -08:00
Kazu Hirata
84a48ee9fb [llvm] Stop including llvm/ADT/SetVector.h (NFC)
Identified with clangd.
2023-11-10 23:50:23 -08:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0]:
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Matt Arsenault
f7dcabe502 AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass
https://reviews.llvm.org/D157660
2023-09-02 12:02:36 -04:00
Matt Arsenault
1f52060000 AMDGPU: Use poison instead of undef in module lds pass 2023-09-02 11:33:26 -04:00
Juan Manuel MARTINEZ CAAMAÑO
4e43ba2599 [NFC][AMDGPULowerModuleLDSPass] Use shorter APIs in markUsedByKernel
* Use shorter versions of the LLVM API

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D155589
2023-07-19 09:54:53 +02:00
Juan Manuel MARTINEZ CAAMAÑO
fcbafc066c [NFC][AMDGPULowerModuleLDSPass] Cleanup of getTableLookupKernelIndex
* Do a single lookup when querying the map
* Use shorter versions of the LLVM API

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D155588
2023-07-19 09:54:53 +02:00
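The "single lookup" cleanup mentioned in this commit is a common pattern and can be illustrated in standalone C++. This sketch uses `std::unordered_map` and hypothetical names (`KernelIndexMap`, `lookupTwice`, `lookupOnce`); it is not the code from `getTableLookupKernelIndex`, which uses LLVM's own containers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical map from kernel name to its index in the lookup table.
using KernelIndexMap = std::unordered_map<std::string, unsigned>;

// Two lookups: one to test for membership, one to fetch the value.
inline const unsigned *lookupTwice(const KernelIndexMap &Map,
                                   const std::string &Kernel) {
  if (Map.count(Kernel) == 0)
    return nullptr;
  return &Map.at(Kernel);
}

// One lookup: find() returns an iterator that answers both questions.
inline const unsigned *lookupOnce(const KernelIndexMap &Map,
                                  const std::string &Kernel) {
  auto It = Map.find(Kernel);
  if (It == Map.end())
    return nullptr;
  return &It->second;
}
```

Both functions return the same result; the second simply avoids hashing and probing the key a second time.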
Jon Chesterfield
6043d4dfec [amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca 2023-07-15 21:37:21 +01:00
Jon Chesterfield
d3316bc111 [amdgpu] Delete elide-module-lds attribute
Requires D155190

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155238
2023-07-14 00:36:33 +01:00
Jon Chesterfield
74e928a081 [amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.

Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155190
2023-07-13 23:54:38 +01:00
Jon Chesterfield
9418c40af7 [amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables
These aren't implemented. They could be implemented at moderate
complexity. Raising an error is better than silently miscompiling.

Patching now because the patch at D155125 is a step towards using this metadata
more extensively as part of the lowering path and that will interact badly with
input variables with this annotation.

Lowering user defined variables at specific addresses would drop this error,
put them at the requested position in the frame during this pass, and then
use the same codegen that will be used for the kernel specific struct shortly.

Reviewed By: jmmartinez

Differential Revision: https://reviews.llvm.org/D155132
2023-07-13 11:32:03 +01:00
Juan Manuel MARTINEZ CAAMAÑO
367b1f28db [NFC][AMDGPULowerModuleLDSPass] Fix compile failure on sanitizer buildbot
It seems that the sanitizer-x86_64-linux-android wasn't able to deduce
the template argument:

  AMDGPULowerModuleLDSPass.cpp:1192:53: error: no viable constructor or
  deduction guide for deduction of template arguments of 'vector'
        auto TableLookupVariablesOrdered = sortByName(std::vector(

This patch makes the template argument explicit.
2023-07-12 11:08:16 +02:00