102 Commits

Author SHA1 Message Date
Sirish Pande
abec9ff47d
[AMDGPU] Correctly merge noalias scopes during lowering of LDS data. (#131664)
Currently, if noalias metadata is already present on loads and stores,
the lower-module-lds pass generates a more conservative aliasing set.
This inhibits scheduling intrinsics that would otherwise have produced
better-pipelined instructions.

The fix is to stop always intersecting existing noalias metadata with
the noalias metadata created for LDS lowering. Instead, intersect only
if the noalias scopes are from the same domain; otherwise concatenate
the existing noalias sets with the LDS noalias set.
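
The merge rule in this commit message can be illustrated with a toy model. This is only a sketch: `Scope`, `ScopeList` and `mergeNoAlias` are hypothetical names, and a pair of strings stands in for the metadata nodes the pass actually works on. The "same domain" test is simplified to "any scope on one side shares a domain with any scope on the other".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a noalias scope: a name plus the domain it belongs to.
// Hypothetical types; not LLVM's metadata representation.
struct Scope {
  std::string Domain;
  std::string Name;
  bool operator==(const Scope &O) const {
    return Domain == O.Domain && Name == O.Name;
  }
};

using ScopeList = std::vector<Scope>;

static bool contains(const ScopeList &L, const Scope &S) {
  return std::find(L.begin(), L.end(), S) != L.end();
}

// True if any scope in A shares a domain with any scope in B.
static bool sharesDomain(const ScopeList &A, const ScopeList &B) {
  for (const Scope &X : A)
    for (const Scope &Y : B)
      if (X.Domain == Y.Domain)
        return true;
  return false;
}

// Merge existing noalias scopes with the ones created for LDS lowering:
// intersect when the domains overlap, otherwise concatenate.
ScopeList mergeNoAlias(const ScopeList &Existing, const ScopeList &Lds) {
  ScopeList Result;
  if (sharesDomain(Existing, Lds)) {
    for (const Scope &S : Existing)
      if (contains(Lds, S))
        Result.push_back(S);
    return Result;
  }
  Result = Existing;
  Result.insert(Result.end(), Lds.begin(), Lds.end());
  return Result;
}
```

Concatenating keeps the caller's scopes intact when they come from an unrelated domain, which is the case where intersecting would have discarded them.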

There are a few patches that have come in for scopedAA in the past. The
following three should be enough background information.
https://reviews.llvm.org/D91576
https://reviews.llvm.org/D108315
https://reviews.llvm.org/D110049

Essentially, after a pass that might change aliasing info, one should
check whether that pass changes the number of MayAlias or ModRef
results, using the following:
`opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval
-evaluate-aa-metadata -print-all-alias-modref-info -disable-output`
2025-04-28 14:02:18 -05:00
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Kazu Hirata
ccf5d624f9 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:1031:17: error:
  unused variable 'F' [-Werror,-Wunused-variable]
2024-11-06 12:08:27 -08:00
Gang Chen
8c752900dd
[AMDGPU] modify named barrier builtins and intrinsics (#114550)
Use a local pointer type to represent the named barrier in builtins and
intrinsics. This makes the definitions more user-friendly because users
do not need to worry about the hardware ID assignment. This approach is
also closer to other popular GPU programming languages.
Named barriers should be represented as global variables of addrspace(3)
in LLVM IR. The compiler assigns special LDS offsets to those variables
during the AMDGPULowerModuleLDS pass. Those addresses are converted to
hardware barrier IDs during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
2024-11-06 10:37:22 -08:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
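
The difference between the two behaviours (create on demand versus pure lookup) can be shown with a toy symbol table. This is a sketch only: `ToyModule` and its integer ids are hypothetical and do not reflect the real `Intrinsic` or `Module` signatures.

```cpp
#include <map>
#include <string>

// Toy symbol table; a hypothetical stand-in for a module's declarations.
struct ToyModule {
  std::map<std::string, int> Decls; // name -> declaration id
  int NextId = 0;

  // Creates the declaration if it is missing, like a getOrInsert* API.
  int getOrInsertDeclaration(const std::string &Name) {
    auto [It, Inserted] = Decls.try_emplace(Name, NextId);
    if (Inserted)
      ++NextId;
    return It->second;
  }

  // Pure lookup: returns nullptr and leaves the table untouched if missing.
  const int *getDeclaration(const std::string &Name) const {
    auto It = Decls.find(Name);
    return It == Decls.end() ? nullptr : &It->second;
  }
};
```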
2024-10-11 05:26:03 -07:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Juan Manuel Martinez Caamaño
2d7339ad24
[AMDGPU][LDS] Fix dynamic LDS interaction with "amdgpu-no-lds-kernel-id" (#107092)
Dynamic lds and Table lds both use the amdgpu_lds_kernel_id intrinsic.
Kernels and functions that make an indirect use of this should not have
the "amdgpu-no-lds-kernel-id" attribute.

For the latter, this was done. For the dynamic lds case, this was
missing. This patch fixes it.
2024-09-04 16:41:43 +02:00
Jon Chesterfield
1bde8e0b80
[AMDGPU] Don't realign already allocated LDS. Point fix for 106412 (#106421)
Fixes 106412. The logic that skips the pass on already-lowered variables
doesn't cover the path that increases alignment of variables. If a
variable is allocated at 24 and then given 16 byte alignment, the
backend notices and fatal-errors on the inconsistency.
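
The inconsistency is plain arithmetic: 24 is not a multiple of 16. A minimal sketch of the check and of a guard that skips realignment for allocated variables follows; both function names are hypothetical and are not taken from the pass.

```cpp
#include <cstdint>

// True if Address is a multiple of Alignment (Alignment must be a power of two).
constexpr bool isAligned(std::uint64_t Address, std::uint64_t Alignment) {
  return (Address & (Alignment - 1)) == 0;
}

// Hypothetical guard: a variable that already has an absolute address keeps
// its alignment, because raising it could contradict the assigned address.
constexpr bool mayRaiseAlignment(bool AlreadyAllocated) {
  return !AlreadyAllocated;
}
```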
2024-08-28 18:30:48 +01:00
Jay Foad
55d744eea3
[AMDGPU] Move AMDGPUMemoryUtils out of Utils. NFC. (#104930)
It is only used by CodeGen so does not need to be shared with the
assembler/disassembler.
2024-08-20 16:15:46 +01:00
Jay Foad
c7309dadbf
[AMDGPU] Use range-based for loops. NFC. (#99047) 2024-07-17 10:18:03 +01:00
Akshat Oke
fb2b5cd1ad
[NFC] Fix typos (#98454)
Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
2024-07-16 11:03:42 +05:30
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Chaitanya
7573d5e4b1
[AMDGPU] Update removeFnAttrFromReachable to accept array of Fn Attrs. (#94188)
This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept
an array of function attributes as its argument.
This helps remove multiple attributes in one CallGraph walk.
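
A rough sketch of why an array argument saves work: the graph is traversed once and every listed attribute is erased at each visited node. `ToyFunction` and `removeAttrsFromReachable` are hypothetical; the real helper operates on LLVM's call graph, not on index vectors.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy call graph node; hypothetical, not LLVM's CallGraph API.
struct ToyFunction {
  std::set<std::string> Attrs;
  std::vector<int> Callees; // indices into the function table
};

// Remove every attribute in AttrsToRemove from all functions reachable from
// Root, visiting each function exactly once.
void removeAttrsFromReachable(std::vector<ToyFunction> &Fns, int Root,
                              const std::vector<std::string> &AttrsToRemove) {
  std::vector<bool> Visited(Fns.size(), false);
  std::vector<int> Worklist{Root};
  Visited[Root] = true;
  while (!Worklist.empty()) {
    int Cur = Worklist.back();
    Worklist.pop_back();
    for (const std::string &A : AttrsToRemove)
      Fns[Cur].Attrs.erase(A);
    for (int Callee : Fns[Cur].Callees) {
      if (!Visited[Callee]) {
        Visited[Callee] = true;
        Worklist.push_back(Callee);
      }
    }
  }
}
```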
2024-06-06 21:20:29 +05:30
Chaitanya
ebbbc73667
[AMDGPU] Use removeFnAttrFromReachable in lower-module-lds pass. (#92686) 2024-05-20 10:24:40 +05:30
Chaitanya
2c5f470da6
[AMDGPU] Move LDS utilities from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils (#88002)
This moves some of the utility methods from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils.
2024-05-10 10:49:48 +05:30
mmoadeli
1c63a3e0cd
Resolve static analyser report on pointer dereferencing after null check (#88278)
- Resolve Static Analyzer Check Failure: Pointer Dereferencing After
Null Check.
- Minor naming and style improvement
2024-04-15 18:05:40 +02:00
Pierre van Houtryve
ccb3a8feaa
[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793)
Refactor the logic that checks if a module contains mixed
absolute/non-lowered LDS GVs.

The check now happens later, when the "worklists" are formed. This is
because in some cases (OpenMP) we can have non-lowered GVs in a lowered
module, and this is normal because those GVs are just unused and removed
from the list at some point before the end of `getUsesOfLDSByFunction`.

Doing the check later ensures that if a mixed module is spotted, then
it's a _real_ mixed module that needs rejection, not a module containing
an intentionally ignored GV.
2024-03-21 11:28:35 +01:00
Pierre van Houtryve
d4569d42b5
[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729)
If all variables in the module are absolute, this means we're running
the pass again on an already lowered module, and that works.
If none of them are absolute, lowering can proceed as usual.
Only diagnose cases where we have a mix of absolute/non-absolute GVs,
which means we added LDS GVs after lowering, which is broken.
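
The three cases can be sketched as a small classifier over "is this LDS global absolute?" flags. The enum and function names are hypothetical, and treating an empty module as not lowered is an assumption of this sketch.

```cpp
#include <vector>

enum class LoweringState { NotLowered, AlreadyLowered, Mixed };

// Each entry says whether one LDS global carries an absolute address.
// Hypothetical helper; an empty module is treated as not lowered.
LoweringState classifyModule(const std::vector<bool> &IsAbsolute) {
  bool AnyAbsolute = false;
  bool AnyNonAbsolute = false;
  for (bool Abs : IsAbsolute) {
    if (Abs)
      AnyAbsolute = true;
    else
      AnyNonAbsolute = true;
  }
  if (AnyAbsolute && AnyNonAbsolute)
    return LoweringState::Mixed; // LDS globals were added after lowering
  return AnyAbsolute ? LoweringState::AlreadyLowered
                     : LoweringState::NotLowered;
}
```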

See #81491
Split from #75333
2024-03-11 09:20:01 +01:00
Matt Arsenault
888a20c466
AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (#71481)
This is in preparation for moving the run of AMDGPUAttributor earlier.
Currently it infers the lack of the corresponding intrinsic calls,
so if we introduce new ones we need to remove the attribute from any
possible transitive callers. This is more conservative than necessary,
we could try to identify specific subgraphs where LDS globals are not
used.

Other options include teaching the attributor to avoid adding it in
cases where the lowering may choose the table, but this seems more
complex. Alternatively, we could add a second run, which doesn't seem
worth it.

Depends #71349
2024-01-10 00:12:40 +07:00
Kazu Hirata
3406a2bc5f [llvm] Stop including tuple (NFC)
Identified with clangd.
2023-12-03 23:01:26 -08:00
Kazu Hirata
84a48ee9fb [llvm] Stop including llvm/ADT/SetVector.h (NFC)
Identified with clangd.
2023-11-10 23:50:23 -08:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Jeremy Morse
e54277fa10 [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder
This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 20:01:19 +01:00
Matt Arsenault
f7dcabe502 AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass
https://reviews.llvm.org/D157660
2023-09-02 12:02:36 -04:00
Matt Arsenault
1f52060000 AMDGPU: Use poison instead of undef in module lds pass 2023-09-02 11:33:26 -04:00
Juan Manuel MARTINEZ CAAMAÑO
4e43ba2599 [NFC][AMDGPULowerModuleLDSPass] Use shorter APIs in markUsedByKernel
* Use shorter versions of the LLVM API

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D155589
2023-07-19 09:54:53 +02:00
Juan Manuel MARTINEZ CAAMAÑO
fcbafc066c [NFC][AMDGPULowerModuleLDSPass] Cleanup of getTableLookupKernelIndex
* Do a single lookup when querying the map
* Use shorter versions of the LLVM API
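
The "single lookup" cleanup is a common map idiom. A generic sketch with `std::map` follows; the function names are hypothetical and the container used in the pass may differ.

```cpp
#include <map>
#include <optional>
#include <string>

// Two lookups: count() searches the map, then at() searches it again.
std::optional<int> lookupTwice(const std::map<std::string, int> &M,
                               const std::string &Key) {
  if (M.count(Key))
    return M.at(Key);
  return std::nullopt;
}

// One lookup: find() returns an iterator that already points at the value.
std::optional<int> lookupOnce(const std::map<std::string, int> &M,
                              const std::string &Key) {
  auto It = M.find(Key);
  if (It == M.end())
    return std::nullopt;
  return It->second;
}
```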

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D155588
2023-07-19 09:54:53 +02:00
Jon Chesterfield
6043d4dfec [amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca 2023-07-15 21:37:21 +01:00
Jon Chesterfield
d3316bc111 [amdgpu] Delete elide-module-lds attribute
Requires D155190

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155238
2023-07-14 00:36:33 +01:00
Jon Chesterfield
74e928a081 [amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.
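
A minimal sketch of the idea, assuming variables are placed in the order given: the layout is computed once, each variable records its absolute offset, and the total becomes the frame size. `LdsVar` and `layoutFrame` are hypothetical names, and the real pass may order and pad variables differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one LDS variable in a kernel's frame.
struct LdsVar {
  std::uint64_t Size;
  std::uint64_t Align;  // must be non-zero
  std::uint64_t Offset; // filled in by layoutFrame
};

// Round Value up to the next multiple of Align.
constexpr std::uint64_t alignTo(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Assign each variable an absolute offset and return the frame size, so a
// later stage can reuse both instead of recomputing the layout.
std::uint64_t layoutFrame(std::vector<LdsVar> &Vars) {
  std::uint64_t Offset = 0;
  for (LdsVar &V : Vars) {
    Offset = alignTo(Offset, V.Align);
    V.Offset = Offset;
    Offset += V.Size;
  }
  return Offset;
}
```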

Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155190
2023-07-13 23:54:38 +01:00
Jon Chesterfield
9418c40af7 [amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables
These aren't implemented. They could be implemented at moderate
complexity. Raising an error is better than silently miscompiling.

Patching now because the patch at D155125 is a step towards using this metadata
more extensively as part of the lowering path and that will interact badly with
input variables with this annotation.

Lowering user defined variables at specific addresses would drop this error,
put them at the requested position in the frame during this pass, and then
use the same codegen that will be used for the kernel specific struct shortly.

Reviewed By: jmmartinez

Differential Revision: https://reviews.llvm.org/D155132
2023-07-13 11:32:03 +01:00
Juan Manuel MARTINEZ CAAMAÑO
367b1f28db [NFC][AMDGPULowerModuleLDSPass] Fix buildbot santizier failed to compile
It seems that the sanitizer-x86_64-linux-android wasn't able to deduce
the template argument:

  AMDGPULowerModuleLDSPass.cpp:1192:53: error: no viable constructor or
  deduction guide for deduction of template arguments of 'vector'
        auto TableLookupVariablesOrdered = sortByName(std::vector(

This patch makes the template argument explicit.
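
For illustration, a generic version of this kind of fix: instead of letting class template argument deduction pick the element type, the type is written out. The function name and the `std::set` source container are hypothetical; the actual call site in the pass is not reproduced here.

```cpp
#include <set>
#include <string>
#include <vector>

// With C++17 deduction one could write:
//   std::vector Ordered(Names.begin(), Names.end());
// Spelling the element type out avoids depending on deduction guides,
// which some toolchains failed to resolve.
std::vector<std::string> toVector(const std::set<std::string> &Names) {
  return std::vector<std::string>(Names.begin(), Names.end());
}
```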
2023-07-12 11:08:16 +02:00
Juan Manuel MARTINEZ CAAMAÑO
3a75551e85 Reland "[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code"
Fixed compilation error and redundant copy warning

Differential Revision: https://reviews.llvm.org/D154977
2023-07-12 09:27:20 +02:00
Jon Chesterfield
e75ce77cd7 [amdgpu][lds] Fix missing markUsedByKernel calls and undef lookup table elements
More robust association between the kernels and lds struct.

Use poison instead of value() for lookup table elements introduced by dynamic lds lowering.

Extracted from D154946, new test from there verbatim. Segv fixed.

Fixes issues/63338

Fixes SWDEV-404491

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154972
2023-07-12 00:37:21 +01:00
Juan Manuel MARTINEZ CAAMAÑO
ebdd610ad4 Revert "[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code"
This reverts commit 125b90749a98d6dc6b492883c9617f9e91ab60e0.
2023-07-11 17:08:59 +02:00
Juan Manuel MARTINEZ CAAMAÑO
125b90749a [NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154970
2023-07-11 17:03:58 +02:00
Juan Manuel MARTINEZ CAAMAÑO
70bb5d2b9d [NFC][AMDGPULowerModuleLDSPass] Add const to some variables/parameters
Moving out some changes not related to the bugfix in https://reviews.llvm.org/D154946

Reviewed By: JonChesterfield, arsenm

Differential Revision: https://reviews.llvm.org/D154959
2023-07-11 15:51:57 +02:00
Juan Manuel MARTINEZ CAAMAÑO
abf081975e [NFC][AMDGPULowerModuleLDSPass] Remove dead variable 2023-07-11 12:35:21 +02:00
Jon Chesterfield
e17c1bb494 [amdgpu][nfc] Update comments on LDS lowering 2023-04-11 10:48:19 +01:00
Jon Chesterfield
0507448d82 [amdgpu] Implement dynamic LDS accesses from non-kernel functions
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use
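
The steps above can be pictured as a table indexed by kernel id. This is a sketch under stated assumptions: `DynLdsTable` and `dynamicLdsBase` are hypothetical, the addresses are made-up inputs, and the kernel id is a plain argument here rather than a value obtained through the intrinsic.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: one entry per kernel, holding the LDS address at which
// that kernel's dynamic (extern) LDS variable was allocated.
struct DynLdsTable {
  std::vector<std::uint32_t> BaseByKernelId;
};

// What a non-kernel function conceptually does: use the id of the kernel it
// was reached from to index the table and find the dynamic LDS address.
std::uint32_t dynamicLdsBase(const DynLdsTable &Table,
                             std::uint32_t KernelId) {
  return Table.BaseByKernelId.at(KernelId);
}
```

Because every kernel has its own entry, the same non-kernel function resolves to a different address depending on which kernel called it.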

Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144233
2023-04-04 20:06:34 +01:00
Jon Chesterfield
62951784f0 [amdgpu][nfc] Refactor prior to D144233 to remove noise from diff 2023-04-03 16:47:01 +01:00
Jon Chesterfield
78e6818049 [amdgpu][nfc] clang-format AMDGPULowerModuleLDS for easier merging 2023-03-22 01:49:53 +00:00
Jon Chesterfield
d70e7ea0d1 [amdgpu][nfc] Extract more functions in LowerModuleLDS, mark more methods static 2023-03-22 01:25:28 +00:00
Jon Chesterfield
e8ad2a051c [amdgpu][nfc] Comment and extract two functions in LowerModuleLDS 2023-03-21 23:39:20 +00:00
Jon Chesterfield
d3dda422bf [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221
2023-03-12 13:47:48 +00:00
Nikita Popov
576060fb41 [ReplaceConstant] Extract code for expanding users of constant (NFC)
AMDGPU implements some handy code for expanding all constexpr
users of LDS globals. Extract the core logic into ReplaceConstant,
so that it can be reused elsewhere.
2023-03-03 16:09:06 +01:00
Jon Chesterfield
bf579a7049 [amdgpu] Change LDS lowering default to hybrid
Postponed from D139433 until the bug fixed by D139874 could be resolved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141852
2023-02-24 15:20:12 +00:00