llvm-project

Author	SHA1	Message	Date
Sirish Pande	abec9ff47d	[AMDGPU] Correctly merge noalias scopes during lowering of LDS data. (#131664 ) Currently, if there is already noalias metadata present on loads and stores, lower module lds pass is generating a more conservative aliasing set. This results in inhibiting scheduling intrinsics that would have otherwise generated a better pipelined instruction. The fix is not to always intersect already existing noalias metadata with noalias created for lowering of LDS, but to intersect only if noalias scopes are from the same domain, and otherwise concatenate existing noalias sets with LDS noalias. There are a few patches that have come for scopedAA in the past. Following three should be enough background information. https://reviews.llvm.org/D91576 https://reviews.llvm.org/D108315 https://reviews.llvm.org/D110049 Essentially, after a pass that might change aliasing info, one should check if that pass changes the number of MayAlias or ModRef results using the following: `opt -S -aa-pipeline=basic-aa,scoped-noalias-aa -passes=aa-eval -evaluate-aa-metadata -print-all-alias-modref-info -disable-output`	2025-04-28 14:02:18 -05:00
Rahul Joshi	a3754ade63	[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410 ) - Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767	2025-04-07 17:27:50 -07:00
Rahul Joshi	74b7abf154	[IRBuilder] Add new overload for CreateIntrinsic (#131942 ) Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.	2025-03-31 08:10:34 -07:00
Kazu Hirata	ccf5d624f9	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp:1031:17: error: unused variable 'F' [-Werror,-Wunused-variable]	2024-11-06 12:08:27 -08:00
Gang Chen	8c752900dd	[AMDGPU] modify named barrier builtins and intrinsics (#114550 ) Use a local pointer type to represent the named barrier in builtin and intrinsic. This makes the definitions more user friendly because they do not need to worry about the hardware ID assignment. Also this approach is more like the other popular GPU programming languages. Named barriers should be represented as global variables of addrspace(3) in LLVM-IR. Compiler assigns the special LDS offsets for those variables during AMDGPULowerModuleLDS pass. Those addresses are converted to hw barrier ID during instruction selection. The rest of the instruction-selection changes are primarily due to the intrinsic-definition changes.	2024-11-06 10:37:22 -08:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Juan Manuel Martinez Caamaño	2d7339ad24	[AMDGPU][LDS] Fix dynamic LDS interaction with "amdgpu-no-lds-kernel-id" (#107092 ) Dynamic lds and Table lds both use the amdgpu_lds_kernel_id intrinsic. Kernels and functions that make an indirect use of this should not have the "amdgpu-no-lds-kernel-id" attribute. For the latter, this was done. For the dynamic lds case, this was missing. This patch fixes it.	2024-09-04 16:41:43 +02:00
Jon Chesterfield	1bde8e0b80	[AMDGPU] Don't realign already allocated LDS. Point fix for 106412 (#106421 ) Fixes 106412. The logic that skips the pass on already-lowered variables doesn't cover the path that increases alignment of variables. If a variable is allocated at 24 and then given 16 byte alignment, the backend notices and fatal-errors on the inconsistency.	2024-08-28 18:30:48 +01:00
Jay Foad	55d744eea3	[AMDGPU] Move AMDGPUMemoryUtils out of Utils. NFC. (#104930 ) It is only used by CodeGen so does not need to be shared with the assembler/disassembler.	2024-08-20 16:15:46 +01:00
Jay Foad	c7309dadbf	[AMDGPU] Use range-based for loops. NFC. (#99047 )	2024-07-17 10:18:03 +01:00
Akshat Oke	fb2b5cd1ad	[NFC] Fix typos (#98454 ) Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>	2024-07-16 11:03:42 +05:30
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Chaitanya	7573d5e4b1	[AMDGPU] Update removeFnAttrFromReachable to accept array of Fn Attrs. (#94188 ) This PR updates removeFnAttrFromReachable in AMDGPUMemoryUtils to accept array of function attributes as argument. Helps to remove multiple attributes in one CallGraph walk.	2024-06-06 21:20:29 +05:30
Chaitanya	ebbbc73667	[AMDGPU] Use removeFnAttrFromReachable in lower-module-lds pass. (#92686 )	2024-05-20 10:24:40 +05:30
Chaitanya	2c5f470da6	[AMDGPU] Move LDS utilities from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils (#88002 ) This moves some of the utility methods from amdgpu-lower-module-lds pass to AMDGPUMemoryUtils.	2024-05-10 10:49:48 +05:30
mmoadeli	1c63a3e0cd	Resolve static analyser report on pointer dereferencing after null check (#88278 ) - Resolve Static Analyzer Check Failure: Pointer Dereferencing After Null Check. - Minor naming and style improvement	2024-04-15 18:05:40 +02:00
Pierre van Houtryve	ccb3a8feaa	[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793 ) Refactor the logic that checks if a module contains mixed absolute/non-lowered LDS GVs. The check now happens later when the "worklists" are formed. This is because in some cases (OpenMP) we can have non-lowered GVs in a lowered module, and this is normal because those GVs are just unused and removed from the list at some point before the end of `getUsesOfLDSByFunction`. Doing the check later ensures that if a mixed module is spotted, then it's a _real_ mixed module that needs rejection, not a module containing an intentionally ignored GV.	2024-03-21 11:28:35 +01:00
Pierre van Houtryve	d4569d42b5	[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729 ) If all variables in the module are absolute, this means we're running the pass again on an already lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See #81491 Split from #75333	2024-03-11 09:20:01 +01:00
Matt Arsenault	888a20c466	AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (#71481 ) This is in preparation for moving the run of AMDGPUAttributor earlier. Currently it infers the lack of the corresponding intrinsic calls, so if we introduce new ones we need to remove the attribute from any possible transitive callers. This is more conservative than necessary, we could try to identify specific subgraphs where LDS globals are not used. Other options include teaching the attributor to avoid adding it in cases where the lowering may choose the table, but this seems more complex. Alternatively could add a second run which doesn't seem worth it. Depends #71349	2024-01-10 00:12:40 +07:00
Kazu Hirata	3406a2bc5f	[llvm] Stop including tuple (NFC) Identified with clangd.	2023-12-03 23:01:26 -08:00
Kazu Hirata	84a48ee9fb	[llvm] Stop including llvm/ADT/SetVector.h (NFC) Identified with clangd.	2023-11-10 23:50:23 -08:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029 ) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
Jeremy Morse	e54277fa10	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468	2023-09-11 20:01:19 +01:00
Matt Arsenault	f7dcabe502	AMDGPU: Pass in TargetMachine to AMDGPULowerModuleLDSPass https://reviews.llvm.org/D157660	2023-09-02 12:02:36 -04:00
Matt Arsenault	1f52060000	AMDGPU: Use poison instead of undef in module lds pass	2023-09-02 11:33:26 -04:00
Juan Manuel MARTINEZ CAAMAÑO	4e43ba2599	[NFC][AMDGPULowerModuleLDSPass] Use shorter APIs in markUsedByKernel * Use shorter versions of the LLVM API Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D155589	2023-07-19 09:54:53 +02:00
Juan Manuel MARTINEZ CAAMAÑO	fcbafc066c	[NFC][AMDGPULowerModuleLDSPass] Cleanup of getTableLookupKernelIndex * Do a single lookup when querying the map * Use shorter versions of the LLVM API Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D155588	2023-07-19 09:54:53 +02:00
Jon Chesterfield	6043d4dfec	[amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca	2023-07-15 21:37:21 +01:00
Jon Chesterfield	d3316bc111	[amdgpu] Delete elide-module-lds attribute Requires D155190 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155238	2023-07-14 00:36:33 +01:00
Jon Chesterfield	74e928a081	[amdgpu][lds] Remove recalculation of LDS frame from backend Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend. Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation. After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size. Deleted the now dead allocation code from the backend. Left for a later cleanup: - enabling lowering for anonymous functions - removing the elide-module-lds attribute (test churn, it's not used by llc any more) - adjusting the dynamic alignment check to not use symbol names Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155190	2023-07-13 23:54:38 +01:00
Jon Chesterfield	9418c40af7	[amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables These aren't implemented. They could be at moderate implementation complexity. Raising an error is better than silently miscompiling. Patching now because the patch at D155125 is a step towards using this metadata more extensively as part of the lowering path and that will interact badly with input variables with this annotation. Lowering user defined variables at specific addresses would drop this error, put them at the requested position in the frame during this pass, and then use the same codegen that will be used for the kernel specific struct shortly. Reviewed By: jmmartinez Differential Revision: https://reviews.llvm.org/D155132	2023-07-13 11:32:03 +01:00
Juan Manuel MARTINEZ CAAMAÑO	367b1f28db	[NFC][AMDGPULowerModuleLDSPass] Fix buildbot santizier failed to compile It seems that the sanitizer-x86_64-linux-android wasn't able to deduce the template argument: AMDGPULowerModuleLDSPass.cpp:1192:53: error: no viable constructor or deduction guide for deduction of template arguments of 'vector' auto TableLookupVariablesOrdered = sortByName(std::vector( This patch makes the template argument explicit.	2023-07-12 11:08:16 +02:00
Juan Manuel MARTINEZ CAAMAÑO	3a75551e85	Reland "[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code" Fixed compilation error and redundant copy warning Differential Revision: https://reviews.llvm.org/D154977	2023-07-12 09:27:20 +02:00
Jon Chesterfield	e75ce77cd7	[amdgpu][lds] Fix missing markUsedByKernel calls and undef lookup table elements More robust association between the kernels and lds struct. Use poison instead of value() for lookup table elements introduced by dynamic lds lowering. Extracted from D154946, new test from there verbatim. Segv fixed. Fixes issues/63338 Fixes SWDEV-404491 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154972	2023-07-12 00:37:21 +01:00
Juan Manuel MARTINEZ CAAMAÑO	ebdd610ad4	Revert "[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code" This reverts commit 125b90749a98d6dc6b492883c9617f9e91ab60e0.	2023-07-11 17:08:59 +02:00
Juan Manuel MARTINEZ CAAMAÑO	125b90749a	[NFC][AMDGPULowerModuleLDSPass] Factorize repetead sort code Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154970	2023-07-11 17:03:58 +02:00
Juan Manuel MARTINEZ CAAMAÑO	70bb5d2b9d	[NFC][AMDGPULowerModuleLDSPass] Add const to some variables/parameters Moving out some changes not related to the bugfix in https://reviews.llvm.org/D154946 Reviewed By: JonChesterfield, arsenm Differential Revision: https://reviews.llvm.org/D154959	2023-07-11 15:51:57 +02:00
Juan Manuel MARTINEZ CAAMAÑO	abf081975e	[NFC][AMDGPULowerModuleLDSPass] Remove dead variable	2023-07-11 12:35:21 +02:00
Jon Chesterfield	e17c1bb494	[amdgpu][nfc] Update comments on LDS lowering	2023-04-11 10:48:19 +01:00
Jon Chesterfield	0507448d82	[amdgpu] Implement dynamic LDS accesses from non-kernel functions The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so. 1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144233	2023-04-04 20:06:34 +01:00
Jon Chesterfield	62951784f0	[amdgpu][nfc] Refactor prior to D144233 to remove noise from diff	2023-04-03 16:47:01 +01:00
Jon Chesterfield	78e6818049	[amdgpu][nfc] clang-format AMDGPULowerModuleLDS for easier merging	2023-03-22 01:49:53 +00:00
Jon Chesterfield	d70e7ea0d1	[amdgpu][nfc] Extract more functions in LowerModuleLDS, mark more methods static	2023-03-22 01:25:28 +00:00
Jon Chesterfield	e8ad2a051c	[amdgpu][nfc] Comment and extract two functions in LowerModuleLDS	2023-03-21 23:39:20 +00:00
Jon Chesterfield	d3dda422bf	[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221	2023-03-12 13:47:48 +00:00
Nikita Popov	576060fb41	[ReplaceConstant] Extract code for expanding users of constant (NFC) AMDGPU implements some handy code for expanding all constexpr users of LDS globals. Extract the core logic into ReplaceConstant, so that it can be reused elsewhere.	2023-03-03 16:09:06 +01:00
Jon Chesterfield	bf579a7049	[amdgpu] Change LDS lowering default to hybrid Postponed from D139433 until the bug fixed by D139874 could be resolved. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141852	2023-02-24 15:20:12 +00:00
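
The merge rule described in abec9ff47d (intersect noalias scopes only when they come from the same domain, otherwise concatenate) can be illustrated with a minimal, standalone sketch. This is a toy model and not the pass's code: `Scope`, `ScopeSet`, `sharesDomain`, `mergeNoAlias`, and `mergeNoAliasExample` are hypothetical names, and a scope is modeled as a plain (domain, name) pair instead of LLVM metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy model: a noalias scope is a (domain, name) pair.
using Scope = std::pair<std::string, std::string>;
using ScopeSet = std::set<Scope>;

// True if any scope in A belongs to the same domain as some scope in B.
bool sharesDomain(const ScopeSet &A, const ScopeSet &B) {
  for (const Scope &X : A)
    for (const Scope &Y : B)
      if (X.first == Y.first)
        return true;
  return false;
}

// Assumed rule, following the commit message: intersect when the two sets
// share a domain, otherwise keep both sets (concatenate).
ScopeSet mergeNoAlias(const ScopeSet &Existing, const ScopeSet &Lds) {
  ScopeSet Result;
  if (sharesDomain(Existing, Lds)) {
    std::set_intersection(Existing.begin(), Existing.end(), Lds.begin(),
                          Lds.end(), std::inserter(Result, Result.begin()));
  } else {
    Result = Existing;
    Result.insert(Lds.begin(), Lds.end());
  }
  return Result;
}

// Usage check covering both branches of the rule.
void mergeNoAliasExample() {
  const ScopeSet Lds = {{"lds", "x"}, {"lds", "y"}};

  // Different domains: nothing is dropped, the sets are concatenated.
  const ScopeSet User = {{"user", "a"}};
  assert(mergeNoAlias(User, Lds).size() == 3);

  // Same domain: only the common scope survives.
  const ScopeSet SameDomain = {{"lds", "x"}, {"lds", "z"}};
  assert(mergeNoAlias(SameDomain, Lds) == ScopeSet({{"lds", "x"}}));
}
```

In this model, intersecting across unrelated domains would always produce an empty set, which is the loss of aliasing information the commit message describes as overly conservative.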


102 Commits
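
The re-run behavior described in d4569d42b5 (all LDS variables absolute: already lowered; none absolute: lower as usual; a mix: diagnose) can be sketched as a three-way classification. This is a minimal sketch under stated assumptions, not the pass's implementation: `ModuleState` and `classifyModule` are hypothetical names, and each LDS variable is reduced to a single flag saying whether it already carries an absolute address. Following ccb3a8feaa, the input is assumed to contain only the variables that remain once unused ones have been dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical classification of a module's LDS variables.
enum class ModuleState { NotLowered, AlreadyLowered, Mixed };

// IsAbsolute[i] is true when the i-th remaining LDS variable already has an
// absolute address. An empty list is treated as NotLowered (an assumption of
// this sketch: there is simply nothing to lower).
ModuleState classifyModule(const std::vector<bool> &IsAbsolute) {
  const bool AnyAbsolute = std::any_of(IsAbsolute.begin(), IsAbsolute.end(),
                                       [](bool B) { return B; });
  const bool AnyNonAbsolute = std::any_of(IsAbsolute.begin(), IsAbsolute.end(),
                                          [](bool B) { return !B; });
  if (AnyAbsolute && AnyNonAbsolute)
    return ModuleState::Mixed; // variables were added after lowering: reject
  return AnyAbsolute ? ModuleState::AlreadyLowered : ModuleState::NotLowered;
}
```

A caller would skip the work for `AlreadyLowered`, proceed for `NotLowered`, and report an error for `Mixed`.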