18 Commits

Author SHA1 Message Date
Jameson Nash
d10b2b566a
[NFCI] replace getValueType with new getGlobalSize query (#177186)
Returns uint64_t to simplify callers. The goal is eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during create.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
2026-01-22 13:55:53 -05:00
Mirko Brkušanin
5759a3a779
[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501) 2025-12-10 09:45:13 +01:00
Jay Foad
1c3b10f2e2
[AMDGPU] Remove isKernelLDS, add isKernel(const Function &). NFC. (#167300)
Since #142598 isKernelLDS has been a pointless wrapper around isKernel.
2025-11-25 15:43:18 +00:00
Chaitanya
49d5bb0ad0
[AMDGPU] Add amdgpu-lower-exec-sync pass to lower named-barrier globals (#165692)
This PR introduces `amdgpu-lower-exec-sync` pass which specifically
lowers named-barrier LDS globals introduced by #114550 .

Changes include:

- Moving the logic of lowering named-barrier LDS globals from
`amdgpu-lower-module-lds` pass to this new pass.

- This PR adds the pass to pipeline, remove the existing lowering logic for
named-barrier LDS in `amdgpu-lower-module-lds`

See #161827 for discussion on this topic.
2025-11-17 10:08:40 +05:30
Stanislav Mekhanoshin
4ab8dabc25
[AMDGPU] Add s_cluster_barrier on gfx1250 (#159175) 2025-09-16 14:49:48 -07:00
Kazu Hirata
11b4f110e0
[llvm] Remove unused includes of SmallSet.h (NFC) (#154893)
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h.  With that, we no longer need to
include SmallSet.h in many files.
2025-08-22 10:33:46 -07:00
Gang Chen
575fad2892
[AMDGPU] Upstream the Support for array of named barriers (#154604) 2025-08-20 14:53:03 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Stanislav Mekhanoshin
d0ee82040c
[AMDGPU] Add s_barrier_init|join|leave instructions (#153296) 2025-08-12 15:07:07 -07:00
Matt Arsenault
a961ba88e1
AMDGPU: Use reportFatalUsageError for LDS mixed absolute addresses (#145135) 2025-06-21 23:54:27 +09:00
Diana Picus
40a7dce9ef
[AMDGPU] Remove duplicated/confusing helpers. NFCI (#142598)
Move canGuaranteeTCO and mayTailCallThisCC into AMDGPUBaseInfo instead
of keeping two copies for DAG/Global ISel.

Also remove isKernelCC, which doesn't agree with isKernel and doesn't
seem very useful.

While at it, also move all the CC-related helpers into AMDGPUBaseInfo.h and
mark them constexpr.
2025-06-05 11:19:20 +02:00
Mariusz Sikora
cd3acd1bff
[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548) 2025-03-04 17:52:43 +01:00
Jeffrey Byrnes
c5a4512d85
[AMDGPU] iglp.opt does not clobber memory operands (#126976)
I think it was an accident that this wasn't included.
2025-02-12 14:11:02 -08:00
Kareem Ergawy
ff55c9bc63
[llvm][amdgpu] Handle indirect refs to LDS GVs during LDS lowering (#124089)
Fixes #123800

Extends LDS lowering by allowing it to discover transitive
indirect/escpaing references to LDS GVs.

For example, given the following input:
```llvm
@lds_item_to_indirectly_load = internal addrspace(3) global ptr undef, align 8

%store_type = type { i32, ptr }
@place_to_store_indirect_caller = internal addrspace(3) global %store_type undef, align 8

define amdgpu_kernel void @offloading_kernel() {
  store ptr @indirectly_load_lds, ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @place_to_store_indirect_caller, i32 0), align 8
  call void @call_unknown()
  ret void
}

define void @call_unknown() {
  %1 = alloca ptr, align 8
  %2 = call i32 %1()
  ret void
}

define void @indirectly_load_lds() {
  call void @directly_load_lds()
  ret void
}

define void @directly_load_lds() {
  %2 = load ptr, ptr addrspace(3) @lds_item_to_indirectly_load, align 8
  ret void
}

```

With the above input, prior to this patch, LDS lowering failed to lower
the reference to `@lds_item_to_indirectly_load` because:
1. it is indirectly called by a function whose address is taken in the
kernel.
2. we did not check if the kernel indirectly makes any calls to unknown
functions (we only checked the direct calls).

Co-authored-by: Jon Chesterfield <jonathan.chesterfield@amd.com>
2025-01-23 14:53:11 +01:00
Mirko Brkušanin
3def49cb64
[AMDGPU] Remove s_wakeup_barrier instruction (#122277) 2025-01-10 11:30:22 +01:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Gang Chen
8c752900dd
[AMDGPU] modify named barrier builtins and intrinsics (#114550)
Use a local pointer type to represent the named barrier in builtin and
intrinsic. This makes the definitions more user friendly
bacause they do not need to worry about the hardware ID assignment. Also
this approach is more like the other popular GPU programming language.
Named barriers should be represented as global variables of addrspace(3)
in LLVM-IR. Compiler assigns the special LDS offsets for those variables
during AMDGPULowerModuleLDS pass. Those addresses are converted to hw
barrier ID during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
2024-11-06 10:37:22 -08:00
Jay Foad
55d744eea3
[AMDGPU] Move AMDGPUMemoryUtils out of Utils. NFC. (#104930)
It is only used by CodeGen so does not need to be shared with the
assembler/disassembler.
2024-08-20 16:15:46 +01:00