1010 Commits

Author SHA1 Message Date
Marcos Maronas
ce94d63f0f
Make OpenCL an OSType rather than an EnvironmentType. (#170297)
OpenCL was added as an `EnvironmentType` in
https://github.com/llvm/llvm-project/pull/78655, but there is no
explanation as to why it was added as such, even after explicitly asking
in the PR
(https://github.com/llvm/llvm-project/pull/78655#issuecomment-2743162853).
This PR makes it an `OSType` instead, which feels more natural, and
updates tests accordingly.

---------

Co-authored-by: Marcos Maronas <marcos.maronas@intel.com>
2026-02-10 18:45:50 +00:00
Mirko Brkušanin
4280f0d241
[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516) 2026-02-10 12:14:49 +01:00
Mirko Brkušanin
45b037cf7a
[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191) 2026-02-09 13:56:43 +01:00
Pierre van Houtryve
b79ba02479
[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343)
Load monitor operations make more sense as atomic operations, as
non-atomic operations cannot be used for inter-thread communication without
additional synchronization.
The previous built-in made it work because one could just override the
CPol bits, but that bypasses the memory model and forces the user to learn
about the ISA bit encoding.

Making load monitor an atomic operation has a couple of advantages.
First, the memory model foundation for it is stronger. We just lean on the
existing rules for atomic operations. Second, the CPol bits are abstracted away
from the user, which avoids leaking ISA details into the API.

This patch also adds supporting memory model and intrinsics
documentation to AMDGPUUsage.

Solves SWDEV-516398.
2026-02-09 09:57:27 +01:00
paperchalice
5c5677d7b8
[llvm] Remove "no-infs-fp-math" attribute support (#180083)
This was one of the global options in `TargetMachine::resetTargetOptions`.
No backend supports it anymore, so remove it.
2026-02-09 08:43:33 +08:00
Mirko Brkušanin
20b5849e17
[AMDGPU] Define new target gfx1170 (#180185) 2026-02-06 14:38:50 +01:00
Matt Arsenault
2502e3b7ba
IR: Promote "denormal-fp-math" to a first class attribute (#174293)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.

The syntax in the common cases looks like this:
  `denormal_fpenv(preservesign,preservesign)`
  `denormal_fpenv(float: preservesign,preservesign)`
  `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`

I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).

This changes behavior when the debug command-line flags are used
to set the denormal mode. Previously, the flags ignored functions
that explicitly set the attribute, checked separately for the default
and f32 versions. Now that these are one attribute, the flag logic
can't distinguish which of the two components was explicitly set on
the function. Only one test appeared to rely on this behavior, so I
just avoided using the flags in it.

This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.

I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier, with the two 2-element fields flattened into four arguments.
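The sketch below illustrates the "default and float modes as bitfields" idea in Python. Each mode is an (output, input) pair, and four denormal kinds fit in 2 bits. The field widths, the bit order, the helper names, and the kind spellings other than `preservesign` and `dynamic` are assumptions made for illustration. This is not LLVM's actual layout.

```python
from typing import Optional, Tuple

# Denormal handling kinds; only "preservesign" and "dynamic" are spelled as in
# the syntax above, the others are assumed. Four kinds fit in 2 bits.
KINDS = ["ieee", "preservesign", "positivezero", "dynamic"]
Mode = Tuple[str, str]  # (output, input)

def encode_mode(mode: Mode) -> int:
    """Pack an (output, input) pair into 4 bits: output high, input low."""
    output, input_ = mode
    return (KINDS.index(output) << 2) | KINDS.index(input_)

def decode_mode(bits: int) -> Mode:
    return KINDS[(bits >> 2) & 0b11], KINDS[bits & 0b11]

def encode_fpenv(default: Mode, float_mode: Optional[Mode] = None) -> int:
    """Default mode in bits 0-3, float mode in bits 4-7 (hypothetical layout).

    In this sketch an omitted float mode is stored as a copy of the default,
    so a float query never has to check whether it was set explicitly.
    """
    if float_mode is None:
        float_mode = default
    return (encode_mode(float_mode) << 4) | encode_mode(default)

def default_mode_of(env: int) -> Mode:
    return decode_mode(env & 0xF)

def float_mode_of(env: int) -> Mode:
    return decode_mode(env >> 4)
```

In this sketch, `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` becomes one small integer. Both queries then decode from that integer, and neither has to parse a string.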
2026-02-05 13:31:26 +00:00
Wenju He
8ab29461c3
[OpenCL] Set half-precision Div and Sqrt accuracy (#179621)
The OpenCL spec relaxed half-precision divide to 1 ULP and sqrt to 1.5 ULP
in https://github.com/KhronosGroup/OpenCL-Docs/pull/1293 and
https://github.com/KhronosGroup/OpenCL-Docs/pull/1386
This can enable targets to use the hardware rcp instruction for half.
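These bounds are measured in units of binary16 spacing at the exact result. The following Python sketch shows such a check. It assumes finite inputs within the binary16 range, because `struct`'s `'e'` format raises `OverflowError` otherwise. The helper names are hypothetical, and this is not how the OpenCL conformance tests implement the check.

```python
import math
import struct

def round_to_half(x: float) -> float:
    """Round a double to the nearest binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_ulp(x: float) -> float:
    """Spacing between adjacent binary16 values at the magnitude of x."""
    if x == 0.0:
        return 2.0 ** -24                 # subnormal spacing
    _, e = math.frexp(x)                  # |x| = m * 2**e, 0.5 <= m < 1
    # binary16 has 10 explicit mantissa bits; below 2**-14 the spacing
    # stays at the subnormal step 2**-24.
    return max(2.0 ** (e - 11), 2.0 ** -24)

def within_ulps(approx: float, exact: float, max_ulps: float) -> bool:
    """True if approx is within max_ulps binary16 ULPs of the exact result."""
    return abs(approx - exact) <= max_ulps * half_ulp(exact)
```

As an example, take the correctly rounded half value of sqrt(2). A result one half-ULP step above it still passes a 1.5 ULP bound, but a result two steps above it does not.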
2026-02-05 09:32:56 +08:00
Jameson Nash
0dd21ad1c6
[clang] remove addrspace cast from CreateIRTemp (#179327)
This just added unnecessary work to the IR: the temporaries are only used
for loads and stores, so the cast was only IR noise. Tests were updated by
the UTC script to remove the extra lines.
2026-02-04 13:09:32 -05:00
Aaditya
f190477718
[AMDGPU] Add builtins for wave reduction intrinsics (#170813) 2026-01-30 18:15:06 +05:30
Wenju He
c03d0fe672
[OpenCL] Add clang internal extension __cl_clang_function_scope_local_variables (#176726)
The OpenCL spec restricts variables in the local address space to be
declared only at kernel function scope.
Add a Clang internal extension __cl_clang_function_scope_local_variables
to lift the restriction.

To expose static local allocations at kernel scope, targets can either
force-inline non-kernel functions that declare local memory or pass a
kernel-allocated local buffer to those functions via an implicit argument.

Motivation: support local memory allocation in libclc's implementation
of work-group collective built-ins, see example at:
https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives_helpers.ll
https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives.cl#L182

Right now this is a Clang-only OpenCL extension intended for compiling
OpenCL libraries with Clang. It could be proposed as a standard OpenCL
extension in the future.
2026-01-26 08:13:22 +08:00
Shilei Tian
f3a674a2ef
[RFC][Clang][AMDGPU] Emit only delta target-features to reduce IR bloat (#176533)
Currently, AMDGPU functions have their `target-features` attribute populated
with all default features for the target GPU. This is redundant because
the backend can derive these defaults from the `target-cpu` attribute
via `AMDGPUTargetMachine::getFeatureString()`.

In this PR, for AMDGPU targets only:

- Functions without explicit target attributes no longer emit
`target-features`
- Functions with `__attribute__((target(...)))` or `-target-feature`
emit only features that differ from the target's defaults (delta)

The backend already handles missing `target-features` correctly by
falling back to the TargetMachine's defaults.

A new cc1 flag `-famdgpu-emit-full-target-features` is added to emit
full features when needed.

Example:

Before:

```llvm
attributes #0 = { "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,..." }
```

After (default):

```llvm
attributes #0 = { "target-cpu"="gfx90a" }
```

After (with explicit `+wavefrontsize32` override):

```llvm
attributes #0 = { "target-cpu"="gfx90a" "target-features"="+wavefrontsize32" }
```
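The delta rule can be sketched in Python. An entry is emitted only when the requested value differs from the CPU default. The backend side applies the `+`/`-` entries on top of the same defaults. The default table used in the test, along with the function names, is hypothetical. Clang's implementation works on its own feature maps.

```python
from typing import Dict, List, Optional

def delta_target_features(defaults: Dict[str, bool],
                          requested: Dict[str, bool]) -> List[str]:
    """Return only the +feature/-feature entries that differ from the defaults.

    defaults:  feature -> enabled by default for the target CPU
    requested: feature -> enabled, as set by target(...) or -target-feature
    """
    delta = []
    for name, enabled in sorted(requested.items()):
        if defaults.get(name, False) != enabled:
            delta.append(("+" if enabled else "-") + name)
    return delta

def target_features_attr(defaults: Dict[str, bool],
                         requested: Dict[str, bool]) -> Optional[str]:
    """An empty delta means the attribute is omitted entirely (None)."""
    delta = delta_target_features(defaults, requested)
    return ",".join(delta) if delta else None

def apply_delta(defaults: Dict[str, bool], attr: Optional[str]) -> Dict[str, bool]:
    """Backend view: start from the CPU defaults, then apply each entry."""
    features = dict(defaults)
    for entry in attr.split(",") if attr else []:
        features[entry[1:]] = entry[0] == "+"
    return features
```

Because `apply_delta` reconstructs the full feature set from the defaults plus the delta, nothing is lost by dropping the redundant entries.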
2026-01-20 14:49:35 -05:00
Shilei Tian
4efbe98659
[Clang][AMDGPU] Add a Sema check for the imm argument of __builtin_amdgcn_s_setreg (#176838)
Our backend cannot select the corresponding intrinsic if the imm
argument does not fit in an `int16_t` or `uint16_t`, and that selection
failure is not a helpful diagnostic.
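"Fits in an `int16_t` or `uint16_t`" means the value lies in the union of the two ranges, [-32768, 65535]. The Python sketch below models that range check. The function name and the message text are made up for illustration. Clang's actual Sema diagnostic may be worded differently.

```python
INT16_MIN = -(1 << 15)       # -32768, smallest int16_t
UINT16_MAX = (1 << 16) - 1   # 65535, largest uint16_t

def check_setreg_imm(value: int) -> None:
    """Reject immediates representable as neither int16_t nor uint16_t."""
    if not INT16_MIN <= value <= UINT16_MAX:
        raise ValueError(
            f"argument value {value} is outside the valid range "
            f"[{INT16_MIN}, {UINT16_MAX}]")
```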
2026-01-20 11:48:52 -05:00
Shilei Tian
39bd4562ba
[Clang][AMDGPU] Handle wavefrontsize32 and wavefrontsize64 features more robustly (#176599)
We should not allow `-wavefrontsize32` and `-wavefrontsize64` to be
specified at the same time. We should also not allow `-wavefrontsize32`
on a target that only supports `wavefrontsize32`, and vice versa.
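This sketch reads the leading `-` as "disable". Under that reading, both rules can be modelled in Python. The `supported` sets stand in for real subtarget queries and are assumptions, as is the function name.

```python
from typing import Iterable, Set

def check_wavefront_features(features: Iterable[str], supported: Set[int]) -> None:
    """Validate explicit wavefront-size features against the target.

    features:  entries such as "+wavefrontsize32" or "-wavefrontsize64"
    supported: wave sizes the target can run, e.g. {32}, {64} or {32, 64}
    """
    prefix = "-wavefrontsize"
    disabled = {int(f[len(prefix):]) for f in features if f.startswith(prefix)}
    # Rule 1: disabling both sizes leaves no valid wave size.
    if {32, 64} <= disabled:
        raise ValueError("cannot disable both wavefrontsize32 and wavefrontsize64")
    # Rule 2: cannot disable the only wave size the target supports.
    for size in sorted(disabled):
        if supported == {size}:
            raise ValueError(
                f"-wavefrontsize{size} is invalid: the target only supports wave{size}")
```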
2026-01-19 18:16:29 -05:00
Shoreshen
26624d51d1
[AMDGPU]Add specific instruction feature for multicast load (#175503) 2026-01-13 09:10:09 +08:00
Shilei Tian
5a63367b15
Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674) (#174697)
This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.
2026-01-07 06:12:19 +00:00
dyung
0b2f3cfb72
Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674)
Reverts llvm/llvm-project#174310

This change is causing 2 cross-project-test failures on
https://lab.llvm.org/buildbot/#/builders/174/builds/29695
2026-01-07 01:18:23 +00:00
Shilei Tian
ccca3b8c67
[AMDGPU] Rework the clamp support for WMMA instructions (#174310)
Fixes #166989.
2026-01-06 15:46:40 -05:00
Shilei Tian
ef55a0be4e [NFC] Update clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250-wmma-w32.cl 2026-01-06 13:06:57 -05:00
Wenju He
1f14ed948d
[Clang] Honor '#pragma STDC FENV_ROUND' in __builtin_store_half/halff (#173821)
Before this change, constrained fptrunc for __builtin_store_half/halff
always used round.tonearest, ignoring the active pragma STDC FENV_ROUND.
This PR guards builtin emission with CGFPOptionsRAII so the current
rounding mode is propagated to the generated constrained intrinsic.
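The effect of the rounding mode on a float-to-half truncation can be modelled in Python. The mode strings mirror the constrained intrinsics' `round.*` metadata names. The sketch assumes finite inputs that stay inside the binary16 range after rounding to nearest. The function names are hypothetical.

```python
import struct

def _bits(h: float) -> int:
    """binary16 bit pattern of h (struct rounds to nearest, ties to even)."""
    return struct.unpack("<H", struct.pack("<e", h))[0]

def _from_bits(b: int) -> float:
    return struct.unpack("<e", struct.pack("<H", b))[0]

def fptrunc_half(x: float, mode: str) -> float:
    """Convert a double to binary16 under "tonearest", "towardzero",
    "upward" or "downward" rounding."""
    nearest = _from_bits(_bits(x))
    if mode == "tonearest" or nearest == x:
        return nearest
    b = _bits(nearest)
    negative = bool(b & 0x8000)
    # Sign-magnitude encoding: b + 1 steps one value away from zero,
    # b - 1 steps one value toward zero.
    if mode == "towardzero":
        step = -1 if abs(nearest) > abs(x) else 0
    elif mode == "upward":
        step = 0 if nearest > x else (-1 if negative else 1)
    elif mode == "downward":
        step = 0 if nearest < x else (1 if negative else -1)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return _from_bits(b + step)
```

For example, 0.7 rounds to 1434/2048 under round-to-nearest, but to 1433/2048 under `towardzero`. Before this change, the builtins always produced the round-to-nearest result.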
2026-01-04 17:25:22 +08:00
Shilei Tian
c97de4387b
Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)" (#174303)
This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it
breaks assembler.

```
$ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse"
  v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c]
```

We have a fundamental issue in the clamp support in VOP3P instructions,
which will need more changes.
2026-01-04 02:13:21 +00:00
Krzysztof Drewniak
20ef8b0285
[AMDGPU] Add nocreateundeforpoison annotations (#166450)
This commit goes through IntrinsicsAMDGPU.td and adds
`nocreateundeforpoison` to intrinsics that (to my knowledge) perform
arithmetic operations that are defined everywhere. Bitfield extracts,
permutations, and similar operations are excluded, since they can have
invalid inputs.
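The line drawn here is between arithmetic that only propagates poison and operations that can produce poison from well-defined operands. The Python sketch below models that distinction. `bitfield_extract` and its out-of-range rule are hypothetical. They do not reproduce the exact semantics of any AMDGPU intrinsic.

```python
class Poison:
    """Stand-in for an LLVM poison value."""
    def __repr__(self) -> str:
        return "poison"

POISON = Poison()

def fmul(a, b):
    # Plain arithmetic: poison in, poison out; otherwise always defined.
    # An operation like this can be marked nocreateundeforpoison.
    if a is POISON or b is POISON:
        return POISON
    return a * b

def bitfield_extract(src, offset, width):
    # Hypothetical rule: a field reaching past bit 31 yields poison even
    # though every operand is well defined, so this must NOT be marked.
    if any(v is POISON for v in (src, offset, width)):
        return POISON
    if offset + width > 32:
        return POISON
    return (src >> offset) & ((1 << width) - 1)

def creates_poison(op, *args) -> bool:
    """True if op returns poison although no argument is poison."""
    return all(a is not POISON for a in args) and op(*args) is POISON
```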
2026-01-02 10:12:58 -08:00
Juan Manuel Martinez Caamaño
f04dc3b5d4
[Clang] Remove 't' from __builtin_amdgcn_flat_atomic_fmin/fmax_f64 (#173839)
Allows for type checking depending on the built-in signature.

Neither builtin has an `f32` version.
2025-12-30 09:14:53 +01:00
Muhammad Abdul
2c376ffeca
[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069)
Fixes #166989 

- Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and
threads it through LLVM IR, MIR lowering, Clang builtins/tests, and the MLIR
ROCDL dialect so all layers agree on the new operand
- Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted,
teaches VOP3P encoding to accept the immediate, and adjusts Clang
codegen/builtin headers plus MLIR op definitions and tests to match
- Documents what the WMMA clamp operand does
- Implements bitcode AutoUpgrade of the WMMA iu8 intrinsic for
backward compatibility

Possible future enhancements:
- infer clamping as an optimization fold based on the use context
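The sketch below assumes that clamp means the i32 accumulation saturates on overflow instead of wrapping. It models the difference for one output element in Python. The saturation semantics are an assumption here, since AMDGPUUsage is the authoritative description. The sketch also ignores the matrix shape and the signed/unsigned operand modifiers, and the function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def wrap_i32(v: int) -> int:
    """Two's-complement wraparound to 32 bits."""
    return (v + (1 << 31)) % (1 << 32) - (1 << 31)

def iu8_dot_accumulate(a, b, c: int, clamp: bool) -> int:
    """c + sum(a[i] * b[i]) for 8-bit integer inputs into an i32 accumulator."""
    total = c + sum(x * y for x, y in zip(a, b))
    if clamp:
        return max(INT32_MIN, min(INT32_MAX, total))   # saturate
    return wrap_i32(total)                             # wrap on overflow
```

Without clamp, a sum that overflows past `INT32_MAX` wraps around to a large negative value. With clamp, it sticks at `INT32_MAX`.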

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-27 12:51:29 -05:00
Juan Manuel Martinez Caamaño
42f741c98e
[Clang] Remove 't' from __builtin_amdgcn_global_atomic_fadd_f32/f64 (#173480)
Allows for type checking depending on the built-in signature.
2025-12-26 09:16:26 +01:00
Juan Manuel Martinez Caamaño
fcd9235d86
[Clang] Remove 't' from __builtin_amdgcn_flat_atomic_fadd_f32/f64 (#173381)
Allows for type checking depending on the built-in signature.

This introduces some subtle changes in code generation: before, since
the signature was meaningless, we would accept any pointer type without
casting. After this change, the pointer operand of the `atomicrmw` is in
the flat address space.
2025-12-24 12:07:14 +01:00
Mirko Brkušanin
5759a3a779
[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501) 2025-12-10 09:45:13 +01:00
Hongyu Chen
4b0a975939
[OpenCL][NVPTX] Don't set calling convention for OpenCL kernel (#170170)
Fixes #154772
We previously set `ptx_kernel` on all kernels, but it is incorrect to
add `ptx_kernel` to the stub version of the kernel introduced in #115821.
This patch copies the AMDGPU workaround.
2025-12-03 16:53:25 +08:00
Matt Arsenault
9fd288e886
clang/AMDGPU: Enable opencl 2.0 features for unknown target (#170308)
Assume amdhsa triples support flat addressing, which matches
the backend logic for the default target. This fixes the
ROCm device-libs build.
2025-12-02 19:11:30 -05:00
Aaditya
4604762cc3
[AMDGPU] Add builtins for wave reduction intrinsics (#161816) 2025-11-24 15:13:11 +05:30
Wenju He
c4254cd9bb
[Clang] Support __bf16 type for SPIR/SPIR-V (#169012)
SPIR/SPIR-V are generic targets. Assume they support __bf16.
2025-11-24 10:10:11 +08:00
Shoreshen
52a58a4193
[AMDGPU] Adding instruction specific features (#167809) 2025-11-19 11:06:00 +08:00
CarolineConcatto
200793ac21
Extend MemoryEffects to Support Target-Specific Memory Locations (#148650)
This patch introduces preliminary support for additional memory
locations.
They are: target_mem0 and target_mem1 and they model memory locations
that cannot be represented with existing memory locations.

This was the solution suggested in:
https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6

Currently, these locations are not yet target-specific. The goal is to
enable the compiler to express read/write effects on these resources.
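MemoryEffects records a mod/ref kind per memory location. Adding target_mem0 and target_mem1 therefore amounts to adding two more slots. The Python sketch below assumes a 2-bit-per-location packing. The location names other than target_mem0/target_mem1, and their order, are placeholders rather than the real enum.

```python
from enum import IntFlag

class ModRef(IntFlag):
    NO = 0
    REF = 1      # may read
    MOD = 2      # may write
    MODREF = 3

# Hypothetical slot order for illustration only.
LOCATIONS = ["argmem", "inaccessiblemem", "errnomem", "other",
             "target_mem0", "target_mem1"]

def pack_effects(effects) -> int:
    """Pack a {location: ModRef} dict into 2 bits per location."""
    word = 0
    for i, loc in enumerate(LOCATIONS):
        word |= int(effects.get(loc, ModRef.NO)) << (2 * i)
    return word

def get_modref(word: int, loc: str) -> ModRef:
    return ModRef((word >> (2 * LOCATIONS.index(loc))) & 0b11)

def may_write(word: int, loc: str) -> bool:
    return bool(get_modref(word, loc) & ModRef.MOD)
```

With a dedicated slot, a call that only writes target_mem0 no longer has to be modelled as writing "other" memory. As a result, it does not appear to clobber unrelated locations.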
2025-11-18 11:10:58 +00:00
Jay Foad
f037f41350
[IR] Add new function attribute nocreateundeforpoison (#164809)
Also add a corresponding intrinsic property that can be used to mark
intrinsics that do not introduce poison, for example simple arithmetic
intrinsics that propagate poison just like a simple arithmetic
instruction.

As a smoke test this patch adds the new property to
llvm.amdgcn.fmul.legacy.
2025-11-04 12:00:44 +00:00
Wenju He
efb84586da
[clang][SPIR][SPIRV] Don't generate constant NULL from addrspacecast generic NULL (#165353)
Fix a regression caused by 1ffff05a38c9.
The OpenCL/SPIR-V generic address space does not include the constant address space.

---------

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
2025-10-31 15:35:41 +08:00
Fabian Ritter
ea034477fd
Reapply "[HIP][Clang] Remove __AMDGCN_WAVEFRONT_SIZE macros" (#164217)
This reverts commit 78bf682cb9033cf6a5bbc733e062c7b7d825fdaf.

Original PR: #157463
Revert PR: #158566

The relevant buildbots have been updated to a ROCm version that does not
use the macros anymore to avoid the failures.

Implements SWDEV-522062.
2025-10-30 13:42:32 +01:00
Florian Hahn
53785846aa
[Clang] Freeze padded vectors before storing. (#164821)
Currently Clang usually leaves padding bits uninitialized, which means
they are undef at the moment.

When expanding stores of vector types to include padding, the padding
lanes will be poison, hence the padding bits will be poison.

This interacts badly with coercion of arguments and return values, where
3 x float vectors are loaded as an i128 integer; poisoning the padding
bits makes the whole value poison.

Not sure if there's a better way, but I think we have a number of places
that currently rely on the padding being undef, not poison.
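The Python model below shows the failure mode, with lanes stored as raw 32-bit patterns. A `<3 x float>` is padded to four lanes, and the padding lane starts out as poison. In LLVM, `freeze` may pick any fixed value for a poison lane. This sketch picks 0, which is one permitted choice, not necessarily what the optimizer will pick. The helper names are hypothetical.

```python
POISON = object()   # stand-in for an LLVM poison lane

def bitcast_to_int(lanes, lane_bits=32):
    """Reinterpret vector lanes as one little-endian integer.

    Poison in any lane makes the whole integer poison, which is how a
    poison padding lane poisons the coerced <3 x float> -> i128 value.
    """
    if any(lane is POISON for lane in lanes):
        return POISON
    value = 0
    for i, lane in enumerate(lanes):
        value |= lane << (lane_bits * i)
    return value

def freeze(lanes, pick=0):
    """Replace poison lanes with an arbitrary but fixed value (here: pick)."""
    return [pick if lane is POISON else lane for lane in lanes]
```

With lanes 1.0f, 2.0f, 3.0f (`0x3F800000`, `0x40000000`, `0x40400000`) plus a poison pad, the unfrozen i128 is poison. After `freeze`, the three real lanes survive intact.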

PR: https://github.com/llvm/llvm-project/pull/164821
2025-10-28 19:03:17 -07:00
Stanislav Mekhanoshin
9b5bc98743
[AMDGPU] Add intrinsics for v_[pk]_add_{min|max}_* instructions (#164731) 2025-10-22 17:46:33 -07:00
macurtis-amd
5440cfc450
[clang] Add support for cluster sync scope (#162575)
From Sam Liu:
> CUDA supports thread block clusters:
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-block-clusters
>
> Its atomic intrinsics support cluster scope:
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/#nv-atomic-fetch-add-and-nv-atomic-add
>
> For compatibility, Clang and HIP need to support cluster scope.
2025-10-21 05:47:26 -05:00
Antonio Frighetto
efcda54794
[clang][CodeGen] Emit llvm.tbaa.errno metadata during module creation
Let Clang emit `llvm.tbaa.errno` metadata so that LLVM can carry
out optimizations around errno-writing libcalls, as long as it can
prove that the memory location involved does not alias `errno`.

Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
2025-10-21 11:38:45 +02:00
Shilei Tian
c683f215e5
[NFC][Clang][AMDGPU] Fix upstream and downstream difference (#164304)
These two files were missed when the corresponding feature was
upstreamed.
2025-10-20 15:47:46 -04:00
Matt Arsenault
853760bca6
AMDGPU: Use ELF mangling in data layout (#163011)
Closes #95219
2025-10-13 03:01:45 +00:00
paperchalice
2aeefcf40f
[clang][CodeGen] Remove "unsafe-fp-math" attribute support (#162779)
These global flags block further improvements for Clang; users should
always use fast-math flags instead. See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797
They are being removed incrementally; this is the Clang part.
2025-10-10 15:56:29 +08:00
Wenju He
1ffff05a38
[clang][SPIR][SPIRV] Materialize non-generic null pointers via addrspacecast (#161773)
LLVM models ConstantPointerNull as all-zero, but some GPUs (e.g. AMDGPU
and our downstream GPU target) use a non-zero sentinel for null in
private / local address spaces.
SPIR-V is a supported input for our GPU target. This PR preserves a
canonical zero form in the generic AS while allowing later lowering to
substitute the target’s real sentinel.
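A small Python model shows why a literal zero is wrong in those address spaces. The sketch assumes AMDGPU's convention of all-ones null values for 32-bit local and private pointers. The downstream target mentioned above may use different values. The table and function names are hypothetical.

```python
# Assumed AMDGPU-style null values per address space: local and private
# pointers are 32-bit and use all-ones as null.
NULL_VALUE = {
    "generic": 0,
    "global": 0,
    "local": 0xFFFFFFFF,
    "private": 0xFFFFFFFF,
}

def addrspacecast_null(dst: str) -> int:
    """Lower `addrspacecast (ptr null to ptr addrspace(dst))`.

    The front end keeps the canonical zero in the generic space; only the
    target lowering knows the real sentinel of the destination space.
    """
    return NULL_VALUE[dst]

def is_null(value: int, space: str) -> bool:
    return value == NULL_VALUE[space]
```

Under this convention, an all-zero private pointer is a valid address rather than null. Emitting the cast, rather than a zero constant, is therefore what preserves the meaning of null.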
2025-10-09 09:10:24 +08:00
Shilei Tian
9e8dda1034
[NFC] Change spelling of cluster feature to "clusters" (#162103) 2025-10-06 15:55:39 +00:00
Shilei Tian
bea0225c30
[AMDGPU] Make cluster a target feature (#162040)
This replaces the original arch check.
2025-10-06 05:05:53 +00:00
Alex Voicu
d481e5f9b7
[AMDGPU][SPIRV] Use SPIR-V syncscopes for some AMDGCN BIs (#154867)
AMDGCN-flavoured SPIR-V allows AMDGCN-specific builtins, including those
for scoped fences and some specific RMWs. However, at present we don't
map syncscopes to their SPIR-V equivalents, but rather use the AMDGCN
ones. This pessimises the resulting code, as system scope is used
instead of device (agent) or subgroup (wavefront). This patch corrects
the mapping so that reverse translation does the right thing.
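The intended correspondence can be written as a lookup table. The Python sketch below assumes the usual pairing of AMDGCN syncscope names with SPIR-V `Scope` operand values and ignores the `-one-as` variants. The function name is hypothetical, and this is not the code the patch adds.

```python
from enum import IntEnum

class SpirvScope(IntEnum):
    """Scope operand values from the SPIR-V specification."""
    CROSS_DEVICE = 0
    DEVICE = 1
    WORKGROUP = 2
    SUBGROUP = 3
    INVOCATION = 4

# AMDGCN syncscope name -> SPIR-V scope; "" is LLVM's default (system) scope.
AMDGCN_TO_SPIRV = {
    "": SpirvScope.CROSS_DEVICE,
    "agent": SpirvScope.DEVICE,
    "workgroup": SpirvScope.WORKGROUP,
    "wavefront": SpirvScope.SUBGROUP,
    "singlethread": SpirvScope.INVOCATION,
}

def spirv_scope(amdgcn_syncscope: str) -> SpirvScope:
    # Unknown names fall back to the widest scope: always correct, but this
    # is exactly the pessimisation that a proper mapping avoids.
    return AMDGCN_TO_SPIRV.get(amdgcn_syncscope, SpirvScope.CROSS_DEVICE)
```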
2025-09-29 22:50:15 +01:00
Shilei Tian
2195fe7e01
[AMDGPU] Add the support for 45-bit buffer resource (#159702)
On new targets like `gfx1250`, the buffer resource (V#) now uses this
format:

```
base (57-bit): resource[56:0]
num_records (45-bit): resource[101:57]
reserved (6-bit): resource[107:102]
stride (14-bit): resource[121:108]
```

This PR changes the type of `num_records` from `i32` to `i64` in both
the builtin and the intrinsic, and adds support for lowering the new
format.
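Because `num_records` now spans 45 bits, it no longer fits in `i32`, which is why its type becomes `i64`. The field layout in the table above can be checked with a Python sketch. Only the bit positions come from the table. The helper names are hypothetical, and bits above 121 are not modelled.

```python
BASE_BITS, NUM_RECORDS_BITS, RESERVED_BITS, STRIDE_BITS = 57, 45, 6, 14
NUM_RECORDS_SHIFT = BASE_BITS                                  # bit 57
STRIDE_SHIFT = BASE_BITS + NUM_RECORDS_BITS + RESERVED_BITS    # bit 108

def _check(name: str, value: int, bits: int) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")

def pack_buffer_resource(base: int, num_records: int, stride: int) -> int:
    """Pack the low 122 bits of the new V# layout (reserved bits left 0)."""
    _check("base", base, BASE_BITS)
    _check("num_records", num_records, NUM_RECORDS_BITS)
    _check("stride", stride, STRIDE_BITS)
    return (base
            | (num_records << NUM_RECORDS_SHIFT)
            | (stride << STRIDE_SHIFT))

def unpack_buffer_resource(v: int):
    base = v & ((1 << BASE_BITS) - 1)
    num_records = (v >> NUM_RECORDS_SHIFT) & ((1 << NUM_RECORDS_BITS) - 1)
    stride = (v >> STRIDE_SHIFT) & ((1 << STRIDE_BITS) - 1)
    return base, num_records, stride
```

A `num_records` value such as `2**40 + 7` round-trips through this layout. Such a value would have been truncated by a 32-bit parameter.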

Fixes SWDEV-554034.

---------

Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
2025-09-24 11:12:02 -04:00
Stanislav Mekhanoshin
221f8eef9d
[AMDGPU] Add gfx1251 runlines to cooperative atomics tests. NFC (#159437) 2025-09-17 14:08:05 -07:00
Stanislav Mekhanoshin
e556dc0b23
[AMDGPU] Add gfx1251 subtarget (#159430) 2025-09-17 13:02:02 -07:00