llvm-project

Author	SHA1	Message	Date
Marcos Maronas	ce94d63f0f	Make OpenCL an OSType rather than an EnvironmentType. (#170297 ) OpenCL was added as an `EnvironmentType` in https://github.com/llvm/llvm-project/pull/78655, but there is no explanation as to why it was added as such, even after explicitly asking in the PR (https://github.com/llvm/llvm-project/pull/78655#issuecomment-2743162853). This PR makes it an `OSType` instead, which feels more natural, and updates tests accordingly. --------- Co-authored-by: Marcos Maronas <marcos.maronas@intel.com>	2026-02-10 18:45:50 +00:00
Mirko Brkušanin	4280f0d241	[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516 )	2026-02-10 12:14:49 +01:00
Mirko Brkušanin	45b037cf7a	[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191 )	2026-02-09 13:56:43 +01:00
Pierre van Houtryve	b79ba02479	[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343 ) Load monitor operations make more sense as atomic operations, as non-atomic operations cannot be used for inter-thread communication w/o additional synchronization. The previous built-in made it work because one could just override the CPol bits, but that bypasses the memory model and forces the user to learn about ISA bits encoding. Making load monitor an atomic operation has a couple of advantages. First, the memory model foundation for it is stronger. We just lean on the existing rules for atomic operations. Second, the CPol bits are abstracted away from the user, which avoids leaking ISA details into the API. This patch also adds supporting memory model and intrinsics documentation to AMDGPUUsage. Solves SWDEV-516398.	2026-02-09 09:57:27 +01:00
paperchalice	5c5677d7b8	[llvm] Remove "no-infs-fp-math" attribute support (#180083 ) This was one of the global options in `TargetMachine::resetTargetOptions`. No backend supports it any longer, so remove it.	2026-02-09 08:43:33 +08:00
Mirko Brkušanin	20b5849e17	[AMDGPU] Define new target gfx1170 (#180185 )	2026-02-06 14:38:50 +01:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Wenju He	8ab29461c3	[OpenCL] Set half-precision Div and Sqrt accuracy (#179621 ) The OpenCL spec relaxed half-precision divide to 1 ULP and sqrt to 1.5 ULP in https://github.com/KhronosGroup/OpenCL-Docs/pull/1293 https://github.com/KhronosGroup/OpenCL-Docs/pull/1386 This can enable targets to use the hardware rcp instruction for half.	2026-02-05 09:32:56 +08:00
Jameson Nash	0dd21ad1c6	[clang] remove addrspace cast from CreateIRTemp (#179327 ) This just added unnecessary work to the IR, since they are only used for load and store, which just causes some IR noise. Tests updated by UTC script to remove the extra lines.	2026-02-04 13:09:32 -05:00
Aaditya	f190477718	[AMDGPU] Add builtins for wave reduction intrinsics (#170813 )	2026-01-30 18:15:06 +05:30
Wenju He	c03d0fe672	[OpenCL] Add clang internal extension __cl_clang_function_scope_local_variables (#176726 ) The OpenCL spec requires that variables in the local address space be declared only at kernel function scope. Add a Clang internal extension __cl_clang_function_scope_local_variables to lift the restriction. To expose static local allocations at kernel scope, targets can either force-inline non-kernel functions that declare local memory or pass a kernel-allocated local buffer to those functions via an implicit argument. Motivation: support local memory allocation in libclc's implementation of work-group collective built-ins, see example at: https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives_helpers.ll https://github.com/intel/llvm/blob/41455e305117/libclc/libspirv/lib/amdgcn-amdhsa/group/collectives.cl#L182 Right now this is a Clang-only OpenCL extension intended for compiling OpenCL libraries with Clang. It could be proposed as a standard OpenCL extension in the future.	2026-01-26 08:13:22 +08:00
Shilei Tian	f3a674a2ef	[RFC][Clang][AMDGPU] Emit only delta target-features to reduce IR bloat (#176533 ) Currently, AMDGPU functions have `target-features` attribute populated with all default features for the target GPU. This is redundant because the backend can derive these defaults from the `target-cpu` attribute via `AMDGPUTargetMachine::getFeatureString()`. In this PR, for AMDGPU targets only: - Functions without explicit target attributes no longer emit `target-features` - Functions with `__attribute__((target(...)))` or `-target-feature` emit only features that differ from the target's defaults (delta) The backend already handles missing `target-features` correctly by falling back to the TargetMachine's defaults. A new cc1 flag `-famdgpu-emit-full-target-features` is added to emit full features when needed. Example: Before: ```llvm attributes #0 = { "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,..." } ``` After (default): ```llvm attributes #0 = { "target-cpu"="gfx90a" } ``` After (with explicit `+wavefrontsize32` override): ```llvm attributes #0 = { "target-cpu"="gfx90a" "target-features"="+wavefrontsize32" } ```	2026-01-20 14:49:35 -05:00
Shilei Tian	4efbe98659	[Clang][AMDGPU] Add a Sema check for the imm argument of `__builtin_amdgcn_s_setreg` (#176838 ) Our backend cannot select the corresponding intrinsic if the imm argument is not an `int16_t` or `uint16_t`, which is not really helpful.	2026-01-20 11:48:52 -05:00
Shilei Tian	39bd4562ba	[Clang][AMDGPU] Handle `wavefrontsize32` and `wavefrontsize64` features more robustly (#176599 ) We should not allow `-wavefrontsize32` and `-wavefrontsize64` to be specified at the same time. We should also not allow `-wavefrontsize32` on a target that only supports `wavefrontsize32`, and vice versa.	2026-01-19 18:16:29 -05:00
Shoreshen	26624d51d1	[AMDGPU]Add specific instruction feature for multicast load (#175503 )	2026-01-13 09:10:09 +08:00
Shilei Tian	5a63367b15	Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) (#174697 ) This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.	2026-01-07 06:12:19 +00:00
dyung	0b2f3cfb72	Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) Reverts llvm/llvm-project#174310 This change is causing 2 cross-project-test failures on https://lab.llvm.org/buildbot/#/builders/174/builds/29695	2026-01-07 01:18:23 +00:00
Shilei Tian	ccca3b8c67	[AMDGPU] Rework the clamp support for WMMA instructions (#174310 ) Fixes #166989.	2026-01-06 15:46:40 -05:00
Shilei Tian	ef55a0be4e	[NFC] Update `clang/test/CodeGenOpenCL/builtins-amdgcn-gfx1250-wmma-w32.cl`	2026-01-06 13:06:57 -05:00
Wenju He	1f14ed948d	[Clang] Honor '#pragma STDC FENV_ROUND' in __builtin_store_half/halff (#173821 ) Before this change, constrained fptrunc for __builtin_store_half/halff always used round.tonearest, ignoring the active pragma STDC FENV_ROUND. This PR guards builtin emission with CGFPOptionsRAII so the current rounding mode is propagated to the generated constrained intrinsic.	2026-01-04 17:25:22 +08:00
Shilei Tian	c97de4387b	Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 )" (#174303 ) This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it breaks assembler. ``` $ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse" v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c] ``` We have a fundamental issue in the clamp support in VOP3P instructions, which will need more changes.	2026-01-04 02:13:21 +00:00
Krzysztof Drewniak	20ef8b0285	[AMDGPU] Add `nocreateundeforpoison` annotations (#166450 ) This commit goes through IntrinsicsAMDGPU.td and adds `nocreateundeforpoison` to intrinsics that (to my knowledge) perform arithmetic operations that are defined everywhere (so no bitfield extracts and such since those can have invalid inputs, and similarly for permutations).	2026-01-02 10:12:58 -08:00
Juan Manuel Martinez Caamaño	f04dc3b5d4	[Clang] Remove 't' from __builtin_amdgcn_flat_atomic_fmin/fmax_f64 (#173839 ) Allows for type checking depending on the built-in signature. There is no `f32` version of either builtin.	2025-12-30 09:14:53 +01:00
Muhammad Abdul	2c376ffeca	[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 ) Fixes #166989 - Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR ROCDL dialect so all layers agree on the new operand - Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted, teaches VOP3P encoding to accept the immediate, and adjusts Clang codegen/builtin headers plus MLIR op definitions and tests to match - Documents what the WMMA clamp operand does - Implements bitcode AutoUpgrade for source compatibility on the WMMA IU8 intrinsic op Possible future enhancements: - infer clamping as an optimization fold based on the use context --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-27 12:51:29 -05:00
Juan Manuel Martinez Caamaño	42f741c98e	[Clang] Remove 't' from __builtin_amdgcn_global_atomic_fadd_f32/f64 (#173480 ) Allows for type checking depending on the built-in signature.	2025-12-26 09:16:26 +01:00
Juan Manuel Martinez Caamaño	fcd9235d86	[Clang] Remove 't' from __builtin_amdgcn_flat_atomic_fadd_f32/f64 (#173381 ) Allows for type checking depending on the built-in signature. This introduces some subtle changes in code generation: before, since the signature was meaningless, we would accept any pointer type without casting. After this change, the pointer of the `atomicrmw` matches the flat address space.	2025-12-24 12:07:14 +01:00
Mirko Brkušanin	5759a3a779	[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501 )	2025-12-10 09:45:13 +01:00
Hongyu Chen	4b0a975939	[OpenCL][NVPTX] Don't set calling convention for OpenCL kernel (#170170 ) Fixes #154772 We previously set `ptx_kernel` for all kernels. But it's incorrect to add `ptx_kernel` to the stub version of kernel introduced in #115821. This patch copies the workaround of AMDGPU.	2025-12-03 16:53:25 +08:00
Matt Arsenault	9fd288e886	clang/AMDGPU: Enable opencl 2.0 features for unknown target (#170308 ) Assume amdhsa triples support flat addressing, which matches the backend logic for the default target. This fixes the rocm device-libs build.	2025-12-02 19:11:30 -05:00
Aaditya	4604762cc3	[AMDGPU] Add builtins for wave reduction intrinsics (#161816 )	2025-11-24 15:13:11 +05:30
Wenju He	c4254cd9bb	[Clang] Support __bf16 type for SPIR/SPIR-V (#169012 ) SPIR/SPIR-V are generic targets. Assume they support __bf16.	2025-11-24 10:10:11 +08:00
Shoreshen	52a58a4193	[AMDGPU] Adding instruction specific features (#167809 )	2025-11-19 11:06:00 +08:00
CarolineConcatto	200793ac21	Extend MemoryEffects to Support Target-Specific Memory Locations (#148650 ) This patch introduces preliminary support for additional memory locations. They are: target_mem0 and target_mem1 and they model memory locations that cannot be represented with existing memory locations. It was a solution suggested in : https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6 Currently, these locations are not yet target-specific. The goal is to enable the compiler to express read/write effects on these resources.	2025-11-18 11:10:58 +00:00
Jay Foad	f037f41350	[IR] Add new function attribute nocreateundeforpoison (#164809 ) Also add a corresponding intrinsic property that can be used to mark intrinsics that do not introduce poison, for example simple arithmetic intrinsics that propagate poison just like a simple arithmetic instruction. As a smoke test this patch adds the new property to llvm.amdgcn.fmul.legacy.	2025-11-04 12:00:44 +00:00
Wenju He	efb84586da	[clang][SPIR][SPIRV] Don't generate constant NULL from addrspacecast generic NULL (#165353 ) Fix a regression caused by 1ffff05a38c9. OpenCL/SPIRV generic address space doesn't cover constant address space. --------- Co-authored-by: Alexey Bader <alexey.bader@intel.com>	2025-10-31 15:35:41 +08:00
Fabian Ritter	ea034477fd	Reapply "[HIP][Clang] Remove __AMDGCN_WAVEFRONT_SIZE macros" (#164217 ) This reverts commit 78bf682cb9033cf6a5bbc733e062c7b7d825fdaf. Original PR: #157463 Revert PR: #158566 The relevant buildbots have been updated to a ROCm version that does not use the macros anymore to avoid the failures. Implements SWDEV-522062.	2025-10-30 13:42:32 +01:00
Florian Hahn	53785846aa	[Clang] Freeze padded vectors before storing. (#164821 ) Currently Clang usually leaves padding bits uninitialized, which means they are undef at the moment. When expanding stores of vector types to include padding, the padding lanes will be poison, hence the padding bits will be poison. This interacts badly with coercion of arguments and return values, where 3 x float vectors will be loaded as i128 integer; poisoning the padding bits will make the whole value poison. Not sure if there's a better way, but I think we have a number of places that currently rely on the padding being undef, not poison. PR: https://github.com/llvm/llvm-project/pull/164821	2025-10-28 19:03:17 -07:00
Stanislav Mekhanoshin	9b5bc98743	[AMDGPU] Add intrinsics for v_[pk]_add_{min\|max}_* instructions (#164731 )	2025-10-22 17:46:33 -07:00
macurtis-amd	5440cfc450	[clang] Add support for cluster sync scope (#162575 ) From Sam Liu: >CUDA supports thread block clusters https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-block-clusters > >In their atomic intrinsics, cluster scope is supported https://docs.nvidia.com/cuda/cuda-c-programming-guide/#nv-atomic-fetch-add-and-nv-atomic-add > >For compatibility, clang and hip needs to support cluster scope.	2025-10-21 05:47:26 -05:00
Antonio Frighetto	efcda54794	[clang][CodeGen] Emit `llvm.tbaa.errno` metadata during module creation Let Clang emit `llvm.tbaa.errno` metadata in order to let LLVM carry out optimizations around errno-writing libcalls, as long as it is proved that the involved memory location does not alias `errno`. Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.	2025-10-21 11:38:45 +02:00
Shilei Tian	c683f215e5	[NFC][Clang][AMDGPU] Fix upstream and downstream difference (#164304 ) These two files were left during the upstream of the corresponding feature.	2025-10-20 15:47:46 -04:00
Matt Arsenault	853760bca6	AMDGPU: Use ELF mangling in data layout (#163011 ) Closes #95219	2025-10-13 03:01:45 +00:00
paperchalice	2aeefcf40f	[clang][CodeGen] Remove "unsafe-fp-math" attribute support (#162779 ) These global flags block further improvements for clang; users should always use fast-math flags. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797 Remove them incrementally; this is the clang part.	2025-10-10 15:56:29 +08:00
Wenju He	1ffff05a38	[clang][SPIR][SPIRV] Materialize non-generic null pointers via addrspacecast (#161773 ) LLVM models ConstantPointerNull as all-zero, but some GPUs (e.g. AMDGPU and our downstream GPU target) use a non-zero sentinel for null in private / local address spaces. SPIR-V is a supported input for our GPU target. This PR preserves a canonical zero form in the generic AS while allowing later lowering to substitute the target’s real sentinel.	2025-10-09 09:10:24 +08:00
Shilei Tian	9e8dda1034	[NFC] Change spelling of cluster feature to "clusters" (#162103 )	2025-10-06 15:55:39 +00:00
Shilei Tian	bea0225c30	[AMDGPU] Make cluster a target feature (#162040 ) This replaces the original arch check.	2025-10-06 05:05:53 +00:00
Alex Voicu	d481e5f9b7	[AMDGPU][SPIRV] Use SPIR-V syncscopes for some AMDGCN BIs (#154867 ) AMDGCN flavoured SPIR-V allows AMDGCN specific builtins, including those for scoped fences and some specific RMWs. However, at present we don't map syncscopes to their SPIR-V equivalents, but rather use the AMDGCN ones. This ends up pessimising the resulting code as system scope is used instead of device (agent) or subgroup (wavefront), so we correct the behaviour, to ensure that we do the right thing during reverse translation.	2025-09-29 22:50:15 +01:00
Shilei Tian	2195fe7e01	[AMDGPU] Add the support for 45-bit buffer resource (#159702 ) On new targets like `gfx1250`, the buffer resource (V#) now uses this format: ``` base (57-bit): resource[56:0] num_records (45-bit): resource[101:57] reserved (6-bit): resource[107:102] stride (14-bit): resource[121:108] ``` This PR changes the type of `num_records` from `i32` to `i64` in both builtin and intrinsic, and also adds the support for lowering the new format. Fixes SWDEV-554034. --------- Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	2025-09-24 11:12:02 -04:00
Stanislav Mekhanoshin	221f8eef9d	[AMDGPU] Add gfx1251 runlines to cooperative atomics tests. NFC (#159437 )	2025-09-17 14:08:05 -07:00
Stanislav Mekhanoshin	e556dc0b23	[AMDGPU] Add gfx1251 subtarget (#159430 )	2025-09-17 13:02:02 -07:00
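
Illustration for commit 2195fe7e01 (45-bit buffer resource): the bit layout quoted in that message can be sketched as plain integer packing. This is only a minimal sketch of the field positions as listed there; the names `VSHARP_FIELDS`, `pack_vsharp`, and `unpack_vsharp` are hypothetical and are not LLVM or Clang APIs, and bits above 121 are left untouched because the commit message does not describe them.

```python
# Field layout copied from the commit message: (name, low bit, width in bits).
# The helper names below are hypothetical and exist only for this sketch.
VSHARP_FIELDS = [
    ("base", 0, 57),          # resource[56:0]
    ("num_records", 57, 45),  # resource[101:57]
    ("reserved", 102, 6),     # resource[107:102]
    ("stride", 108, 14),      # resource[121:108]
]


def pack_vsharp(**values):
    """Pack the named fields into one integer; omitted fields default to 0."""
    word = 0
    for name, low, width in VSHARP_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << low
    return word


def unpack_vsharp(word):
    """Extract every field from a packed integer as a dict."""
    return {
        name: (word >> low) & ((1 << width) - 1)
        for name, low, width in VSHARP_FIELDS
    }
```

A `num_records` value such as `2**32` fits in the 45-bit field but not in 32 bits, which is consistent with the commit changing that operand from `i32` to `i64`.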


1010 Commits