llvm-project

Author	SHA1	Message	Date
Nikita Popov	1295aa2e81	[Clang] Add -fwrapv-pointer flag (#122486 ) GCC supports three flags related to overflow behavior: * `-fwrapv`: Makes signed integer overflow well-defined. * `-fwrapv-pointer`: Makes pointer overflow well-defined. * `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both signed integer overflow and pointer overflow well-defined. Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but not `-fwrapv-pointer`. This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics of `-fwrapv` to match GCC. This allows signed integer overflow and pointer overflow to be controlled independently, while `-fno-strict-overflow` still exists to control both at the same time (and that option is consistent across GCC and Clang).	2025-01-28 09:57:00 +01:00
Adam Yang	aab25f20f6	[HLSL][SPIRV][DXIL] Implement `WaveActiveMax` intrinsic (#123428 ) ``` - add clang builtin to Builtins.td - link builtin in hlsl_intrinsics - add codegen for spirv intrinsic and two directx intrinsics to retain signedness information of the operands in CGBuiltin.cpp - add semantic analysis in SemaHLSL.cpp - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp - add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td - add test cases to illustrate passespendent pr merges. ``` Resolves #99170	2025-01-27 23:26:56 -08:00
Momchil Velikov	f75860f895	[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615 ) This patch adds the following intrinsics: * Fused multiply-add non-indexed float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) * Floating-point multiply-add long to half-precision (vector, by element) float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) * Floating-point multiply-add long-long to single-precision (vector, by element) float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-28 00:38:44 +00:00
Momchil Velikov	804b81d39f	[AArch64] Add FP8 Neon intrinsics for dot-product (#123613 ) This patch adds the following intrinsics: float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-27 21:14:16 +00:00
Momchil Velikov	99bd2e3f12	[AArch64] Add Neon FP8 conversion intrinsics (#123612 ) The patch adds the following intrinsics: bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm) mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm) Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>	2025-01-27 17:32:47 +00:00
Momchil Velikov	87103a016f	[AArch64] Implement NEON FP8 vectors as VectorType (#123603 ) Reimplement Neon FP8 vector types using attribute `neon_vector_type` instead of having them as builtin types. This allows to implement FP8 Neon intrinsics without the need to add special cases for these types when using `__builtin_shufflevector` or bitcast (using C-style cast operator) between vectors, both extensively used in the generated code in `arm_neon.h`.	2025-01-27 10:41:53 +00:00
Phoebe Wang	ee2722fc88	[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name (#123335 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2025-01-24 15:49:28 +08:00
Finn Plummer	0fe8e70c66	Revert "Reland "[HLSL] Implement the `reflect` HLSL function"" (#124046 ) Reverts llvm/llvm-project#123853 The introduction of `reflect-error.ll` surfaced a bug with the use of `report_fatal_error` in `SPIRVInstructionSelector` that was propagated into the pr. This has caused a build-bot breakage, and the work to solve the underlying issue is tracked here: https://github.com/llvm/llvm-project/issues/124045. We can re-apply this commit when the underlying issue is resolved.	2025-01-22 18:22:03 -08:00
Deric Cheung	2656928d0c	Reland "[HLSL] Implement the `reflect` HLSL function" (#123853 ) This PR relands [#122992](https://github.com/llvm/llvm-project/pull/122992). Some machines were failing to run the `reflect-error.ll` test due to the RUN lines ```llvm ; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %} ; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %} ``` which failed when `spirv-tools` was not present on the machine due to running the command `not` without any arguments. These RUN lines have been removed since they don't actually test anything new compared to the other two RUN lines due to the expected error during instruction selection. ```llvm ; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 \| FileCheck %s ; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 \| FileCheck %s ```	2025-01-22 13:29:19 -08:00
Oliver Stannard	c4ef805b0b	[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#121943 ) Re-write the sema and codegen for the atomic_test_and_set and atomic_clear builtin functions to go via AtomicExpr, like the other atomic builtins do. This simplifies the code, because AtomicExpr already handles things like generating code for to dynamically select the memory ordering, which was duplicated for these builtins. This also fixes a few crash bugs, one when passing an integer to the pointer argument, and one when using an array. This also adds diagnostics for the memory orderings which are not valid for atomic_clear according to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which were missing before. Fixes https://github.com/llvm/llvm-project/issues/111293. This is a re-land of #120449, modified to allow any non-const pointer type for the first argument.	2025-01-22 10:48:04 +00:00
schittir	65df99c208	[NFC] Avoid potential nullptr deref by using castAs<> (#123395 ) Use castAs<> instead of getAs<>	2025-01-21 21:41:37 -08:00
Finn Plummer	4c91263045	Revert "[HLSL] Implement the `reflect` HLSL function" (#123846 ) Reverts llvm/llvm-project#122992 Due to an included failing test-case the commit causes build failures.	2025-01-21 15:12:58 -08:00
Deric Cheung	dd860bcfb5	[HLSL] Implement the `reflect` HLSL function (#122992 ) Fixes #99152 Tasks completed: - Implement `reflect` in `clang/lib/Headers/hlsl/hlsl_intrinsics.h` - Implement the `reflect` SPIR-V target built-in in `clang/include/clang/Basic/BuiltinsSPIRV.td` - Add a SPIR-V fast path in `clang/lib/Headers/hlsl/hlsl_detail.h` in the form ```c++ #if (__has_builtin(__builtin_spirv_reflect)) return __builtin_spirv_reflect(...); #else return ...; // regular behavior #endif ``` - Add codegen for the SPIR-V `reflect` built-in to `EmitSPIRVBuiltinExpr` in `clang/lib/CodeGen/CGBuiltin.cpp` - Add HLSL codegen tests to `clang/test/CodeGenHLSL/builtins/reflect.hlsl` - Add SPIR-V built-in codegen tests to `clang/test/CodeGenSPIRV/Builtins/reflect.c` - Add sema tests to `clang/test/SemaHLSL/BuiltIns/reflect-errors.hlsl` - Add SPIR-V sema tests to `clang/test/CodeGenSPIRV/Builtins/reflect-errors.c` - Create the `int_spv_reflect` intrinsic in `llvm/include/llvm/IR/IntrinsicsSPIRV.td` - In `llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp` create the `reflect` lowering and map it to `int_spv_reflect` in `SPIRVInstructionSelector::selectIntrinsic` - Create a SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/reflect.ll` Additional tasks completed: - Implement sema check for the `reflect` SPIR-V built-in in `clang/lib/Sema/SemaSPIRV.cpp` - Required for HLSL codegen to work via the SPIR-V fast path, because the types defined in `clang/include/clang/Basic/BuiltinsSPIRV.td` are being overridden - Create SPIR-V backend error test case in `llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll` - Since `reflect` is only available in the GLSL extended instruction set, using it in OpenCL should result in an error Incomplete tasks: - Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/opencl/reflect.ll` - An OpenCL test is not applicable in this case because the [OpenCL SPIR-V extended instruction set](https://registry.khronos.org/SPIR-V/specs/unified1/OpenCL.ExtendedInstructionSet.100.html) does not include a `reflect` function	2025-01-21 14:30:29 -08:00
David Green	6dc356d698	[Clang] Add numeric for iota. Hopefuly fixes MSVC build after 547bfda56b2e3f3a4c6d2357d3566dcd3fa996ad.	2025-01-21 10:36:58 +00:00
David Green	547bfda56b	[AArch64] Improve bcvtn2 and remove aarch64_neon_bfcvt intrinsics (#120363 ) This started out as trying to combine bf16 fpround to BFCVT2 instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in favour of generating fpround instructions directly. This simplifies the patterns and can lead to other optimizations. The BFCVT2 instruction is adjusted to makes sure the types are valid, and a bfcvt2 is now generated in more place. The old intrinsics are auto-upgraded to fptrunc instructions too.	2025-01-21 09:16:04 +00:00
Ulrich Weigand	8424bf207e	[SystemZ] Add support for new cpu architecture - arch15 This patch adds support for the next-generation arch15 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch15 as host processor. - Assembler/disassembler support for new instructions. - Exploitation of new instructions for code generation. - New vector (signed\|unsigned\|bool) __int128 data types. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10305. Note: No currently available Z system supports the arch15 architecture. Once new systems become available, the official system name will be added as supported -march name.	2025-01-20 19:30:21 +01:00
Farzon Lotfi	eddeb36cf1	[SPIRV] add pre legalization instruction combine (#122839 ) - Add the boilerplate to support instcombine in SPIRV - instcombine length(X-Y) to distance(X,Y) - switch HLSL's distance intrinsic to not special case for SPIRV. - fixes #122766 - This RFC we were requested to add in the infra for pattern matching: https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329/13	2025-01-17 14:46:14 -05:00
Adam Yang	4446a9849a	[HLSL][SPIRV][DXIL] Implement `WaveActiveSum` intrinsic (#118580 ) ``` - add clang builtin to Builtins.td - link builtin in hlsl_intrinsics - add codegen for spirv intrinsic and two directx intrinsics to retain signedness information of the operands in CGBuiltin.cpp - add semantic analysis in SemaHLSL.cpp - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp - add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td - add test cases to illustrate passespendent pr merges. ``` Resolves #70106 --------- Co-authored-by: Finn Plummer <canadienfinn@gmail.com>	2025-01-16 10:35:23 -08:00
Ashley Coleman	4f48abff0f	[HLSL] Implement elementwise firstbitlow builtin (#116858 ) Closes https://github.com/llvm/llvm-project/issues/99116 Implements `firstbitlow` by extracting common functionality from `firstbithigh` into a shared function while also fixing a bug for an edge case where `u64x3` and larger vectors will attempt to create vectors larger than the SPRIV max of 4. --------- Co-authored-by: Steven Perron <stevenperron@google.com>	2025-01-15 15:36:50 -07:00
Thurston Dang	55b587506e	[ubsan][NFCI] Use SanitizerOrdinal instead of SanitizerMask for EmitCheck (exactly one sanitizer is required) (#122511 ) The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type `ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly generalized: SanitizerMask can denote that zero or more sanitizers are enabled, but `EmitCheck` requires that exactly one sanitizer is specified in the SanitizerMask (e.g., `SanitizeTrap.has(Checked[i].second)` enforces that). This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked` parameter of `EmitCheck` and code that transitively relies on it. This should not affect the behavior of UBSan, but it has the advantages that: - the code is clearer: it avoids ambiguity in EmitCheck about what to do if multiple bits are set - specifying the wrong number of sanitizers in `Checked[i].second` will be detected as a compile-time error, rather than a runtime assertion failure Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392 as an alternative to adding an explicit runtime assertion that the SanitizerMask contains exactly one sanitizer.	2025-01-10 12:40:57 -08:00
Farzon Lotfi	b900379e26	[HLSL] Reapply Move length support out of the DirectX Backend (#121611 ) (#122337 ) ## Changes - Delete DirectX length intrinsic - Delete HLSL length lang builtin - Implement length algorithm entirely in the header. ## History - In the past if an HLSL intrinsic lowered to either a spirv op code or a DXIL opcode we represented it with intrinsics ## Why we are moving away? - To make HLSL apis more portable the team decided that it makes sense for some intrinsics to be defined only in the header. - Since there tends to be more SPIRV opcodes than DXIL opcodes the plan is to support SPIRV opcodes either with target specific builtins or via pattern matching.	2025-01-10 14:16:27 -05:00
Nico Weber	9ec92873ec	Revert "[HLSL] Move length support out of the DirectX Backend (#121611 )" This reverts commit a6b7181733c83523a39d4f4e788c6b7a227d477d. Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see https://github.com/llvm/llvm-project/pull/121611#issuecomment-2581004278	2025-01-09 14:19:03 -05:00
Farzon Lotfi	a6b7181733	[HLSL] Move length support out of the DirectX Backend (#121611 ) ## Changes - Delete DirectX length intrinsic - Delete HLSL length lang builtin - Implement length algorithm entirely in the header. ## History - In the past if an HLSL intrinsic lowered to either a spirv op code or a DXIL opcode we represented it with intrinsics ## Why we are moving away? - To make HLSL apis more portable the team decided that it makes sense for some intrinsics to be defined only in the header. - Since there tends to be more SPIRV opcodes than DXIL opcodes the plan is to support SPIRV opcodes either with target specific builtins or via pattern matching.	2025-01-09 13:11:52 -05:00
Stanislav Mekhanoshin	50ca2eff5f	Drop emitQuaternaryBuiltin from clang (#122169 ) It was superceeded by the emitBuiltinWithOneOverloadedType() some time ago.	2025-01-09 09:56:44 -08:00
Steven Perron	92e575d7e4	[HLSL] Add SPIR-V version of getPointer. (#121963 ) Use the spv version of the resource.getpointer intrinsic when targeting SPIR-V.	2025-01-08 10:51:17 -05:00
Nicholas Guy	21b531ead1	[clang][llvm][aarch64] Add aarch64_sme_in_streaming_mode intrinsic (#120265 ) Replacing the extant streaming mode function call with an intrinsic allows us to make further optimisations around it. For example, if it's called within a function that has a known streaming mode, we can remove the dead code, and avoid the redundant conditional branch.	2025-01-07 09:02:26 +00:00
Farzon Lotfi	21edac25f0	[SPIRV] Add Target Builtins using Distance ext as an example (#121598 ) - Update pr labeler so new SPIRV files get properly labeled. - Add distance target builtin to BuiltinsSPIRV.td. - Update TargetBuiltins.h to account for spirv builtins. - Update clang basic CMakeLists.txt to build spirv builtin tablegen. - Hook up sema for SPIRV in Sema.h\|cpp, SemaSPIRV.h\|cpp, and SemaChecking.cpp. - Hookup sprv target builtins to SPIR.h\|SPIR.cpp target. - Update GBuiltin.cpp to emit spirv intrinsics when we get the expected spirv target builtin. Consensus was reach in this RFC to add both target builtins and pattern matching: https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329. pattern matching will come in a separate pr this one just sets up the groundwork to do target builtins for spirv. partially resolves [#99107](https://github.com/llvm/llvm-project/issues/99107)	2025-01-06 11:37:20 -05:00
Benjamin Maxwell	e4e2f53693	[clang] Add sincos builtin using `llvm.sincos` intrinsic (#114086 ) This registers `sincos[f\|l]` as a clang builtin and updates GCBuiltin to emit the `llvm.sincos.` intrinsic when `-fno-math-errno` is set. Note: `llvm.sincos.` is only emitted by `__builtin_sincos[f\|l]` functions in this initial patch.	2025-01-06 11:07:25 +00:00
Mikhail Goncharov	93743ee566	Revert "[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449 )" This reverts commit 9fc2fadbfcb34df5f72bdaed28a7874bf584eed7. See https://github.com/llvm/llvm-project/pull/120449#issuecomment-2556089016	2024-12-20 08:14:26 +01:00
Oliver Stannard	9fc2fadbfc	[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#120449 ) Re-write the sema and codegen for the atomic_test_and_set and atomic_clear builtin functions to go via AtomicExpr, like the other atomic builtins do. This simplifies the code, because AtomicExpr already handles things like generating code for to dynamically select the memory ordering, which was duplicated for these builtins. This also fixes a few crash bugs, one when passing an integer to the pointer argument, and one when using an array. This also adds diagnostics for the memory orderings which are not valid for atomic_clear according to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which were missing before. Fixes #111293.	2024-12-19 09:12:19 +00:00
Jie Fu	a1766699c6	[clang] Fix -Wunused-variable in CGBuiltin.cpp (NFC) /llvm-project/clang/lib/CodeGen/CGBuiltin.cpp:19441:17: error: unused variable 'Ty' [-Werror,-Wunused-variable] llvm::Type *Ty = Op->getType(); ^ 1 error generated.	2024-12-17 09:07:44 +08:00
Ashley Coleman	41a6e9cfd6	[HLSL] Implement `WaveActiveAllTrue` Intrinsic (#117245 ) Resolves https://github.com/llvm/llvm-project/issues/99161 - [x] Implement `WaveActiveAllTrue` clang builtin, - [x] Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h` - [x] Add sema checks for `WaveActiveAllTrue` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp` - [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp` - [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl` - [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl` - [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in `IntrinsicsDirectX.td` - [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114` in `DXIL.td` - [x] Create the `WaveActiveAllTrue.ll` and `WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/` - [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in `IntrinsicsSPIRV.td` - [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue` lowering and map it to `int_spv_WaveActiveAllTrue` in `SPIRVInstructionSelector::selectIntrinsic`. - [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`	2024-12-16 16:13:35 -08:00
Momchil Velikov	c2172431c7	[AArch64] Implements FP8 SVE intrinsics for dot-product (#118125 ) This patch adds the following intrinsics: * 8-bit floating-point dot product to single-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) \|\| __ARM_FEATURE_SSVE_FP8DOT4 svfloat32_t svdot[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat32_t svdot[_n_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point indexed dot product to single-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT4) \|\| __ARM_FEATURE_SSVE_FP8DOT4 svfloat32_t svdot_lane[_f32_mf8]_fpm(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_3, fpm_t fpm); * 8-bit floating-point dot product to half-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) \|\| __ARM_FEATURE_SSVE_FP8DOT2 svfloat16_t svdot[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm); svfloat16_t svdot[_n_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, mfloat8_t zm, fpm_t fpm); * 8-bit floating-point indexed dot product to half-precision. // Only if (__ARM_FEATURE_SVE2 && __ARM_FEATURE_FP8DOT2) \|\| __ARM_FEATURE_SSVE_FP8DOT2 svfloat16_t svdot_lane[_f16_mf8]_fpm(svfloat16_t zda, svmfloat8_t zn, svmfloat8_t zm, uint64_t imm0_7, fpm_t fpm);	2024-12-13 14:06:54 +00:00
anoopkg6	dc04d414df	SystemZ: Add support for __builtin_setjmp and __builtin_longjmp. (#119257 ) This pr includes fixes for original pr##116642. Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ..	2024-12-10 19:50:51 +01:00
SpencerAbson	99f6ca9b7b	[AArch64] Implement intrinsics for SME FP8 FMOPA (#118115 ) This patch implements the following intrinsics: 8-bit floating-point sum of outer products and accumulate. ``` c // Only if __ARM_FEATURE_SME_F8F16 != 0 void svmopa_za16[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svmopa_za32[_mf8]_m_fpm(uint64_t tile, svbool_t pn, svbool_t pm, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); ``` In accordance with: https://github.com/ARM-software/acle/pull/323/ Co-authored-by: Momchil Velikov momchil.velikov@arm.com Co-authored-by: Marian Lukac marian.lukac@arm.com	2024-12-09 11:13:08 +00:00
Ulrich Weigand	8787bc72a6	Revert "[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642 )" This reverts commit 030bbc92a705758f1131fb29cab5be6d6a27dd1f.	2024-12-07 00:55:54 +01:00
anoopkg6	030bbc92a7	[SystemZ] Add support for __builtin_setjmp and __builtin_longjmp (#116642 ) Implementation for __builtin_setjmp and __builtin_longjmp for SystemZ.	2024-12-06 23:33:33 +01:00
wwwatermiao	409edc64d1	[AArch64][SME] Fix bug on SMELd1St1 (#118109 ) Patch[1] has update intrinsic interface for ld1/st1, while based on ARM's document, "If the intrinsic also has a vnum argument, the ZA slice number is calculated by adding vnum to slice.". But the "vnum" did not work for our realization now, this patch fix this point. [1]`ee31ba0dd9`	2024-12-05 14:39:02 -03:00
Daniel Paoliello	35c7df1a21	[aarch64][arm] Add support for the _Interlocked[Compare]ExchangePointer_{acq\|nf\|rel} MS intrinsics (#117645 ) Adds support for the following MSVC intrinsics: * `_InterlockedCompareExchangePointer_acq` * `_InterlockedCompareExchangePointer_rel` * `_InterlockedExchangePointer_acq` * `_InterlockedExchangePointer_nf` * `_InterlockedExchangePointer_rel` These are documented at: <https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#interlocked-intrinsics> NOTE: `_InterlockedCompareExchangePointer_nf` is not being added since it already exists, although it was incorrectly added for all architectures instead of being Arm & AArch64 specific. This change also unifies how the pointer and non-pointer interlocked compare-exchange intrinsics are being handled.	2024-12-04 13:41:26 -08:00
Daniel Paoliello	ee9e786717	[aarch64] Add support for the __{inc\|add}x18{byte\|word\|dword\|qword intrinsics (#117752 ) Adds support for the following MSVC intrinsics: * `__addx18byte` * `__addx18word` * `__addx18dword` * `__addx18qword` * `__incx18byte` * `__incx18word` * `__incx18dword` * `__incx18qword` These are documented at: <https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170>	2024-12-04 10:29:58 -08:00
Adam Yang	dd2b2b8bbb	[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic (#111883 ) partially fixes #70103 ### Changes * Implemented `GroupMemoryBarrierWithGroupSync` clang builtin * Linked `GroupMemoryBarrierWithGroupSync` clang builtin with `hlsl_intrinsics.h` * Added sema checks for `GroupMemoryBarrierWithGroupSync` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp` * Add codegen for `GroupMemoryBarrierWithGroupSync` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp` * Add codegen tests to `clang/test/CodeGenHLSL/builtins/GroupMemoryBarrierWithGroupSync.hlsl` * Add sema tests to `clang/test/SemaHLSL/BuiltIns/GroupMemoryBarrierWithGroupSync-errors.hlsl` ### Related PRs * [[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic #111884](https://github.com/llvm/llvm-project/pull/111884) * [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic #111888](https://github.com/llvm/llvm-project/pull/111888)	2024-12-03 01:16:49 -08:00
Justin Bogner	bd92e46204	[HLSL] Implement RWBuffer::operator[] via __builtin_hlsl_resource_getpointer (#117017 ) This introduces `__builtin_hlsl_resource_getpointer`, which lowers to `llvm.dx.resource.getpointer` and is used to implement indexing into resources. This will only work through the backend for typed buffers at this point, but the changes to structured buffers should be correct as far as the frontend is concerned. Note: We probably want this to return a reference in the HLSL device address space, but for now we're just using address space 0. Creating a device address space and updating this code can be done later as necessary. Fixes #95956	2024-12-02 14:03:31 -08:00
Matt Arsenault	a796f597cd	AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-12-02 14:40:36 -05:00
SpencerAbson	e4ee970c4b	[AArch64] Implement intrinsics for F1CVTL/F2CVTL and BF1CVTL/BF2CVTL (#116959 ) This patch implements the following intrinsics: 8-bit floating-point convert to deinterleaved half-precision or BFloat16. ``` c // Variant is also available for: _bf16[_mf8]_x2 svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming; svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming; ``` Defined in https://github.com/ARM-software/acle/pull/323 Co-authored-by: Caroline Concatto caroline.concatto@arm.com Co-authored-by: Marian Lukac marian.lukac@arm.com	2024-11-28 12:37:02 +00:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16\|32} of gfx950. (#117823 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Helena Kotas	cac978331f	[HLSL] Add `Increment`/`DecrementCounter` methods to structured buffers (#117608 ) Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is used to implement the `IncrementCounter` and `DecrementCounter` methods on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see Note). The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter` or `llvm.spv.bufferUpdateCounter`. Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource` that enables adding methods to builtin types using builder pattern like this: ``` BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType) .addParam("param_name", Type, InOutModifier) .callBuiltin("buildin_name", { BuiltinParams }) .finalizeMethod(); ``` Fixes #113513 [First version](llvm/llvm-project#114148) of this PR was reverted because of build break.	2024-11-25 16:10:48 -08:00
Matt Arsenault	e97fb2207e	AMDGPU: Add support for load transpose instructions for gfx950 (#117378 ) This patch support for intrinsics in clang, as well as assembly instructions in the backend. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 09:39:04 -08:00
Congcong Cai	cbdd14ee9d	[clang][NFC]add static for internal linkage function (#117482 ) Detected by misc-use-internal-linkage	2024-11-25 06:48:33 +08:00
Helena Kotas	dc4c8de179	Revert "[HLSL] Add `Increment`/`DecrementCounter` methods to structured buffers (#114148 )" (#117448 ) This reverts commit 94bde8cdc39ff7e9c59ee0cd5edda882955242aa.	2024-11-23 12:02:07 -08:00
Helena Kotas	94bde8cdc3	[HLSL] Add `Increment`/`DecrementCounter` methods to structured buffers (#114148 ) Introduces `__builtin_hlsl_buffer_update_counter` clang buildin that is used to implement the `IncrementCounter` and `DecrementCounter` methods on `RWStructuredBuffer` and `RasterizerOrderedStructuredBuffer` (see Note). The builtin is translated to LLVM intrisic `llvm.dx.bufferUpdateCounter` or `llvm.spv.bufferUpdateCounter`. Introduces `BuiltinTypeMethodBuilder` helper in `HLSLExternalSemaSource` that enables adding methods to builtin types using builder pattern like this: ``` BuiltinTypeMethodBuilder(Sema, RecordBuilder, "MethodName", ReturnType) .addParam("param_name", Type, InOutModifier) .callBuiltin("buildin_name", { BuiltinParams }) .finalizeMethod(); ``` Fixes #113513	2024-11-23 09:33:38 -08:00

1 2 3 4 5 ...

2108 Commits