llvm-project

Author	SHA1	Message	Date
Valentin Clement (バレンタインクレメン)	ffe4870472	[flang][cuda] Add interfaces for __float2int_rX and __float2unit_rX (#153691 )	2025-08-14 23:11:45 +00:00
LLVM GN Syncbot	47bc6acf86	[gn build] Port d56fa965243b	2025-08-14 22:56:30 +00:00
Valentin Clement (バレンタインクレメン)	602f308d4f	[flang][cuda] Add interface for __saturatef (#153705 )	2025-08-14 15:55:17 -07:00
Stanislav Mekhanoshin	a629119c75	[AMDGPU] Remove wave64 functions (#153690 ) gfx1250 only supports wave32.	2025-08-14 15:54:33 -07:00
Valentin Clement (バレンタインクレメン)	2775c79c4f	[flang][cuda] Add interfaces for __float2ll_rX (#153702 )	2025-08-14 15:44:52 -07:00
joaosaffran	d56fa96524	[DirectX] Add Range Overlap validation (#152229 ) As part of the Root Signature Spec, we need to validate if Root Signatures are not defining overlapping ranges. Closes: https://github.com/llvm/llvm-project/issues/126645 --------- Co-authored-by: joaosaffran <joao.saffran@microsoft.com> Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com> Co-authored-by: Joao Saffran <jderezende@microsoft.com>	2025-08-14 18:40:11 -04:00
Valentin Clement (バレンタインクレメン)	ca9ddd54b7	[flang][cuda] Add interfaces for __ll2float_rX (#153694 )	2025-08-14 15:35:02 -07:00
Daniel Paoliello	fc4df2c917	[win][arm64ec] XFAIL x64 intrinsic tests on Arm64EC (#153474 ) Clang defines the x64 preprocessor macro (`__x86_64__`) when building Arm64EC, however the tests for x64 built-ins and intrinsics are currently failing since the relevant functions don't exist, resulting in errors like: ``` Line 165: invalid conversion between vector type '__v2di' (vector of 2 'long long' values) and integer type 'int' of different size ``` (Clang doesn't know the intrinsics being called, and so treats it like an undefined function, which makes it assume the return type is `int`) For now, expect these tests to fail until someone decides to implement these intrinsics.	2025-08-14 15:29:20 -07:00
Stanislav Mekhanoshin	57c1e01e48	[AMDGPU] Don't allow wgp mode on gfx1250 (#153680 ) - gfx1250 only supports cu mode	2025-08-14 15:16:56 -07:00
Andy Kaylor	a1529cd85a	[CIR] Add index support for global_view (#153254 ) The #cir.global_view attribute was initially added without support for the optional index list. This change adds index list support. This is used when the address of an array or structure member is used as an initializer. This patch does not include support for taking the address of a structure or class member. That will be added later.	2025-08-14 15:14:12 -07:00
Valentin Clement (バレンタインクレメン)	df15c0d716	[flang][cuda] Add interfaces for __dsqrt_rn and __dsqrt_rz (#153624 )	2025-08-14 22:08:33 +00:00
DeanSturtevant1	cb2f0d0a5f	[bazel] Fix mlir/BUILD.bazel for VectorToXeGPU. (#153696 )	2025-08-14 15:03:41 -07:00
Craig Topper	defbbf0129	[RISCV][MoveMerge] Don't copy kill flag when moving past an instruction that reads the register. (#153644 ) If we're moving the second copy before another instruction that reads the copied register, we need to clear the kill flag on the combined move. Fixes #153598.	2025-08-14 14:52:54 -07:00
Valentin Clement (バレンタインクレメン)	b989c7c2e0	[flang][cuda] Add interfaces for __drcp_rX (#153681 )	2025-08-14 21:44:47 +00:00
DeanSturtevant1	4e63d704e8	Fix mlir/BUILD.bazel for XeGPUUtils. (#153689 )	2025-08-14 14:32:18 -07:00
Alex Bradbury	db5f7dc374	Revert "[SLP]Support LShr as base for copyable elements" This reverts commit ca4ebf95172d24f8c47655709b2c9eb85bda5cb2. Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b configurations. See here for a minimal reproducer: https://github.com/llvm/llvm-project/pull/153393#issuecomment-3189898813	2025-08-14 22:18:24 +01:00
Valentin Clement (バレンタインクレメン)	06590444f5	[flang][cuda] Add bind names for __double2ull_rX interfaces (#153678 )	2025-08-14 21:10:20 +00:00
David Green	5836bae463	[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963 ) As fmul and fmadd are so similar, their performance characteristics tend to be the same on most platforms, at least in terms of reciprocal throughputs. Processors capable of performing a given number of fmul per cycle can usually perform the same number of fma, with the extra add being relatively simple on top. This patch makes the scores of the two operations the same, which brings the throughput cost of a fma/fmuladd to 2, and the latency to 3, which are the defaults for fmul. Note that we might also want to change the throughput cost of a fmul to 1, as most processors have ample bandwidth for them, but they should still stay in-line with one another.	2025-08-14 21:53:45 +01:00
Morris Hafner	e56ae9651b	[CIR][NFC] Add Symbol Table to CIRGenFunction (#153625 ) This patchs adds a symbol table to CIRGenFunction plus scopes and insertions to the table where we were missing them previously.	2025-08-14 22:53:09 +02:00
Bill Wendling	1e9fc8edd0	[Clang][attr] Add '-std=c11' to allow for typedef redefinition	2025-08-14 13:51:58 -07:00
Zhaoxuan Jiang	76dd742f7b	[CGData] Lazy loading support for stable function map (#151660 ) The stable function map could be huge for a large application. Fully loading it is slow and consumes a significant amount of memory, which is unnecessary and drastically slows down compilation especially for non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in lazy loading support for the stable function map. The detailed changes are: - `StableFunctionMap` - The map now stores entries in an `EntryStorage` struct, which includes offsets for serialized entries and a `std::once_flag` for thread-safe lazy loading. - The underlying map type is changed from `DenseMap` to `std::unordered_map` for compatibility with `std::once_flag`. - `contains()`, `size()` and `at()` are implemented to only load requested entries on demand. - Lazy Loading Mechanism - When reading indexed codegen data, if the newly-introduced `-indexed-codegen-data-lazy-loading` flag is set, the stable function map is not fully deserialized up front. The binary format for the stable function map now includes offsets and sizes to support lazy loading. - The safety of lazy loading is guarded by the once flag per function hash. This guarantees that even in a multi-threaded environment, the deserialization for a given function hash will happen exactly once. The first thread to request it performs the load, and subsequent threads will wait for it to complete before using the data. For single-threaded builds, the overhead is negligible (a single check on the once flag). For multi-threaded scenarios, users can omit the flag to retain the previous eager-loading behavior.	2025-08-14 13:49:09 -07:00
Valentin Clement (バレンタインクレメン)	bad3df4764	[flang][cuda] Add bind names for __double2ll_rX interfaces (#153660 )	2025-08-14 13:34:25 -07:00
Jonas Devlieghere	52c9489d1d	[lldb] Use the Python limited API with SWIG 4.2 or later (#153119 ) (#153472 ) Use the Python limited API when building with SWIG 4.2 or later.	2025-08-14 15:28:02 -05:00
Florian Hahn	8a0c7e9b32	[LV] Regenerate some more tests.	2025-08-14 21:21:03 +01:00
Stanislav Mekhanoshin	6b316ecb5f	[AMDGPU] Encode NV bit in VIMAGE/VSAMPLE. NFC (#153654 ) This is NFC as this target does not have it.	2025-08-14 13:19:38 -07:00
Erich Keane	e5e3e4bdb5	[OpenACC] Add firstprivate recipe helper methods to ACC dialect (#153604 ) Like we did for the 'private' clause, this adds an easier to use helper function to add the 'firstprivate' clause + recipe to the Parallel and Serial ops.	2025-08-14 13:07:59 -07:00
Bill Wendling	aa4805a090	[Clang][attr] Add 'cfi_salt' attribute (#141846 ) The 'cfi_salt' attribute specifies a string literal that is used as a "salt" for Control-Flow Integrity (CFI) checks to distinguish between functions with the same type signature. This attribute can be applied to function declarations, function definitions, and function pointer typedefs. This attribute prevents function pointers from being replaced with pointers to functions that have a compatible type, which can be a CFI bypass vector. The attribute affects type compatibility during compilation and CFI hash generation during code generation. Attribute syntax: [[clang::cfi_salt("<salt_string>")]] GNU-style syntax: __attribute__((cfi_salt("<salt_string>"))) - The attribute takes a single string of non-NULL ASCII characters. - It only applies to function types; using it on a non-function type will generate an error. - All function declarations and the function definition must include the attribute and use identical salt values. Example usage: // Header file: #define __cfi_salt(S) __attribute__((cfi_salt(S))) // Convenient typedefs to avoid nested declarator syntax. typedef int (fp_unsalted_t)(void); typedef int (fp_salted_t)(void) __cfi_salt("pepper"); struct widget_ops { fp_unsalted_t init; // Regular CFI. fp_salted_t exec; // Salted CFI. fp_unsalted_t teardown; // Regular CFI. }; // bar.c file: static int bar_init(void) { ... } static int bar_salted_exec(void) __cfi_salt("pepper") { ... } static int bar_teardown(void) { ... } static struct widget_generator _generator = { .init = bar_init, .exec = bar_salted_exec, .teardown = bar_teardown, }; struct widget_generator *widget_gen = _generator; // 2nd .c file: int generate_a_widget(void) { int ret; // Called with non-salted CFI. ret = widget_gen.init(); if (ret) return ret; // Called with salted CFI. ret = widget_gen.exec(); if (ret) return ret; // Called with non-salted CFI. return widget_gen.teardown(); } Link: https://github.com/ClangBuiltLinux/linux/issues/1736 Link: https://github.com/KSPP/linux/issues/365 --------- Signed-off-by: Bill Wendling <morbo@google.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com>	2025-08-14 13:07:38 -07:00
CatherineMoore	5479b7ed42	[OpenMP] Update printf stmt in kmp_settings.cpp (#152800 ) Remove extraneous argument from printf statement --------- Co-authored-by: Joachim <protze@rz.rwth-aachen.de>	2025-08-14 20:04:03 +00:00
Stanislav Mekhanoshin	49f2093477	[AMDGPU] Increase LDS to 320K on gfx1250 (#153645 )	2025-08-14 12:52:00 -07:00
Michael Berg	334a046a3c	[LoopDist] Consider reads and writes together for runtime checks (#145623 ) Emit safety guards for ptr accesses when cross partition loads exist which have a corresponding store to the same address in a different partition. This will emit the necessary ptr checks for these accesses. The test case was obtained from SuperTest, which SiFive runs regularly. We enabled LoopDistribution by default in our downstream compiler, this change was part of that enablement.	2025-08-14 12:50:17 -07:00
Matheus Izvekov	eeada0d30f	[clang] fix source range computation for DeducedTemplateSpecializationType (#153646 ) This was a regression introduced in https://github.com/llvm/llvm-project/pull/147835 Since this regression was never released, there are no release notes. Fixes https://github.com/llvm/llvm-project/issues/153540	2025-08-14 16:42:34 -03:00
Mircea Trofin	a508ea2ad7	Add dependency on `ProfileData` from ScalarOpts (#153651 ) Fixing buildbot failures after PR #153305, e.g. https://lab.llvm.org/buildbot/#/builders/203/builds/19861 Analysis already depends on `ProfileData`, so the transitive closure of the dependencies of `ScalarOpts` doesn't change. Also avoided an extra dependency (and very unnecessary) on `Instrumentation`. The API previously used doesn't need to live in Instrumentation to begin with, but that's something to address in a follow-up.	2025-08-14 12:37:17 -07:00
Abhinav Gaba	2912c9c249	[NFC][Offload] Add missing maps to OpenMP offloading tests. (#153103 ) A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an `int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be mapped, which is incorrect. This change fixes six such tests.	2025-08-14 12:22:28 -07:00
Erick Velez	4f007041a8	[clang-doc] place HTML/JSON output inside their own directories (#150655 ) Instead of just outputting everything into the designated root folder, HTML and JSON output will be placed in html/ and json/ directories.	2025-08-14 12:21:40 -07:00
Kaitlin Peng	cbfc22c06b	Fix typo in `step` intrinsic comment (#153642 ) `y` should be the first argument and `x` should be the second, otherwise the formula is wrong. This also matches the documentation [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-step).	2025-08-14 12:02:34 -07:00
Mircea Trofin	016c301d30	[NFC] Use `[[maybe_unused]]` for variable used in assertion (#153639 )	2025-08-14 18:52:56 +00:00
Jonas Devlieghere	b62b65a95f	[lldb] Use (only) PyImport_AppendInittab to patch readline (#153329 ) The current implementation tries to (1) patch the existing readline module definition if it's already present in the inittab and (2) append our patched readline module to the inittab. The former (1) uses the non-stable Python API and I can't find a situation where this is necessary. We do this work before initialization, so for the readline module to exist, it either needs to be added by Python itself (which doesn't seem to be the case), or someone would have had to have added it without initializing.	2025-08-14 13:47:48 -05:00
Valentin Clement (バレンタインクレメン)	20a829937c	[flang][cuda] Add interfaces for __expf and __exp10f (#153633 )	2025-08-14 11:36:55 -07:00
Florian Hahn	db98ac43ec	[LV] Use shl for ((VF * Step) * vscale) in createStepForVF. (#153495 ) Directly emit shl instead of a multiply if VF * Step is a power-of-2. The main motivation here is to prepare the code and test for directly generating and expanding a SCEV expression of the minimum iteration count. SCEVExpander will directly emit shl for multiplies with powers-of-2. InstCombine will also performs this combine, so end-to-end this should effectively by NFC. PR: https://github.com/llvm/llvm-project/pull/153495	2025-08-14 19:27:51 +01:00
Jianhui Li	98728d9dc8	[MLIR][XeGPU] Add lowering from transfer_read/transfer_write to load_gather/store_scatter (#152429 ) Lowering transfer_read/transfer_write to load_gather/store_scatter in case the target uArch doesn't support load_nd/store_nd. The high level steps: 1. compute Strides; 2. compute Offsets; 3. collapseMemrefTo1D; 4. create Load gather or store_scatter op	2025-08-14 11:27:07 -07:00
Thurston Dang	37cc010b91	[asan] Fix-forward undefined type in test from #153142 (#153636 ) Fix Mac build breakage (reported by aeubanks in https://github.com/llvm/llvm-project/pull/153142#issuecomment-3189202274) by including stdint.h and using uintptr_t	2025-08-14 11:20:11 -07:00
Mircea Trofin	f5d284309f	[JTS] Propagate profile info (#153305 ) If the indirect call target being recognized as a jump table has profile info, we can accurately synthesize the branch weights of the switch that replaces the indirect call. Otherwise we insert the "unknown" `MD_prof` to indicate this is the best we can do here. Part of Issue #147390	2025-08-14 11:17:57 -07:00
Min-Yih Hsu	c202d2f515	[IA][RISCV] Recognizing gap masks assembled from bitwise AND (#153324 ) For a deinterleaved masked.load / vp.load, if it's mask, `%c`, is synthesized by the following snippet: ``` %m = shufflevector %s, poison, <0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3> %g = <1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0> %c = and %m, %g ``` Then we can know that `%g` is the gap mask and `%s` is the mask for each field / component. This patch teaches InterleaveAccess pass to recognize such patterns	2025-08-14 11:17:50 -07:00
Florian Hahn	ff0ce74be8	[VPlan] Replace scalar preheader with VPIRBB at single place (NFC). Replace the scalar preheader VPBB with an VPIRBB wrapping the IR basic block created by createVectorizedLoopSkeleton.	2025-08-14 19:11:34 +01:00
Valentin Clement (バレンタインクレメン)	e27e4f3a99	[flang][cuda] Add interfaces for __uint2float_rX functions (#153620 ) Also add bind name for __uint2double_rn	2025-08-14 18:05:37 +00:00
Iris Shi	dc0becc4d0	[CIR] Add InlineAsmOp lowering to LLVM (#153387 ) - Part of #153267 Added support for lowering `InlineAsmOp` directly to LLVM IR --------- Co-authored-by: Morris Hafner <mhafner@nvidia.com>	2025-08-14 17:48:14 +00:00
Jonas Devlieghere	ac0ad5093a	[lldb] Use PyThread_get_thread_ident instead of accessing PyThreadState (#153460 ) Use `PyThread_get_thread_ident`, which is part of the Stable API, instead of accessing a member of the PyThreadState, which is opaque when using the Stable API.	2025-08-14 12:41:49 -05:00
Leandro Lupori	91418ecbde	Revert "[lldb] refactor PlatformAndroid and make threadsafe" (#153626 ) Reverts llvm/llvm-project#145382 This broke a couple of buildbots.	2025-08-14 14:36:50 -03:00
Iris Shi	9a28783f5d	[CIR] Add `InlineAsmOp` (#153362 ) - Part of #153267 --------- Co-authored-by: Andy Kaylor <akaylor@nvidia.com> Co-authored-by: Morris Hafner <mmha@users.noreply.github.com>	2025-08-14 17:34:38 +00:00
Valentin Clement (バレンタインクレメン)	efce767a88	[flang][cuda] Add interfaces for __ull2float_rX functions (#153613 )	2025-08-14 10:28:17 -07:00

1 2 3 4 5 ...

548631 Commits