llvm-project

Author	SHA1	Message	Date
Krzysztof Parzyszek	9f1679190e	[flang][OpenMP] Update GetOmpObjectList, move to parser utils (#154389 ) `GetOmpObjectList` takes a clause, and returns the pointer to the contained OmpObjectList, or nullptr if the clause does not contain one. Some clauses with object list were not recognized: handle all clauses, and move the implementation to flang/Parser/openmp-utils.cpp.	2025-08-20 12:41:26 -05:00
Yitzhak Mandelbaum	2be52f309e	[clang][dataflow] Fix uninitialized memory bug. (#154575 ) Commit #3ecfc03 introduced a bug involving an uninitialized field in `exportLogicalContext`. This patch initializes the field properly.	2025-08-20 13:36:42 -04:00
Akash Banerjee	d69ccded4f	[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183 ) This PR adds support for complex power operations (`cpow`) in the `ComplexToROCDLLibraryCalls` conversion pass, specifically targeting AMDGPU architectures. The implementation optimises complex exponentiation by using mathematical identities and special-case handling for small integer powers. - Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa` target instead of using library calls - Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using mathematical identity	2025-08-20 17:18:30 +00:00
Kazu Hirata	65de318d18	[Sema] Fix a warning This patch fixes: clang/lib/Sema/SemaTemplateVariadic.cpp:1069:22: error: variable 'TST' set but not used [-Werror,-Wunused-but-set-variable]	2025-08-20 09:56:27 -07:00
David Tenty	63195d3d7a	[NFC][CMake] quote ${CMAKE_SYSTEM_NAME} consistently (#154537 ) A CMake change included in CMake 4.0 makes `AIX` into a variable (similar to `APPLE`, etc.) `ff03db6657` However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to `AIX` and `if` auto-expands variable names in CMake. That means you get a double expansion if you write: `if (${CMAKE_SYSTEM_NAME} MATCHES "AIX")` which becomes: `if (AIX MATCHES "AIX")` which is as if you wrote: `if (ON MATCHES "AIX")` You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}", due to policy [CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054) which is on by default in 4.0+. Most of the LLVM CMake already does this, but this PR fixes the remaining cases where we do not.	2025-08-20 12:45:41 -04:00
schittir	fdfcebb38d	[clang][SYCL] Add sycl_external attribute and restrict emitting device code (#140282 ) This patch is part of the upstreaming effort for supporting the SYCL language front end. It makes the following changes: 1. Adds sycl_external attribute for functions with external linkage, which is intended for use to implement the SYCL_EXTERNAL macro as specified by the SYCL 2020 specification 2. Adds checks to avoid emitting device code when sycl_external and sycl_kernel_entry_point attributes are not enabled 3. Fixes test failures caused by the above changes This patch does not yet implement the following diagnostics listed in the SYCL 2020 specification's section 5.10.1, which will be addressed in a subsequent PR: Functions that are declared using SYCL_EXTERNAL have the following additional restrictions beyond those imposed on other device functions: 1. If the SYCL backend does not support the generic address space then the function cannot use raw pointers as parameter or return types. Explicit pointer classes must be used instead; 2. The function cannot call group::parallel_for_work_item; 3. The function cannot be called from a parallel_for_work_group scope. In addition to that, the subsequent PR will also implement diagnostics for inline functions including virtual functions defined as inline. --------- Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>	2025-08-20 12:37:37 -04:00
Ilya Biryukov	85043c1c14	[Clang] Add a builtin that deduplicate types into a pack (#106730 ) The new builtin `__builtin_dedup_pack` removes duplicates from list of types. The added builtin is special in that they produce an unexpanded pack in the spirit of P3115R0 proposal. Produced packs can be used directly in template argument lists and get immediately expanded as soon as results of the computation are available. It allows to easily combine them, e.g.: ```cpp template <class ...T> struct Normalize { // Note: sort is not included in this PR, it illustrates the idea. using result = std::tuple< __builtin_sort_pack< __builtin_dedup_pack<int, double, T...>... >...>; } ; ``` Limitations: - only supported in template arguments and bases, - can only be used inside the templates, even if non-dependent, - the builtins cannot be assigned to template template parameters. The actual implementation proceeds as follows: - When the compiler encounters a `__builtin_dedup_pack` or other type-producing builtin with dependent arguments, it creates a dependent `TemplateSpecializationType`. - During substitution, if the template arguments are non-dependent, we will produce: a new type `SubstBuiltinTemplatePackType`, which stores an argument pack that needs to be substituted. This type is similar to the existing `SubstTemplateParmPack` in that it carries the argument pack that needs to be expanded further. The relevant code is shared. - On top of that, Clang also wraps the resulting type into `TemplateSpecializationType`, but this time only as a sugar. - To actually expand those packs, we collect the produced `SubstBuiltinTemplatePackType` inside `CollectUnexpandedPacks`. Because we know the size of the produces packs only after the initial substitution, places that do the actual expansion will need to have a second run over the substituted type to finalize the expansions (in this patch we only support this for template arguments, see `ExpandTemplateArgument`). 
If expansions are requested in places we do not currently support, we will produce an error. More follow-up work will be needed to fully shape this: - adding the builtin that sorts types, - removing the restrictions for expansions, - implementing P3115R0 (scheduled for C++29, see https://github.com/cplusplus/papers/issues/2300).	2025-08-20 18:11:36 +02:00
Craig Topper	2b7b8bdc16	[X86] Accept the canonical form of a sign bit test in MatchVectorAllEqualTest. (#154421 ) This function tries to look for (seteq (and (reduce_or), mask), 0). If the mask is a sign bit, InstCombine will have turned it into (setgt (reduce_or), -1). We should handle that case too. I'm looking into adding the same canonicalization to SimplifySetCC and this change is needed to prevent test regressions.	2025-08-20 09:09:55 -07:00
Craig Topper	562e021103	[RISCV] Minor refactor of RISCVMoveMerge::mergePairedInsns. (#154467 ) Fold the ARegInFirstPair into the later if/else with the same condition. Use std::swap so we don't need to repeat operands in the opposite order.	2025-08-20 09:07:58 -07:00
Matthias Springer	6a285cc8e6	[mlir][IR] Fix `Block::without_terminator` for blocks without terminator (#154498 ) Blocks without a terminator are not handled correctly by `Block::without_terminator`: the last operation is excluded, even when it is not a terminator. With this commit, only terminators are excluded. If the last operation is unregistered, it is included for safety.	2025-08-20 18:02:24 +02:00
Kazu Hirata	f487c0e63c	[AST] Fix warnings This patch fixes: clang/lib/AST/ByteCode/InterpBuiltin.cpp:1827:21: error: unused variable 'ASTCtx' [-Werror,-Wunused-variable] clang/lib/AST/ByteCode/InterpBuiltin.cpp:2724:18: error: unused variable 'Arg2Type' [-Werror,-Wunused-variable] clang/lib/AST/ByteCode/InterpBuiltin.cpp:2725:18: error: unused variable 'Arg3Type' [-Werror,-Wunused-variable] clang/lib/AST/ByteCode/InterpBuiltin.cpp:2748:18: error: unused variable 'ElemT' [-Werror,-Wunused-variable]	2025-08-20 08:58:59 -07:00
Yanzuo Liu	a6da68ed36	[Clang][ASTMatchers] Make `hasConditionVariableStatement` support `for` loop, `while` loop and `switch` statements (#154298 ) Co-authored-by: Aaron Ballman <aaron@aaronballman.com>	2025-08-20 23:57:30 +08:00
Krzysztof Parzyszek	15cb06109d	[Frontend][OpenMP] Allow multiple occurrences of DYN_GROUPPRIVATE (#154549 ) It was mistakenly placed in "allowOnceClauses" on the constructs that allow it.	2025-08-20 10:56:30 -05:00
Samira Bakon	ffe21f1cd7	[clang][dataflow] Transfer more cast expressions. (#153066 ) Transfer all casts by kind as we currently do implicit casts. This obviates the need for specific handling of static casts. Also transfer CK_BaseToDerived and CK_DerivedToBase and add tests for these and missing tests for already-handled cast types. Ensure that CK_BaseToDerived casts result in modeling of the fields of the derived class.	2025-08-20 11:40:06 -04:00
Mirko Brkušanin	80f3b376b3	[AMDGPU][GlobalISel] Combine for breaking s64 and/or into two s32 insts (#151731 ) When either one of the operands is all ones in high or low parts, splitting these opens up other opportunities for combines. One of two new instructions will either be removed or become a simple copy.	2025-08-20 17:32:29 +02:00
Steven Wu	2cfba9678d	[FileSystem] Allow exclusive file lock (#114098 ) Add a parameter to the file lock API to allow an exclusive file lock. Both Unix and Windows support locking a file exclusively for writing by one process, and LLVM OnDiskCAS uses an exclusive file lock to coordinate CAS creation.	2025-08-20 08:32:18 -07:00
Matthias Springer	0499d3a8cf	[mlir][Interfaces] Add `hasUnknownEffects` helper function (#154523 ) I have seen misuse of the `hasEffect` API in downstream projects: users sometimes think that `hasEffect == false` indicates that the operation does not have a certain memory effect. That's not necessarily the case. When the op does not implement the `MemoryEffectsOpInterface`, it is unknown whether it has the specified effect. "false" can also mean "maybe". This commit clarifies the semantics in the documentation. Also adds `hasUnknownEffects` and `mightHaveEffect` convenience functions. Also simplifies a few call sites.	2025-08-20 15:24:53 +00:00
Louis Dionne	2e2349e4e9	[libc++] Add internal checks for some basic_streambuf invariants (#144602 ) These invariants are always expected to hold, however it's not always clear that they do. Adding explicit checks for these invariants inside non-trivial functions of basic_streambuf makes that clear.	2025-08-20 11:09:39 -04:00
Florian Hahn	35be64a416	[VPlan] Factor out logic to common compute costs to helper (NFCI). (#153361 ) A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication. PR: https://github.com/llvm/llvm-project/pull/153361	2025-08-20 16:05:20 +01:00
Jordan Rupprecht	f1458ec623	[bazel] Port #153504 : add llvm-offload-wrapper (#154553 )	2025-08-20 15:01:46 +00:00
Timothy Choi	d282452e4c	[libc++] Avoid string reallocation in `std::filesystem::path::lexically_relative` (#152964 ) Improves runtime by around 20 to 40%. (1.3x to 1.7x) ``` Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------------ BM_LexicallyRelative/small_path/2 -0.2111 -0.2082 229 181 228 180 BM_LexicallyRelative/small_path/4 -0.2579 -0.2550 455 338 452 337 BM_LexicallyRelative/small_path/8 -0.2643 -0.2616 844 621 838 619 BM_LexicallyRelative/small_path/16 -0.2582 -0.2556 1562 1158 1551 1155 BM_LexicallyRelative/small_path/32 -0.2518 -0.2496 3023 2262 3004 2254 BM_LexicallyRelative/small_path/64 -0.2806 -0.2775 6344 4564 6295 4549 BM_LexicallyRelative/small_path/128 -0.2165 -0.2137 11762 9216 11683 9186 BM_LexicallyRelative/small_path/256 -0.2672 -0.2645 24499 17953 24324 17891 BM_LexicallyRelative/large_path/2 -0.3268 -0.3236 426 287 422 285 BM_LexicallyRelative/large_path/4 -0.3274 -0.3248 734 494 729 492 BM_LexicallyRelative/large_path/8 -0.3586 -0.3560 1409 904 1399 901 BM_LexicallyRelative/large_path/16 -0.3978 -0.3951 2764 1665 2743 1659 BM_LexicallyRelative/large_path/32 -0.3934 -0.3908 5323 3229 5283 3218 BM_LexicallyRelative/large_path/64 -0.3629 -0.3605 10340 6587 10265 6564 BM_LexicallyRelative/large_path/128 -0.3450 -0.3423 19379 12694 19233 12649 BM_LexicallyRelative/large_path/256 -0.3097 -0.3054 36293 25052 35943 24965 ``` --------- Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>	2025-08-20 16:58:21 +02:00
erichkeane	30fcf69845	[OpenACC] Fixup rules for reduction clause variable reference type The standard is ambiguous, but we can only support arrays/array-sections/etc of the composite type, so make sure we enforce the rule that way. This will better support how we need to do lowering.	2025-08-20 07:55:16 -07:00
Janek van Oirschot	40e1510146	[AMDGPU][NFC] Enable gfx942 for more tests (#154363 ) Enable gfx942 for tests that are affected by an AMDGPU bitcast constant combine (#154115). More tests are expected to be affected in that PR once it is rebased on top of this one.	2025-08-20 15:46:26 +01:00
Brox Chen	c50ed05cad	[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (reopen #153894 ) (#154211 ) Recreates the patch from https://github.com/llvm/llvm-project/pull/153894. It seems ISel silently ignores the `i64 = zext i16` with a chained `reg_sequence` pattern, which causes a selection failure in a HIP test. This patch uses an alternative pattern and adds an ll test, global-extload-gfx11plus.ll.	2025-08-20 10:26:49 -04:00
halbi2	2f237670b1	[Clang] [Sema] Enable nodiscard warnings for function pointers (#154250 ) A call through a function pointer has no associated FunctionDecl, but it still might have a nodiscard return type. Ensure there is a codepath to emit the nodiscard warning in this case. Fixes #142453	2025-08-20 14:14:35 +00:00
Anutosh Bhat	ea634fef56	[clang-repl] Fix InstantiateTemplate & Value test while building against emscripten (#154513 ) Building with assertions flag (-sAssertions=2) gives me these ``` [ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_3.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_3.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) [ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_3.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_3.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? 
if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) [ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_23.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_23.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) [ RUN ] InterpreterTest.Value Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_36.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_36.wasm [ RUN ] InterpreterTest.Value Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? 
if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_36.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_Z9setGlobali'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_36.wasm ``` So we have some symbols missing here that are needed by the side modules being created here. First 2 are needed by both tests Last 3 are needed for these lines accordingly in the Value test. `dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L355)` `dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L364)` `dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L365)` Everything should work as expected after this ``` [----------] 9 tests from InterpreterTest [ RUN ] InterpreterTest.Sanity [ OK ] InterpreterTest.Sanity (18 ms) [ RUN ] InterpreterTest.IncrementalInputTopLevelDecls [ OK ] InterpreterTest.IncrementalInputTopLevelDecls (45 ms) [ RUN ] InterpreterTest.Errors [ OK ] InterpreterTest.Errors (29 ms) [ RUN ] InterpreterTest.DeclsAndStatements [ OK ] InterpreterTest.DeclsAndStatements (34 ms) [ RUN ] InterpreterTest.UndoCommand /Users/anutosh491/work/llvm-project/clang/unittests/Interpreter/InterpreterTest.cpp:156: Skipped Test fails for Emscipten builds [ SKIPPED ] InterpreterTest.UndoCommand (0 ms) [ RUN ] InterpreterTest.FindMangledNameSymbol [ OK ] InterpreterTest.FindMangledNameSymbol (85 ms) [ RUN ] InterpreterTest.InstantiateTemplate [ OK ] InterpreterTest.InstantiateTemplate (127 ms) [ RUN ] InterpreterTest.Value [ OK ] InterpreterTest.Value (608 ms) [ RUN ] InterpreterTest.TranslationUnit_CanonicalDecl [ OK ] InterpreterTest.TranslationUnit_CanonicalDecl (64 ms) [----------] 9 tests from InterpreterTest (1014 ms total) ``` This is similar to 
how we need to take care of some symbols while building side modules when running cppinterop's test suite.	2025-08-20 14:14:19 +00:00
Mehdi Amini	6cedf6e604	[MLIR] Add missing handling for LLVM_LIT_TOOLS_DIR in mlir lit config (NFC) (#154542 ) This is helping some windows users, here is the doc: LLVM_LIT_TOOLS_DIR:PATH The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to the empty string, in which case lit will look for tools needed for tests (e.g. ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not in your ``%PATH%``, then you can set this variable to the GnuWin32 directory so that lit can find tools needed for tests in that directory.	2025-08-20 16:05:44 +02:00
David Sherwood	e172110d12	[LV] Don't calculate scalar costs for scalable VFs in setVectorizedCallDecision (#152713 ) In setVectorizedCallDecision we attempt to calculate the scalar costs for vectorisation calls, even for scalable VFs where we already know the answer is Invalid. We can avoid doing unnecessary work by skipping this completely for scalable vectors.	2025-08-20 15:00:31 +01:00
Matt Arsenault	694a488708	AMDGPU: Add pseudoinstruction for 64-bit agpr or vgpr constants (#154499 ) 64-bit version of 7425af4b7aaa31da10bd1bc7996d3bb212c79d88. We still need to lower to 32-bit v_accagpr_write_b32s, so this has a unique value restriction that requires both halves of the constant to be 32-bit inline immediates. This only introduces the new pseudo definitions, but doesn't try to use them yet.	2025-08-20 22:54:37 +09:00
Chaitanya Koparkar	f649605bcf	[clang] Enable constexpr handling for __builtin_elementwise_fma (#152919 ) Fixes https://github.com/llvm/llvm-project/issues/152455.	2025-08-20 14:51:40 +01:00
Sirui Mu	318b0dda7c	[CIR] Add atomic load and store operations (#153814 ) This patch adds support for atomic loads and stores. Specifically, it adds support for the following intrinsic calls: - `__atomic_load` and `__atomic_store`; - `__c11_atomic_load` and `__c11_atomic_store`.	2025-08-20 21:49:04 +08:00
Ryotaro Kasuga	2330fd2f73	[LoopPeel] Add new option to peeling loops to convert PHI into IV (#121104 ) LoopPeel currently considers PHI nodes that become loop invariants through peeling. However, in some cases, peeling transforms PHI nodes into induction variables (IVs), potentially enabling further optimizations such as loop vectorization. For example: ```c // TSVC s292 int im = N-1; for (int i=0; i<N; i++) { a[i] = b[i] + b[im]; im = i; } ``` In this case, peeling one iteration converts `im` into an IV, allowing it to be handled by the loop vectorizer. This patch adds a new feature that peels loops in order to convert PHIs into IVs. At the moment this feature is disabled by default. Enabling it allows the above example to be vectorized. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: `-O3 -ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`). This PR is taken over from #94900 Related #81851	2025-08-20 13:44:56 +00:00
Mehdi Amini	8b2028ced6	Update log_level for LLVM_DEBUG and associated macros (#154525 ) During the review of #150855 we switched from 0 to 1 for the default log level used, but this macro wasn't updated.	2025-08-20 13:31:13 +00:00
Nikita Popov	99119a5a81	[OpenMPIRBuilder] Add missing LLVM_ABI annotations	2025-08-20 15:19:08 +02:00
Nikita Popov	822496db7f	[SampleContextTracker] Add missing LLVM_ABI annotations	2025-08-20 15:19:08 +02:00
Nikita Popov	7eb5031e2c	[GlobalDCE] Add missing LLVM_ABI annotation	2025-08-20 15:19:08 +02:00
Nikita Popov	6a99ad2975	[Debug] Add missing LLVM_ABI annotations	2025-08-20 15:19:08 +02:00
Harrison Hao	23a5a7bef3	[AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instruction (#145078 ) SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit `TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write * a single component (`X`), or * two components (`XY`), and fold them into the wider native variants: ``` X + X --> XY X + X + X + X --> XYZW XY + XY --> XYZW X + X + X --> XYZ XY + X --> XYZ ``` The optimisation cuts the number of TBUFFER instructions, shrinking code size and improving memory throughput.	2025-08-20 21:16:25 +08:00
Zhaoxuan Jiang	2738828c0e	[Reland] [CGData] Lazy loading support for stable function map (#154491 ) This is an attempt to reland #151660 by including a missing STL header found by a buildbot failure. The stable function map could be huge for a large application. Fully loading it is slow and consumes a significant amount of memory, which is unnecessary and drastically slows down compilation especially for non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in lazy loading support for the stable function map. The detailed changes are: - `StableFunctionMap` - The map now stores entries in an `EntryStorage` struct, which includes offsets for serialized entries and a `std::once_flag` for thread-safe lazy loading. - The underlying map type is changed from `DenseMap` to `std::unordered_map` for compatibility with `std::once_flag`. - `contains()`, `size()` and `at()` are implemented to only load requested entries on demand. - Lazy Loading Mechanism - When reading indexed codegen data, if the newly-introduced `-indexed-codegen-data-lazy-loading` flag is set, the stable function map is not fully deserialized up front. The binary format for the stable function map now includes offsets and sizes to support lazy loading. - The safety of lazy loading is guarded by the once flag per function hash. This guarantees that even in a multi-threaded environment, the deserialization for a given function hash will happen exactly once. The first thread to request it performs the load, and subsequent threads will wait for it to complete before using the data. For single-threaded builds, the overhead is negligible (a single check on the once flag). For multi-threaded scenarios, users can omit the flag to retain the previous eager-loading behavior.	2025-08-20 06:15:04 -07:00
Charles Zablit	c56bb124e3	[lldb] make lit use the same PYTHONHOME for building and running the API tests (#154396 ) When testing LLDB, we want to make sure to use the same Python as the one we used to build it. We already did this in https://github.com/llvm/llvm-project/pull/143183 for the Unit and Shell tests. This patch does the same thing for the API tests as well.	2025-08-20 14:10:50 +01:00
Benjamin Maxwell	478b4b012f	[AArch64][SME] Rework VG CFI information for streaming-mode changes (#152283 ) This patch reworks how VG is handled around streaming mode changes. Previously, for functions with streaming mode changes, we would: - Save the incoming VG in the prologue - Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming mode changes Additionally, for locally streaming functions, we would: - Also save the streaming VG in the prologue - Emit `.cfi_offset vg, <incoming VG offset>` in the prologue - Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg` around streaming mode changes In both cases, this ends up doing more than necessary and would be hard for an unwinder to parse, as using `.cfi_offset` in this way does not follow the semantics of the underlying DWARF CFI opcodes. So the new scheme in this patch is to: In functions with streaming mode changes (inc locally streaming) - Save the incoming VG in the prologue - Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode changes) - Emit `.cfi_restore vg` after the saved VG has been deallocated - This will be in the function epilogue, where VG is always the same as the entry VG - Explicitly reference the incoming VG expressions for SVE callee-saves in functions with streaming mode changes - Ensure the CFA is not described in terms of VG in functions with streaming mode changes A more in-depth discussion of this scheme is available in: https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b But the TLDR is that following this scheme, SME unwinding can be implemented with minimal changes to existing unwinders. All unwinders need to do is initialize VG to `CNTD` at the start of unwinding, then everything else is handled by standard opcodes (which don't need changes to handle VG).	2025-08-20 14:06:12 +01:00
Hank	c075fb8c37	[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267 ) Fixes #150163 MLIR bytecode does not preserve alias definitions, so each attribute encountered during deserialization is treated as a new one. This can generate duplicate `DISubprogram` nodes during deserialization. The patch adds a `StringMap` cache that records attributes and fetches them when encountered again.	2025-08-20 13:03:26 +00:00
Qihan Cai	5f0515debd	[RISCV] Support Remaining P Extension Instructions for RV32/64 (#150379 ) This patch implements pages 15-17 from jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf Documentation: jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf	2025-08-20 22:54:07 +10:00
Joseph Huber	5a929a4249	[Clang] Support using boolean vectors in ternary operators (#154145 ) Summary: It's extremely common to conditionally blend two vectors. Previously this was done with mask registers, which is what the normal ternary code generation does when used on a vector. However, since Clang 15 we have supported boolean vector types in the compiler. These are useful in general for checking the mask registers, but are currently limited because they do not map to an LLVM-IR select instruction. This patch simply relaxes these checks, which are technically forbidden by the OpenCL standard. However, general vector support should be able to handle these. We already support this for Arm SVE types, so this should make things more consistent with the clang vector type.	2025-08-20 07:49:26 -05:00
Michał Górny	29067ac6e1	[OpenMP][OMPD] Fix GDB plugin to work correctly when installed (#153956 ) Fix the `sys.path` logic in the GDB plugin to insert the intended self-path in the first position rather than appending it to the end. The latter implied that if `sys.path` (naturally) contained the GDB's `gdb-plugin` directory, `import ompd` would return the top-level `ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as intended by adding the `ompd/` directory to `sys.path`. This is intended to be a minimal change necessary to fix the issue. Alternatively, the code could be modified to import `ompd.ompd` and stop modifying `sys.path` entirely. However, I do not know why this option was chosen in the first place, so I can't tell if this won't break something. Fixes #153954 Signed-off-by: Michał Górny <mgorny@gentoo.org>	2025-08-20 14:36:50 +02:00
Ross Brunton	c8986d1ecb	[Offload] Guard olMemAlloc/Free with a mutex (#153786 ) Both these functions update an `AllocInfoMap` structure in the context, however they did not use any locks, causing random failures in threaded code. Now they use a mutex.	2025-08-20 13:23:57 +01:00
Simeon David Schaub	4c295216e4	[SPIR-V] fix return type for OpAtomicCompareExchange (#154297 ) fixes #152863 Tests were written with some help from Copilot --------- Co-authored-by: Victor Lomuller <victor@codeplay.com>	2025-08-20 13:19:45 +01:00
David Green	c856e8def4	[ARM] Update cmps.ll, control-flow.ll and divrem.ll to use -cost-kind=all. NFC	2025-08-20 12:59:32 +01:00
Matt Arsenault	c876d53378	DAG: Avoid creating illegal extract_subvector in legalizer (#154100 ) Fixes #153808	2025-08-20 20:55:05 +09:00
Jordan Rupprecht	876fdc9e29	[bazel] Port #154452 : WasmSSA dialect importer (#154516 )	2025-08-20 06:51:44 -05:00
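
Note on d69ccded4f (cpow in ComplexToROCDLLibraryCalls): the message says `complex.pow(z, w)` is rewritten to `complex.exp(w * complex.log(z))`. The sketch below only illustrates that identity in plain C++ with `std::complex`; it is not the MLIR lowering. `powViaExpLog` and `checkAgainstStdPow` are hypothetical names made up for this illustration, and the sketch makes no attempt to handle `z == 0`, where `log(z)` is not finite.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper, not LLVM code: computes z^w through the identity
// z^w = exp(w * log(z)), using the principal branch of the logarithm.
std::complex<double> powViaExpLog(std::complex<double> z,
                                  std::complex<double> w) {
  return std::exp(w * std::log(z));
}

// Compares the identity against std::pow and against plain multiplication
// for a small integer power. The tolerance is an arbitrary choice.
void checkAgainstStdPow() {
  const std::complex<double> z(1.5, -0.5);
  const std::complex<double> w(0.25, 2.0);
  assert(std::abs(powViaExpLog(z, w) - std::pow(z, w)) < 1e-9);

  const std::complex<double> two(2.0, 0.0);
  assert(std::abs(powViaExpLog(z, two) - z * z) < 1e-9);
}
```

Calling `checkAgainstStdPow()` from a test driver exercises both comparisons. The second one hints at why a lowering might special-case small integer powers: a multiplication avoids the `log`/`exp` round trip entirely.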

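
Note on 2330fd2f73 (LoopPeel, PHI to IV): the message states that peeling one iteration of the TSVC s292 loop turns `im` into an induction variable. A hand-written sketch of that effect follows, assuming `a` and `b` have the same length. The function names are hypothetical, and the peeled version is written by hand for illustration rather than taken from the pass's output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape of the loop from the commit message: `im` carries the
// previous value of `i`, except on the first iteration where it is n - 1.
void s292Original(std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(b.size());
  int im = n - 1;
  for (int i = 0; i < n; ++i) {
    a[i] = b[i] + b[im];
    im = i;
  }
}

// Hand-peeled shape: the first iteration is pulled out of the loop. In the
// remaining iterations `im` always equals i - 1, so it is expressed directly
// in terms of the induction variable.
void s292Peeled(std::vector<float> &a, const std::vector<float> &b) {
  assert(a.size() == b.size());
  const int n = static_cast<int>(b.size());
  if (n == 0)
    return;
  a[0] = b[0] + b[n - 1];
  for (int i = 1; i < n; ++i)
    a[i] = b[i] + b[i - 1];
}
```

Both functions write the same values into `a`. The second loop body contains only accesses at `i` and `i - 1`, which is the form the message says the loop vectorizer can handle.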

549326 Commits