549326 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
9f1679190e
[flang][OpenMP] Update GetOmpObjectList, move to parser utils (#154389)
`GetOmpObjectList` takes a clause, and returns the pointer to the
contained OmpObjectList, or nullptr if the clause does not contain one.
Some clauses with object list were not recognized: handle all clauses,
and move the implementation to flang/Parser/openmp-utils.cpp.
2025-08-20 12:41:26 -05:00
Yitzhak Mandelbaum
2be52f309e
[clang][dataflow] Fix uninitialized memory bug. (#154575)
Commit #3ecfc03 introduced a bug involving an uninitialized field in
`exportLogicalContext`. This patch initializes the field properly.
2025-08-20 13:36:42 -04:00
Akash Banerjee
d69ccded4f
[MLIR] Add cpow support in ComplexToROCDLLibraryCalls (#153183)
This PR adds support for complex power operations (`cpow`) in the
`ComplexToROCDLLibraryCalls` conversion pass, specifically targeting
AMDGPU architectures. The implementation optimises complex
exponentiation by using mathematical identities and special-case
handling for small integer powers.

- Force lowering to `complex.pow` operations for the `amdgcn-amd-amdhsa`
target instead of using library calls
- Convert `complex.pow(z, w)` to `complex.exp(w * complex.log(z))` using
mathematical identity
2025-08-20 17:18:30 +00:00
Kazu Hirata
65de318d18 [Sema] Fix a warning
This patch fixes:

  clang/lib/Sema/SemaTemplateVariadic.cpp:1069:22: error: variable
  'TST' set but not used [-Werror,-Wunused-but-set-variable]
2025-08-20 09:56:27 -07:00
David Tenty
63195d3d7a
[NFC][CMake] quote ${CMAKE_SYSTEM_NAME} consistently (#154537)
A CMake change included in CMake 4.0 makes `AIX` into a variable
(similar to `APPLE`, etc.)
ff03db6657

However, `${CMAKE_SYSTEM_NAME}` unfortunately also expands exactly to
`AIX` and `if` auto-expands variable names in CMake. That means you get
a double expansion if you write:

`if (${CMAKE_SYSTEM_NAME}  MATCHES "AIX")`
which becomes:
`if (AIX  MATCHES "AIX")`
which is as if you wrote:
`if (ON MATCHES "AIX")`

You can prevent this by quoting the expansion of "${CMAKE_SYSTEM_NAME}",
due to policy
[CMP0054](https://cmake.org/cmake/help/latest/policy/CMP0054.html#policy:CMP0054)
which is on by default in 4.0+. Most of the LLVM CMake already does
this, but this PR fixes the remaining cases where we do not.
2025-08-20 12:45:41 -04:00
schittir
fdfcebb38d
[clang][SYCL] Add sycl_external attribute and restrict emitting device code (#140282)
This patch is part of the upstreaming effort for supporting SYCL
language front end.
It makes the following changes:
1. Adds sycl_external attribute for functions with external linkage,
which is intended for use to implement the SYCL_EXTERNAL macro as
specified by the SYCL 2020 specification
2. Adds checks to avoid emitting device code when sycl_external and
sycl_kernel_entry_point attributes are not enabled
3. Fixes test failures caused by the above changes

This patch is missing diagnostics for the following diagnostics listed
in the SYCL 2020 specification's section 5.10.1, which will be addressed
in a subsequent PR:
Functions that are declared using SYCL_EXTERNAL have the following
additional restrictions beyond those imposed on other device functions:
1. If the SYCL backend does not support the generic address space then
the function cannot use raw pointers as parameter or return types.
Explicit pointer classes must be used instead;
2. The function cannot call group::parallel_for_work_item;
3. The function cannot be called from a parallel_for_work_group scope.

In addition to that, the subsequent PR will also implement diagnostics
for inline functions including virtual functions defined as inline.

---------

Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
2025-08-20 12:37:37 -04:00
Ilya Biryukov
85043c1c14
[Clang] Add a builtin that deduplicate types into a pack (#106730)
The new builtin `__builtin_dedup_pack` removes duplicates from list of
types.

The added builtin is special in that they produce an unexpanded pack
in the spirit of P3115R0 proposal.

Produced packs can be used directly in template argument lists and get
immediately expanded as soon as results of the computation are
available.

It allows to easily combine them, e.g.:

```cpp
template <class ...T>
struct Normalize {
  // Note: sort is not included in this PR, it illustrates the idea.
  using result = std::tuple<
    __builtin_sort_pack<
      __builtin_dedup_pack<int, double, T...>...
    >...>;
}
;
```

Limitations:
- only supported in template arguments and bases,
- can only be used inside the templates, even if non-dependent,
- the builtins cannot be assigned to template template parameters.

The actual implementation proceeds as follows:
- When the compiler encounters a `__builtin_dedup_pack` or other
type-producing
  builtin with dependent arguments, it creates a dependent
  `TemplateSpecializationType`.
- During substitution, if the template arguments are non-dependent, we
  will produce: a new type `SubstBuiltinTemplatePackType`, which stores
  an argument pack that needs to be substituted. This type is similar to
  the existing `SubstTemplateParmPack` in that it carries the argument
  pack that needs to be expanded further. The relevant code is shared.
- On top of that, Clang also wraps the resulting type into
  `TemplateSpecializationType`, but this time only as a sugar.
- To actually expand those packs, we collect the produced
  `SubstBuiltinTemplatePackType` inside `CollectUnexpandedPacks`.
  Because we know the size of the produces packs only after the initial
  substitution, places that do the actual expansion will need to have a
  second run over the substituted type to finalize the expansions (in
  this patch we only support this for template arguments, see
  `ExpandTemplateArgument`).

If the expansion are requested in the places we do not currently
support, we will produce an error.

More follow-up work will be needed to fully shape this:
- adding the builtin that sorts types,
- remove the restrictions for expansions,
- implementing P3115R0 (scheduled for C++29, see
  https://github.com/cplusplus/papers/issues/2300).
2025-08-20 18:11:36 +02:00
Craig Topper
2b7b8bdc16
[X86] Accept the canonical form of a sign bit test in MatchVectorAllEqualTest. (#154421)
This function tries to look for (seteq (and (reduce_or), mask), 0). If
the mask is a sign bit, InstCombine will have turned it into (setgt
(reduce_or), -1). We should handle that case too.

I'm looking into adding the same canonicalization to SimplifySetCC and
this change is needed to prevent test regressions.
2025-08-20 09:09:55 -07:00
Craig Topper
562e021103
[RISCV] Minor refactor of RISCVMoveMerge::mergePairedInsns. (#154467)
Fold the ARegInFirstPair into the later if/else with the same condition.
Use std::swap so we don't need to repeat operands in the opposite order.
2025-08-20 09:07:58 -07:00
Matthias Springer
6a285cc8e6
[mlir][IR] Fix Block::without_terminator for blocks without terminator (#154498)
Blocks without a terminator are not handled correctly by
`Block::without_terminator`: the last operation is excluded, even when
it is not a terminator. With this commit, only terminators are excluded.
If the last operation is unregistered, it is included for safety.
2025-08-20 18:02:24 +02:00
Kazu Hirata
f487c0e63c [AST] Fix warnings
This patch fixes:

  clang/lib/AST/ByteCode/InterpBuiltin.cpp:1827:21: error: unused
  variable 'ASTCtx' [-Werror,-Wunused-variable]

  clang/lib/AST/ByteCode/InterpBuiltin.cpp:2724:18: error: unused
  variable 'Arg2Type' [-Werror,-Wunused-variable]

  clang/lib/AST/ByteCode/InterpBuiltin.cpp:2725:18: error: unused
  variable 'Arg3Type' [-Werror,-Wunused-variable]

  clang/lib/AST/ByteCode/InterpBuiltin.cpp:2748:18: error: unused
  variable 'ElemT' [-Werror,-Wunused-variable]
2025-08-20 08:58:59 -07:00
Yanzuo Liu
a6da68ed36
[Clang][ASTMatchers] Make hasConditionVariableStatement support for loop, while loop and switch statements (#154298)
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
2025-08-20 23:57:30 +08:00
Krzysztof Parzyszek
15cb06109d
[Frontend][OpenMP] Allow multiple occurrences of DYN_GROUPPRIVATE (#154549)
It was mistakenly placed in "allowOnceClauses" on the constructs that
allow it.
2025-08-20 10:56:30 -05:00
Samira Bakon
ffe21f1cd7
[clang][dataflow] Transfer more cast expressions. (#153066)
Transfer all casts by kind as we currently do implicit casts. This
obviates the need for specific handling of static casts.

Also transfer CK_BaseToDerived and CK_DerivedToBase and add tests for
these and missing tests for already-handled cast types.

Ensure that CK_BaseToDerived casts result in modeling of the fields of the derived class.
2025-08-20 11:40:06 -04:00
Mirko Brkušanin
80f3b376b3
[AMDGPU][GlobalISel] Combine for breaking s64 and/or into two s32 insts (#151731)
When either one of the operands is all ones in high or low parts,
splitting these opens up other opportunities for combines. One of two
new instructions will either be removed or become a simple copy.
2025-08-20 17:32:29 +02:00
Steven Wu
2cfba9678d
[FileSystem] Allow exclusive file lock (#114098)
Add parameter to file lock API to allow exclusive file lock. Both Unix
and Windows support lock the file exclusively for write for one process
and LLVM OnDiskCAS uses exclusive file lock to coordinate CAS creation.
2025-08-20 08:32:18 -07:00
Matthias Springer
0499d3a8cf
[mlir][Interfaces] Add hasUnknownEffects helper function (#154523)
I have seen misuse of the `hasEffect` API in downstream projects: users
sometimes think that `hasEffect == false` indicates that the operation
does not have a certain memory effect. That's not necessarily the case.
When the op does not implement the `MemoryEffectsOpInterface`, it is
unknown whether it has the specified effect. "false" can also mean
"maybe".

This commit clarifies the semantics in the documentation. Also adds
`hasUnknownEffects` and `mightHaveEffect` convenience functions. Also
simplifies a few call sites.
2025-08-20 15:24:53 +00:00
Louis Dionne
2e2349e4e9
[libc++] Add internal checks for some basic_streambuf invariants (#144602)
These invariants are always expected to hold, however it's not always
clear that they do. Adding explicit checks for these invariants inside
non-trivial functions of basic_streambuf makes that clear.
2025-08-20 11:09:39 -04:00
Florian Hahn
35be64a416
[VPlan] Factor out logic to common compute costs to helper (NFCI). (#153361)
A number of recipes compute costs for the same opcodes for scalars or
vectors, depending on the recipe.

Move the common logic out to a helper in VPRecipeWithIRFlags, that is
then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction.

This makes it easier to cover all relevant opcodes, without duplication.

PR: https://github.com/llvm/llvm-project/pull/153361
2025-08-20 16:05:20 +01:00
Jordan Rupprecht
f1458ec623
[bazel] Port #153504: add llvm-offload-wrapper (#154553) 2025-08-20 15:01:46 +00:00
Timothy Choi
d282452e4c
[libc++] Avoid string reallocation in std::filesystem::path::lexically_relative (#152964)
Improves runtime by around 20 to 40%. (1.3x to 1.7x)

```
Benchmark                                                           Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------------
BM_LexicallyRelative/small_path/2                                -0.2111         -0.2082           229           181           228           180
BM_LexicallyRelative/small_path/4                                -0.2579         -0.2550           455           338           452           337
BM_LexicallyRelative/small_path/8                                -0.2643         -0.2616           844           621           838           619
BM_LexicallyRelative/small_path/16                               -0.2582         -0.2556          1562          1158          1551          1155
BM_LexicallyRelative/small_path/32                               -0.2518         -0.2496          3023          2262          3004          2254
BM_LexicallyRelative/small_path/64                               -0.2806         -0.2775          6344          4564          6295          4549
BM_LexicallyRelative/small_path/128                              -0.2165         -0.2137         11762          9216         11683          9186
BM_LexicallyRelative/small_path/256                              -0.2672         -0.2645         24499         17953         24324         17891
BM_LexicallyRelative/large_path/2                                -0.3268         -0.3236           426           287           422           285
BM_LexicallyRelative/large_path/4                                -0.3274         -0.3248           734           494           729           492
BM_LexicallyRelative/large_path/8                                -0.3586         -0.3560          1409           904          1399           901
BM_LexicallyRelative/large_path/16                               -0.3978         -0.3951          2764          1665          2743          1659
BM_LexicallyRelative/large_path/32                               -0.3934         -0.3908          5323          3229          5283          3218
BM_LexicallyRelative/large_path/64                               -0.3629         -0.3605         10340          6587         10265          6564
BM_LexicallyRelative/large_path/128                              -0.3450         -0.3423         19379         12694         19233         12649
BM_LexicallyRelative/large_path/256                              -0.3097         -0.3054         36293         25052         35943         24965
```

---------

Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>
2025-08-20 16:58:21 +02:00
erichkeane
30fcf69845 [OpenACC] Fixup rules for reduction clause variable refererence type
The standard is ambiguous, but we can only support
arrays/array-sections/etc of the composite type, so make sure we enforce
the rule that way. This will better support  how we need to do lowering.
2025-08-20 07:55:16 -07:00
Janek van Oirschot
40e1510146
[AMDGPU][NFC] Enable gfx942 for more tests (#154363)
Enable gfx942 for tests that are affected by the an AMDGPU bitcast
constant combine (#154115)

Expecting to see more tests affected in aforementioned PR after rebase
on top of this PR
2025-08-20 15:46:26 +01:00
Brox Chen
c50ed05cad
[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (reopen #153894) (#154211)
recreate this patch from
https://github.com/llvm/llvm-project/pull/153894

It seems ISel sliently ignore the `i64 = zext i16` with a chained
`reg_sequence` pattern and thus this is causing a selection failure in
hip test. Recreate a new patch with an alternative pattern, and added a
ll test global-extload-gfx11plus.ll
2025-08-20 10:26:49 -04:00
halbi2
2f237670b1
[Clang] [Sema] Enable nodiscard warnings for function pointers (#154250)
A call through a function pointer has no associated FunctionDecl, but it
still might have a nodiscard return type. Ensure there is a codepath to
emit the nodiscard warning in this case.

Fixes #142453
2025-08-20 14:14:35 +00:00
Anutosh Bhat
ea634fef56
[clang-repl] Fix InstantiateTemplate & Value test while building against emscripten (#154513)
Building with assertions flag (-sAssertions=2) gives me these 

```
[ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_3.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_3.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueWithAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
[ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_3.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_3.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '__clang_Interpreter_SetValueNoAlloc'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
[ RUN ] InterpreterTest.InstantiateTemplate Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_23.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_23.wasm RuntimeError: Aborted(Assertion failed: undefined symbol '_ZnwmPv26__clang_Interpreter_NewTag'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment)
[ RUN ] InterpreterTest.Value Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_36.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_36.wasm
[ RUN ] InterpreterTest.Value Aborted(Assertion failed: undefined symbol '_Z9getGlobalv'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Error in loading dynamic library incr_module_36.wasm: RuntimeError: Aborted(Assertion failed: undefined symbol '_Z9setGlobali'. perhaps a side module was not linked in? if this global was expected to arrive from a system library, try to build the MAIN_MODULE with EMCC_FORCE_STDLIBS=1 in the environment) Could not load dynamic lib: incr_module_36.wasm
```

**So we have some symbols missing here that are needed by the side
modules being created here.**

First 2 are needed by both tests 
Last 3 are needed for these lines accordingly in the Value test.


dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L355)

dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L364)

dc23869f98/clang/unittests/Interpreter/InterpreterTest.cpp (L365)

Everything should work as expected after this 
```
[----------] 9 tests from InterpreterTest
[ RUN      ] InterpreterTest.Sanity
[       OK ] InterpreterTest.Sanity (18 ms)
[ RUN      ] InterpreterTest.IncrementalInputTopLevelDecls
[       OK ] InterpreterTest.IncrementalInputTopLevelDecls (45 ms)
[ RUN      ] InterpreterTest.Errors
[       OK ] InterpreterTest.Errors (29 ms)
[ RUN      ] InterpreterTest.DeclsAndStatements
[       OK ] InterpreterTest.DeclsAndStatements (34 ms)
[ RUN      ] InterpreterTest.UndoCommand
/Users/anutosh491/work/llvm-project/clang/unittests/Interpreter/InterpreterTest.cpp:156: Skipped
Test fails for Emscipten builds

[  SKIPPED ] InterpreterTest.UndoCommand (0 ms)
[ RUN      ] InterpreterTest.FindMangledNameSymbol
[       OK ] InterpreterTest.FindMangledNameSymbol (85 ms)
[ RUN      ] InterpreterTest.InstantiateTemplate
[       OK ] InterpreterTest.InstantiateTemplate (127 ms)
[ RUN      ] InterpreterTest.Value
[       OK ] InterpreterTest.Value (608 ms)
[ RUN      ] InterpreterTest.TranslationUnit_CanonicalDecl
[       OK ] InterpreterTest.TranslationUnit_CanonicalDecl (64 ms)
[----------] 9 tests from InterpreterTest (1014 ms total)
```

This is similar to how we need to take care of some symbols while
building side modules during running cppinterop's test suite !
2025-08-20 14:14:19 +00:00
Mehdi Amini
6cedf6e604
[MLIR] Add missing handling for LLVM_LIT_TOOLS_DIR in mlir lit config (NFC) (#154542)
This is helping some windows users, here is the doc:

**LLVM_LIT_TOOLS_DIR**:PATH
The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to
the empty string, in which case lit will look for tools needed for tests
(e.g. ``grep``, ``sort``, etc.) in your ``%PATH%``. If GnuWin32 is not
in your
``%PATH%``, then you can set this variable to the GnuWin32 directory so
that
  lit can find tools needed for tests in that directory.
2025-08-20 16:05:44 +02:00
David Sherwood
e172110d12
[LV] Don't calculate scalar costs for scalable VFs in setVectorizedCallDecision (#152713)
In setVectorizedCallDecision we attempt to calculate the scalar costs
for vectorisation calls, even for scalable VFs where we already know the
answer is Invalid. We can avoid doing unnecessary work by skipping this
completely for scalable vectors.
2025-08-20 15:00:31 +01:00
Matt Arsenault
694a488708
AMDGPU: Add pseudoinstruction for 64-bit agpr or vgpr constants (#154499)
64-bit version of 7425af4b7aaa31da10bd1bc7996d3bb212c79d88. We
still need to lower to 32-bit v_accagpr_write_b32s, so this has
a unique value restriction that requires both halves of the constant
to be 32-bit inline immediates. This only introduces the new
pseudo definitions, but doesn't try to use them yet.
2025-08-20 22:54:37 +09:00
Chaitanya Koparkar
f649605bcf
[clang] Enable constexpr handling for __builtin_elementwise_fma (#152919)
Fixes https://github.com/llvm/llvm-project/issues/152455.
2025-08-20 14:51:40 +01:00
Sirui Mu
318b0dda7c
[CIR] Add atomic load and store operations (#153814)
This patch adds support for atomic loads and stores. Specifically, it
adds support for the following intrinsic calls:

- `__atomic_load` and `__atomic_store`;
- `__c11_atomic_load` and `__c11_atomic_store`.
2025-08-20 21:49:04 +08:00
Ryotaro Kasuga
2330fd2f73
[LoopPeel] Add new option to peeling loops to convert PHI into IV (#121104)
LoopPeel currently considers PHI nodes that become loop invariants
through peeling. However, in some cases, peeling transforms PHI nodes
into induction variables (IVs), potentially enabling further
optimizations such as loop vectorization. For example:

```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
  a[i] = b[i] + b[im];
  im = i;
}
```

In this case, peeling one iteration converts `im` into an IV, allowing
it to be handled by the loop vectorizer.

This patch adds a new feature to peel loops when to convert PHIs into
IVs. At the moment this feature is disabled by default.

Enabling it allows to vectorize the above example. I have measured on
neoverse-v2 and observed a speedup of more than 60% (options: `-O3
-ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`).

This PR is taken over from #94900
Related #81851
2025-08-20 13:44:56 +00:00
Mehdi Amini
8b2028ced6
Update log_level for LLVM_DEBUG and associated macros (#154525)
During the review of #150855 we switched from 0 to 1 for the default log
level used, but this macro wasn't updated.
2025-08-20 13:31:13 +00:00
Nikita Popov
99119a5a81 [OpenMPIRBuilder] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Nikita Popov
822496db7f [SampleContextTracker] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Nikita Popov
7eb5031e2c [GlobalDCE] Add missing LLVM_ABI annotation 2025-08-20 15:19:08 +02:00
Nikita Popov
6a99ad2975 [Debug] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Harrison Hao
23a5a7bef3
[AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instruction (#145078)
SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write

* a single component (`X`), or
* two components (`XY`),

and fold them into the wider native variants:

```
X + X          -->  XY
X + X + X + X  -->  XYZW
XY + XY        -->  XYZW
X + X + X      -->  XYZ
XY + X         -->  XYZ
```

The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
2025-08-20 21:16:25 +08:00
Zhaoxuan Jiang
2738828c0e
[Reland] [CGData] Lazy loading support for stable function map (#154491)
This is an attempt to reland #151660 by including a missing STL header
found by a buildbot failure.

The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
2025-08-20 06:15:04 -07:00
Charles Zablit
c56bb124e3
[lldb] make lit use the same PYTHONHOME for building and running the API tests (#154396)
When testing LLDB, we want to make sure to use the same Python as the
one we used to build it.

We already did this in https://github.com/llvm/llvm-project/pull/143183
for the Unit and Shell tests. This patch does the same thing for the API
tests as well.
2025-08-20 14:10:50 +01:00
Benjamin Maxwell
478b4b012f
[AArch64][SME] Rework VG CFI information for streaming-mode changes (#152283)
This patch reworks how VG is handled around streaming mode changes.

Previously, for functions with streaming mode changes, we would:

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
  mode changes

Additionally, for locally streaming functions, we would:

- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
  around streaming mode changes

In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.

So the new scheme in this patch is to:

In functions with streaming mode changes (inc locally streaming)

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
  changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated 
- This will be in the function epilogue, where VG is always the same as
  the entry VG
- Explicitly reference the incoming VG expressions for SVE callee-saves
in functions with streaming mode changes
- Ensure the CFA is not described in terms of VG in functions with
  streaming mode changes

A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b

But the TLDR is that following this scheme, SME unwinding can be
implemented with minimal changes to existing unwinders. All unwinders
need to do is initialize VG to `CNTD` at the start of unwinding, then
everything else is handled by standard opcodes (which don't need changes
to handle VG).
2025-08-20 14:06:12 +01:00
Hank
c075fb8c37
[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267)
Fixes #150163 

MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes during deserialization.

The patch adds a `StringMap` cache that records attributes and fetches
them when encountered again.
2025-08-20 13:03:26 +00:00
Qihan Cai
5f0515debd
[RISCV] Support Remaining P Extension Instructions for RV32/64 (#150379)
This patch implements pages 15-17 from
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf

Documentation:
jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
2025-08-20 22:54:07 +10:00
Joseph Huber
5a929a4249
[Clang] Support using boolean vectors in ternary operators (#154145)
Summary:
It's extremely common to conditionally blend two vectors. Previously
this was done with mask registers, which is what the normal ternary code
generation does when used on a vector. However, since Clang 15 we have
supported boolean vector types in the compiler. These are useful in
general for checking the mask registers, but are currently limited
because they do not map to an LLVM-IR select instruction.

This patch simply relaxes these checks, which are technically forbidden
by
the OpenCL standard. However, general vector support should be able to
handle these. We already support this for Arm SVE types, so this should
be make more consistent with the clang vector type.
2025-08-20 07:49:26 -05:00
Michał Górny
29067ac6e1
[OpenMP][OMPD] Fix GDB plugin to work correctly when installed (#153956)
Fix the `sys.path` logic in the GDB plugin to insert the intended
self-path in the first position rather than appending it to the end. The
latter implied that if `sys.path` (naturally) contained the GDB's
`gdb-plugin` directory, `import ompd` would return the top-level
`ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as
intended by adding the `ompd/` directory to `sys.path`.

This is intended to be a minimal change necessary to fix the issue.
Alternatively, the code could be modified to import `ompd.ompd` and stop
modifying `sys.path` entirely. However, I do not know why this option
was chosen in the first place, so I can't tell if this won't break
something.

Fixes #153954

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-08-20 14:36:50 +02:00
Ross Brunton
c8986d1ecb
[Offload] Guard olMemAlloc/Free with a mutex (#153786)
Both these functions update an `AllocInfoMap` structure in the context,
however they did not use any locks, causing random failures in threaded
code. Now they use a mutex.
2025-08-20 13:23:57 +01:00
Simeon David Schaub
4c295216e4
[SPIR-V] fix return type for OpAtomicCompareExchange (#154297)
fixes #152863

Tests were written with some help from Copilot

---------

Co-authored-by: Victor Lomuller <victor@codeplay.com>
2025-08-20 13:19:45 +01:00
David Green
c856e8def4 [ARM] Update cmps.ll, control-flow.ll and divrem.ll to use -cost-kind=all. NFC 2025-08-20 12:59:32 +01:00
Matt Arsenault
c876d53378
DAG: Avoid creating illegal extract_subvector in legalizer (#154100)
Fixes #153808
2025-08-20 20:55:05 +09:00
Jordan Rupprecht
876fdc9e29
[bazel] Port #154452: WasmSSA dialect importer (#154516) 2025-08-20 06:51:44 -05:00