549390 Commits

Author SHA1 Message Date
Nikita Popov
6a99ad2975 [Debug] Add missing LLVM_ABI annotations 2025-08-20 15:19:08 +02:00
Harrison Hao
23a5a7bef3
[AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instruction (#145078)
SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write

* a single component (`X`), or
* two components (`XY`),

and fold them into the wider native variants:

```
X + X          -->  XY
X + X + X + X  -->  XYZW
XY + XY        -->  XYZW
X + X + X      -->  XYZ
XY + X         -->  XYZ
```

The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
2025-08-20 21:16:25 +08:00
Zhaoxuan Jiang
2738828c0e
[Reland] [CGData] Lazy loading support for stable function map (#154491)
This is an attempt to reland #151660 by including a missing STL header
found by a buildbot failure.

The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
2025-08-20 06:15:04 -07:00
Charles Zablit
c56bb124e3
[lldb] make lit use the same PYTHONHOME for building and running the API tests (#154396)
When testing LLDB, we want to make sure to use the same Python as the
one we used to build it.

We already did this in https://github.com/llvm/llvm-project/pull/143183
for the Unit and Shell tests. This patch does the same thing for the API
tests as well.
2025-08-20 14:10:50 +01:00
Benjamin Maxwell
478b4b012f
[AArch64][SME] Rework VG CFI information for streaming-mode changes (#152283)
This patch reworks how VG is handled around streaming mode changes.

Previously, for functions with streaming mode changes, we would:

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
  mode changes

Additionally, for locally streaming functions, we would:

- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
  around streaming mode changes

In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.

So the new scheme in this patch is to:

In functions with streaming mode changes (inc locally streaming)

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
  changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated 
- This will be in the function epilogue, where VG is always the same as
  the entry VG
- Explicitly reference the incoming VG expressions for SVE callee-saves
in functions with streaming mode changes
- Ensure the CFA is not described in terms of VG in functions with
  streaming mode changes

A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b

But the TLDR is that following this scheme, SME unwinding can be
implemented with minimal changes to existing unwinders. All unwinders
need to do is initialize VG to `CNTD` at the start of unwinding, then
everything else is handled by standard opcodes (which don't need changes
to handle VG).
2025-08-20 14:06:12 +01:00
Hank
c075fb8c37
[MLIR] Fix duplicated attribute nodes in MLIR bytecode deserialization (#151267)
Fixes #150163 

MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes during deserialization.

The patch adds a `StringMap` cache that records attributes and fetches
them when encountered again.
2025-08-20 13:03:26 +00:00
Qihan Cai
5f0515debd
[RISCV] Support Remaining P Extension Instructions for RV32/64 (#150379)
This patch implements pages 15-17 from
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf

Documentation:
jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
2025-08-20 22:54:07 +10:00
Joseph Huber
5a929a4249
[Clang] Support using boolean vectors in ternary operators (#154145)
Summary:
It's extremely common to conditionally blend two vectors. Previously
this was done with mask registers, which is what the normal ternary code
generation does when used on a vector. However, since Clang 15 we have
supported boolean vector types in the compiler. These are useful in
general for checking the mask registers, but are currently limited
because they do not map to an LLVM-IR select instruction.

This patch simply relaxes these checks, which are technically forbidden
by
the OpenCL standard. However, general vector support should be able to
handle these. We already support this for Arm SVE types, so this should
be make more consistent with the clang vector type.
2025-08-20 07:49:26 -05:00
Michał Górny
29067ac6e1
[OpenMP][OMPD] Fix GDB plugin to work correctly when installed (#153956)
Fix the `sys.path` logic in the GDB plugin to insert the intended
self-path in the first position rather than appending it to the end. The
latter implied that if `sys.path` (naturally) contained the GDB's
`gdb-plugin` directory, `import ompd` would return the top-level
`ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as
intended by adding the `ompd/` directory to `sys.path`.

This is intended to be a minimal change necessary to fix the issue.
Alternatively, the code could be modified to import `ompd.ompd` and stop
modifying `sys.path` entirely. However, I do not know why this option
was chosen in the first place, so I can't tell if this won't break
something.

Fixes #153954

Signed-off-by: Michał Górny <mgorny@gentoo.org>
2025-08-20 14:36:50 +02:00
Ross Brunton
c8986d1ecb
[Offload] Guard olMemAlloc/Free with a mutex (#153786)
Both these functions update an `AllocInfoMap` structure in the context,
however they did not use any locks, causing random failures in threaded
code. Now they use a mutex.
2025-08-20 13:23:57 +01:00
Simeon David Schaub
4c295216e4
[SPIR-V] fix return type for OpAtomicCompareExchange (#154297)
fixes #152863

Tests were written with some help from Copilot

---------

Co-authored-by: Victor Lomuller <victor@codeplay.com>
2025-08-20 13:19:45 +01:00
David Green
c856e8def4 [ARM] Update cmps.ll, control-flow.ll and divrem.ll to use -cost-kind=all. NFC 2025-08-20 12:59:32 +01:00
Matt Arsenault
c876d53378
DAG: Avoid creating illegal extract_subvector in legalizer (#154100)
Fixes #153808
2025-08-20 20:55:05 +09:00
Jordan Rupprecht
876fdc9e29
[bazel] Port #154452: WasmSSA dialect importer (#154516) 2025-08-20 06:51:44 -05:00
Sergei Barannikov
19ac1ff56e
[TableGen][DecoderEmitter] Factor populateFixedLenEncoding (NFC) (#154511)
Also drop the debug code under `#if 0` and a seemingly outdated comment.
2025-08-20 11:34:59 +00:00
Morris Hafner
b01f05977c
[CIR] Add support for string literal lvalues in ConstantLValueEmitter (#154514) 2025-08-20 13:30:21 +02:00
Fraser Cormack
8b128388b5
[clang] Introduce elementwise ctlz/cttz builtins (#131995)
These builtins are modeled on the clzg/ctzg builtins, which accept an
optional second argument. This second argument is returned if the first
argument is 0. These builtins unconditionally exhibit zero-is-undef
behaviour, regardless of target preference for the other ctz/clz
builtins. The builtins have constexpr support.

Fixes #154113
2025-08-20 12:18:28 +01:00
Simon Pilgrim
d770567a51
[X86] SimplifyDemandedVectorEltsForTargetNode - don't split X86ISD::CVTTP2UI nodes without AVX512VL (#154504)
Unlike CVTTP2SI, CVTTP2UI is only available on AVX512 targets, so we
don't fallback to the AVX1 variant when we split a 512-bit vector, so we
can only use the 128/256-bit variants if we have AVX512VL.

Fixes #154492
2025-08-20 12:18:10 +01:00
Florian Hahn
dc23869f98
[LV] Handle vector trip count being zero in preparePlanForEpiVectorLoop.
After a485e0e, we may not set the vector trip count in
preparePlanForEpilogueVectorLoop if it is zero. We should not choose a
VF * UF that makes the main vector loop dead (i.e. vector trip count is
zero), but there are some cases where this can happen currently.

In those cases, set EPI.VectorTripCount to zero.
2025-08-20 11:54:22 +01:00
Morris Hafner
0989ff5de8
[CIR] Add constant attribute to GlobalOp (#154359)
This patch adds the constant attribute to cir.global, the appropriate
lowering to LLVM constant and updates the tests.

---------

Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
2025-08-20 12:53:00 +02:00
Morris Hafner
3b9664840b
[CIR] Implement__builtin_va_arg (#153834)
Part of https://github.com/llvm/llvm-project/issues/153286.
Depends on https://github.com/llvm/llvm-project/pull/153819.

This patch adds support for __builtin_va_arg by adding the cir.va.arg
operator. Unlike the incubator it doesn't depend on any target specific
lowering (yet) but maps to llvm.va_arg.
2025-08-20 12:52:11 +02:00
Charles Zablit
93189ec514
Revert "[lldb][windows] use Windows APIs to print to the console (#149493)" (#154423)
This reverts commit f55dc0824ebcf546b1d34a5102021c15101e4d3b in order to
fix the issue reported
[here](https://github.com/llvm/llvm-project/pull/149493#issuecomment-3201146559).
2025-08-20 11:45:34 +01:00
Morris Hafner
088555cf6b
[CIR] Add support for base classes in type conversion safety check (#154385)
This patch enables the record layout computation of types that are
derived more than once.
2025-08-20 12:44:26 +02:00
DanilaZhebryakov
0a3ee7de9c
[PowerPC] fix bug affecting float to int32 conversion on LE PowerPC (#150194)
When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of wrong
half of a value in little-endian mode.
2025-08-20 12:37:14 +02:00
Nikita Popov
fea7e6934a
[PowerPC] Remove custom original type tracking (NFCI) (#154090)
The OrigTy is passed to CC lowering nowadays, so use it directly instead
of custom pre-analysis.
2025-08-20 12:36:23 +02:00
Charles Zablit
af5f16ba91
[lldb][windows] remove duplicate implementation of UTF8ToUTF16 (#154424)
`std::wstring AnsiToUtf16(const std::string &ansi)` is a
reimplementation of `llvm::sys::windows::UTF8ToUTF16`. This patch
removes `AnsiToUtf16` and its usages entirely.
2025-08-20 11:30:51 +01:00
Marco Elver
e2a077fed9
Thread Safety Analysis: Graduate ACQUIRED_BEFORE() and ACQUIRED_AFTER() from beta features (#152853)
Both these attributes were introduced in ab1dc2d54db5 ("Thread Safety
Analysis: add support for before/after annotations on mutexes") back in
2015 as "beta" features.

Anecdotally, we've been using `-Wthread-safety-beta` for years without
problems.

Furthermore, this feature requires the user to explicitly use these
attributes in the first place.

After 10 years, let's graduate the feature to the stable feature set,
and reserve `-Wthread-safety-beta` for new upcoming features.
2025-08-20 12:17:52 +02:00
Paul Walker
d6a688fb3d
[LLVM][CodeGen][SME] hasB16b16() is not sufficient to prove BFADD availability. (#154143)
The FEAT_SVE_B16B16 arithmetic instructions are only available to
streaming mode functions when SME2 is available.
2025-08-20 11:12:43 +01:00
SivanShani-Arm
460e9a8837
[LLVM][AArch64] Build attributes: Support switching to a defined subsection by name only (#154159)
The AArch64 build attribute specification now allows switching to an
already-defined subsection using its name alone, without repeating the
optionality and type parameters.

This patch updates the parser to support that behavior.

Spec reference: https://github.com/ARM-software/abi-aa/pull/230/files
2025-08-20 10:50:48 +01:00
Nikita Popov
5ae749b77d
[FunctionAttr] Invalidate callers with mismatching signature (#154289)
If FunctionAttrs infers additional attributes on a function, it also
invalidates analysis on callers of that function. The way it does this
right now limits this to calls with matching signature. However, the
function attributes will also be used when the signatures do not match.
Use getCalledOperand() to avoid a signature check.

This is not a correctness fix, just improves analysis quality. I noticed
this due to
https://github.com/llvm/llvm-project/pull/144497#issuecomment-3199330709,
where LICM ends up with a stale MemoryDef that could be a MemoryUse
(which is a bug in LICM, but still non-optimal).
2025-08-20 11:38:31 +02:00
Nikita Popov
5dbf73f54d
[Lanai] Use ArgFlags to distinguish fixed parameters (#154278)
Whether the argument is fixed is now available via ArgFlags, so make use
of it. The previous implementation was quite problematic, because it
stored the number of fixed arguments in a global variable, which is not
thread safe.
2025-08-20 11:38:02 +02:00
jyli0116
9df7ca1f0f
[GlobalISel] Legalize Saturated Truncate instructions and intrinsics (#154340)
Adds legalization support for `G_TRUNC_SSAT_S`, `G_TRUNC_SSAT_S`,
`G_TRUNC_USAT_U` instructions for GlobalISel.
2025-08-20 10:37:22 +01:00
Younan Zhang
5f9630b388
[Clang] Reapply "Only remove lambda scope after computing evaluation context" (#154458)
The immediate evaluation context needs the lambda scope info to
propagate some flags, however that LSI was removed in
ActOnFinishFunctionBody which happened before rebuilding a lambda
expression.

The last attempt destroyed LSI at the end of the block scope, after
which we still need it in DiagnoseShadowingLambdaDecls.

This also converts the wrapper function to default arguments as a
drive-by fix, as well as does some cleanup.

Fixes https://github.com/llvm/llvm-project/issues/145776
2025-08-20 17:30:33 +08:00
Link
46e77ebf71
[RISCV][NFC] Ensure files end with newline. (#154457)
Add trailing newlines to the following files to comply with POSIX
standards:
- llvm/lib/Target/RISCV/RISCVInstrInfoXSpacemiT.td
- llvm/test/MC/RISCV/xsmtvdot-invalid.s
- llvm/test/MC/RISCV/xsmtvdot-valid.s

Closes #151706
2025-08-20 17:18:16 +08:00
Orlando Cazalet-Hyams
6c9352530a
[RemoveDIs][NFC] Clean up BasicBlockUtils now intrinsics are gone (#154326)
A couple of minor readability changes now that we're not supporting both
intrinsics and records.
2025-08-20 10:03:44 +01:00
Jim Lin
174135863f [OMPIRBuilder] Use CreateNUWMul instead of passing flags to CreateMul. NFC 2025-08-20 17:00:17 +08:00
Jie Fu
80bc38bc92 [RISCV] Silent a warning (NFC)
/llvm-project/clang/lib/CodeGen/Targets/RISCV.cpp:865:9:
 error: unused variable 'FixedSrcTy' [-Werror,-Wunused-variable]
  auto *FixedSrcTy = cast<llvm::FixedVectorType>(SrcTy);
        ^
1 error generated.
2025-08-20 16:59:12 +08:00
Simon Pilgrim
035f40e8d5 [RISCV] incorrect-extract-subvector-combine.ll - remove quotes around -mattr argument. NFC.
The unnecessary quotes prevents DOS scripts from executing the RUN line.
2025-08-20 09:57:45 +01:00
Luc Forget
95fbc18a70
[MLIR][Wasm] Extending Wasm binary to WasmSSA dialect importer (#154452)
This is a cherry pick of #154053 with a fix for bad handling of
endianess when loading float and double litteral from the binary.

---------

Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
2025-08-20 10:55:55 +02:00
Kerry McLaughlin
c34cba0413
[AArch64][SME] Lower aarch64.sme.cnts* to vscale when in streaming mode (#154305)
In streaming mode, both the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to @llvm.sme.vscale(). This patch lowers the SME intrinsic similarly when
in streaming-mode.
2025-08-20 09:48:36 +01:00
Guillaume Chatelet
4dd9e99284
[libc] Fix constexpr add_with_carry/sub_with_borrow (#154282)
The previous version of the code would prevent the use of the compiler
builtins.
2025-08-20 10:41:51 +02:00
Brandon Wu
52a2e68fda
[clang][RISCV] Fix crash on VLS calling convention (#145489)
This patch handle struct of fixed vector and struct of array of fixed
vector correctly for VLS calling convention in EmitFunctionProlog,
EmitFunctionEpilog and EmitCall.

stack on: https://github.com/llvm/llvm-project/pull/147173
2025-08-20 16:39:02 +08:00
Timm Baeder
e16ced3ef4
[clang][bytecode] Diagnose one-past-end reads from global arrays (#154484)
Fixes #154312
2025-08-20 10:34:44 +02:00
Amr Hesham
018c5ba161
[CIR] Implement MemberExpr with VarDecl for ComplexType (#154307)
This change adds support for MemberExpr with VarDecl ComplexType

Issue: https://github.com/llvm/llvm-project/issues/141365
2025-08-20 10:33:08 +02:00
Nikolas Klauser
2dc0a5f3cd
[libc++][NFC] Use early returns in a few basic_string functions (#137299)
Using early returns tends to make the code easier to read, without any
changes to the generated code.
2025-08-20 10:16:13 +02:00
Matt Arsenault
0fa6fdfbb8
AMDGPU: Correct inst size for av_mov_b32_imm_pseudo (#154459)
In the AGPR case this will be an 8 byte instruction,
which is part of why this case is a pain to deal with in the
first place.
2025-08-20 17:01:48 +09:00
Benjamin Maxwell
eb2af3a5be
[ComplexDeinterleaving] Use BumpPtrAllocator for CompositeNodes (NFC) (#153217)
I was looking over this pass and noticed it was using shared pointers
for CompositeNodes. However, all nodes are owned by the deinterleaving
graph and are not released until the graph is destroyed. This means a
bump allocator and raw pointers can be used, which have a simpler
ownership model and less overhead than shared pointers.

The changes in this PR are to:
- Add a `SpecificBumpPtrAllocator<CompositeNode>` to the
`ComplexDeinterleavingGraph`
- This allocates new nodes and will deallocate them when the graph is
destroyed
  - Replace `NodePtr` and `RawNodePtr` with  `CompositeNode *`
2025-08-20 08:59:31 +01:00
David Green
e3cf967cdf [AArch64] Regenerate and update itofp.ll and fptoi.ll
This updates the fp<->int tests to include some store(fptoi) and itofp(load)
test cases. It also cuts down on the number of large vector cases which are not
testing anything new.
2025-08-20 08:43:10 +01:00
Lang Hames
70e41a2ab1
[orc-rt] Add orc_rt::span, a pre-c++20 std::span substitute (#154478)
This patch introduces an orc_rt::span class template that the ORC
runtime can use until we're able to move to c++20.
2025-08-20 17:28:02 +10:00
Timm Baeder
a9de444aa1
[clang][bytecode][NFC] Use an anonymous union in Pointer (#154405)
So we can save ourselves writing PointeeStorage all the time.
2025-08-20 09:23:19 +02:00