SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each access
* a single component (`X`), or
* two components (`XY`),
and fold them into the wider native variants:
```
X + X --> XY
X + X + X + X --> XYZW
XY + XY --> XYZW
X + X + X --> XYZ
XY + X --> XYZ
```
The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
This is an attempt to reland #151660, adding a missing STL header that
was flagged by a buildbot failure.
The stable function map can be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation, especially for
non-LTO and distributed-ThinLTO setups. This patch introduces opt-in
lazy-loading support for the stable function map. The detailed changes
are:
- `StableFunctionMap`
  - The map now stores entries in an `EntryStorage` struct, which
    includes offsets for serialized entries and a `std::once_flag` for
    thread-safe lazy loading.
  - The underlying map type is changed from `DenseMap` to
    `std::unordered_map` for compatibility with `std::once_flag`.
  - `contains()`, `size()`, and `at()` are implemented to load only the
    requested entries, on demand.
- Lazy Loading Mechanism
  - When reading indexed codegen data, if the newly introduced
    `-indexed-codegen-data-lazy-loading` flag is set, the stable function
    map is not fully deserialized up front. The binary format for the
    stable function map now includes offsets and sizes to support lazy
    loading.
  - Lazy loading is made thread-safe by a per-function-hash once flag.
    This guarantees that even in a multi-threaded environment, the
    deserialization for a given function hash happens exactly once: the
    first thread to request it performs the load, and subsequent threads
    wait for it to complete before using the data. For single-threaded
    builds, the overhead is negligible (a single check on the once flag).
    In multi-threaded scenarios, users can omit the command-line flag to
    retain the previous eager-loading behavior (see the sketch below).
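A minimal sketch of the per-entry lazy-loading pattern described above;
the field names, `LazyMap`, and `loadEntry` are illustrative, not the
actual LLVM implementation:
```
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

struct EntryStorage {
  uint64_t Offset = 0;          // Start of the serialized entry.
  uint64_t Size = 0;            // Serialized size of the entry.
  std::vector<uint8_t> Decoded; // Filled in on first access.
  std::once_flag Loaded;        // Guarantees exactly-once deserialization.
};

class LazyMap {
  // std::unordered_map is node-based, so elements never move and the
  // non-movable std::once_flag member is usable (unlike with DenseMap).
  std::unordered_map<uint64_t, EntryStorage> Entries;

  void loadEntry(EntryStorage &E) {
    // Placeholder: read E.Size bytes at E.Offset from the mapped buffer.
  }

public:
  bool contains(uint64_t Hash) const { return Entries.count(Hash) != 0; }
  size_t size() const { return Entries.size(); }

  const std::vector<uint8_t> &at(uint64_t Hash) {
    EntryStorage &E = Entries.at(Hash);
    // The first thread deserializes; concurrent callers block until done.
    std::call_once(E.Loaded, [&] { loadEntry(E); });
    return E.Decoded;
  }
};
```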
When testing LLDB, we want to make sure to use the same Python as the
one we used to build it.
We already did this in https://github.com/llvm/llvm-project/pull/143183
for the Unit and Shell tests. This patch does the same thing for the API
tests as well.
This patch reworks how VG is handled around streaming mode changes.
Previously, for functions with streaming mode changes, we would:
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
mode changes
Additionally, for locally streaming functions, we would:
- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
around streaming mode changes
In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.
So the new scheme in this patch is, for functions with streaming mode
changes (including locally streaming functions), to:
- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated
  - This will be in the function epilogue, where VG is always the same
    as the entry VG
- Explicitly reference the incoming VG in expressions for SVE
callee-saves
- Ensure the CFA is not described in terms of VG
A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b
But the TL;DR is that, following this scheme, SME unwinding can be
implemented with minimal changes to existing unwinders. All an unwinder
needs to do is initialize VG to `CNTD` at the start of unwinding;
everything else is handled by standard opcodes (which don't need changes
to handle VG).
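For illustration, a minimal sketch of that unwinder-side seeding,
assuming a hypothetical `UnwindContext` register-state type (VG is DWARF
register 46 on AArch64; the inline `cntd` needs SVE assembly support):
```
#include <cstdint>
#include <unordered_map>

constexpr int AARCH64_DWARF_VG = 46; // DWARF register number for VG.

// Illustrative stand-in for an unwinder's register state.
struct UnwindContext {
  std::unordered_map<int, uint64_t> Regs;
  void setRegister(int Reg, uint64_t Val) { Regs[Reg] = Val; }
};

static uint64_t readCNTD() {
#if defined(__aarch64__)
  uint64_t VG;
  asm("cntd %0" : "=r"(VG)); // Number of 64-bit elements in an SVE vector.
  return VG;
#else
  return 2; // Placeholder when not targeting AArch64.
#endif
}

void beginUnwind(UnwindContext &Ctx) {
  // Seed VG once at the start of unwinding; from here on, standard CFI
  // opcodes (including the .cfi_offset/.cfi_restore pairs above) suffice.
  Ctx.setRegister(AARCH64_DWARF_VG, readCNTD());
}
```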
Fixes #150163
MLIR bytecode does not preserve alias definitions, so each attribute
encountered during deserialization is treated as a new one. This can
generate duplicate `DISubprogram` nodes.
The patch adds a `StringMap` cache that records attributes and fetches
them when they are encountered again, as sketched below.
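A minimal sketch of the caching idea, assuming attributes can be keyed
by a stable string form; the type and function names here are
illustrative, not the actual patch (which uses `llvm::StringMap` and
MLIR's attribute types):
```
#include <functional>
#include <string>
#include <unordered_map>

// Deduplicate values by a string key: materialize on first sight, reuse
// afterwards, so repeated occurrences map to a single node.
template <typename AttrT> class AttrDedupCache {
  std::unordered_map<std::string, AttrT> Cache;

public:
  AttrT getOrCreate(const std::string &Key,
                    const std::function<AttrT()> &Create) {
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Seen before: reuse, avoiding a duplicate.
    AttrT A = Create();
    Cache.emplace(Key, A);
    return A;
  }
};
```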
This patch implements pages 15-17 from
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
Documentation:
jhauser.us/RISCV/ext-P/RVP-baseInstrs-014.pdf
jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf
Summary:
It's extremely common to conditionally blend two vectors. Previously
this was done with mask registers, which is what the normal ternary code
generation does when used on a vector. However, since Clang 15 we have
supported boolean vector types in the compiler. These are useful in
general for checking the mask registers, but are currently limited
because they do not map to an LLVM-IR select instruction.
This patch simply relaxes these checks. Such operations are technically
forbidden by the OpenCL standard, but general vector support should be
able to handle them. We already support this for Arm SVE types, so this
makes the behaviour more consistent with the Clang vector types.
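For illustration, the kind of code this enables (a sketch using Clang's
`ext_vector_type` extension; exact type spellings in user code may
vary):
```
typedef float float4 __attribute__((ext_vector_type(4)));
typedef bool bool4 __attribute__((ext_vector_type(4)));

// With this patch, the ternary on a boolean vector lowers to an LLVM-IR
// select, blending per lane: result[i] = mask[i] ? a[i] : b[i].
float4 blend(bool4 mask, float4 a, float4 b) {
  return mask ? a : b;
}
```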
Fix the `sys.path` logic in the GDB plugin to insert the intended
self-path in the first position rather than appending it to the end. The
latter meant that if `sys.path` (naturally) contained GDB's
`gdb-plugin` directory, `import ompd` would return the top-level
`ompd/__init__.py` module rather than the `ompd/ompd.py` submodule, as
intended by adding the `ompd/` directory to `sys.path`.
This is intended to be a minimal change necessary to fix the issue.
Alternatively, the code could be modified to import `ompd.ompd` and stop
modifying `sys.path` entirely. However, I do not know why this approach
was chosen in the first place, so I can't tell whether changing it would
break something.
Fixes #153954
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Both of these functions update an `AllocInfoMap` structure in the
context; however, they did not take any locks, causing random failures
in threaded code. Now they use a mutex, as sketched below.
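A minimal sketch of the fix, with `AllocInfo` and the surrounding
context type as illustrative stand-ins:
```
#include <cstddef>
#include <map>
#include <mutex>

struct AllocInfo {
  size_t Size;
};

class Context {
  std::map<void *, AllocInfo> AllocInfoMap;
  std::mutex AllocInfoMapMutex; // Serializes all map updates.

public:
  void recordAlloc(void *Ptr, size_t Size) {
    std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
    AllocInfoMap[Ptr] = {Size}; // Safe under concurrent callers.
  }

  void recordFree(void *Ptr) {
    std::lock_guard<std::mutex> Lock(AllocInfoMapMutex);
    AllocInfoMap.erase(Ptr);
  }
};
```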
These builtins are modeled on the clzg/ctzg builtins, which accept an
optional second argument. This second argument is returned if the first
argument is 0. These builtins unconditionally exhibit zero-is-undef
behaviour, regardless of target preference for the other ctz/clz
builtins. The builtins have constexpr support.
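For reference, the two-argument form of the existing builtins these are
modeled on works like this (a usage sketch of `__builtin_clzg`, not of
the new builtins themselves):
```
int leading_zeros(unsigned x) {
  // One-argument form: undefined for x == 0.
  // Two-argument form: returns the fallback (here 32) for x == 0.
  return __builtin_clzg(x, 32);
}
```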
Fixes #154113
Unlike CVTTP2SI, CVTTP2UI is only available on AVX512 targets, so we
can't fall back to the AVX1 variant when splitting a 512-bit vector, and
we can only use the 128/256-bit variants if we have AVX512VL.
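For reference, a sketch of the intrinsic-level picture (the unsigned
conversions require AVX512, with AVX512VL for the narrower widths):
```
#include <immintrin.h>

// Signed float->int truncation has existed since SSE2/AVX...
__m256i s256(__m256 v) { return _mm256_cvttps_epi32(v); } // AVX
// ...but the unsigned variants are AVX512-only.
__m512i u512(__m512 v) { return _mm512_cvttps_epu32(v); } // AVX512F
__m256i u256(__m256 v) { return _mm256_cvttps_epu32(v); } // AVX512VL
__m128i u128(__m128 v) { return _mm_cvttps_epu32(v); }    // AVX512VL
```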
Fixes #154492
After a485e0e, we may not set the vector trip count in
preparePlanForEpilogueVectorLoop if it is zero. We should not choose a
VF * UF that makes the main vector loop dead (i.e. vector trip count is
zero), but there are some cases where this can happen currently.
In those cases, set EPI.VectorTripCount to zero.
This patch adds the constant attribute to cir.global, adds the
appropriate lowering to an LLVM constant, and updates the tests.
---------
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
When moving fcti results from float registers to normal registers
through memory, even though MPI was adjusted to account for endianness,
FIPtr was always adjusted for big-endian, which caused loads of the
wrong half of a value in little-endian mode.
`std::wstring AnsiToUtf16(const std::string &ansi)` is a
reimplementation of `llvm::sys::windows::UTF8ToUTF16`. This patch
removes `AnsiToUtf16` and its usages entirely.
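For reference, a usage sketch of the existing helper that `AnsiToUtf16`
duplicated (Windows-only; error handling simplified):
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/Windows/WindowsSupport.h"
#include <string>

std::wstring toWide(const std::string &S) {
  llvm::SmallVector<wchar_t, 128> Wide;
  // UTF8ToUTF16 returns a std::error_code; truthy means failure.
  if (llvm::sys::windows::UTF8ToUTF16(S, Wide))
    return std::wstring();
  return std::wstring(Wide.begin(), Wide.end());
}
```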
Both these attributes were introduced in ab1dc2d54db5 ("Thread Safety
Analysis: add support for before/after annotations on mutexes") back in
2015 as "beta" features.
Anecdotally, we've been using `-Wthread-safety-beta` for years without
problems.
Furthermore, this feature requires the user to explicitly use these
attributes in the first place.
After 10 years, let's graduate the feature to the stable feature set,
and reserve `-Wthread-safety-beta` for new upcoming features.
The AArch64 build attribute specification now allows switching to an
already-defined subsection using its name alone, without repeating the
optionality and type parameters.
This patch updates the parser to support that behavior.
Spec reference: https://github.com/ARM-software/abi-aa/pull/230/files
If FunctionAttrs infers additional attributes on a function, it also
invalidates analyses on callers of that function. The way it does this
right now limits it to calls with a matching signature. However, the
function attributes will also be used when the signatures do not match.
Use getCalledOperand() to avoid the signature check.
This is not a correctness fix, just improves analysis quality. I noticed
this due to
https://github.com/llvm/llvm-project/pull/144497#issuecomment-3199330709,
where LICM ends up with a stale MemoryDef that could be a MemoryUse
(which is a bug in LICM, but still non-optimal).
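A sketch of the distinction the `getCalledOperand()` approach relies on
(illustrative, not the exact patch):
```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// getCalledFunction() returns null when the call-site signature doesn't
// match the callee, so invalidation used to be skipped for such calls.
// Looking through the called operand instead also catches those cases.
Function *getCalleeIgnoringSignature(const CallBase &CB) {
  return dyn_cast<Function>(CB.getCalledOperand()->stripPointerCasts());
}
```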
Whether an argument is fixed is now available via ArgFlags, so make use
of it. The previous implementation was quite problematic because it
stored the number of fixed arguments in a global variable, which is not
thread-safe.
The immediate evaluation context needs the lambda scope info (LSI) to
propagate some flags; however, that LSI was removed in
ActOnFinishFunctionBody, which happened before rebuilding a lambda
expression.
The last attempt destroyed the LSI at the end of the block scope, but we
still need it after that, in DiagnoseShadowingLambdaDecls.
This also converts the wrapper function to default arguments as a
drive-by fix, and does some cleanup.
Fixes https://github.com/llvm/llvm-project/issues/145776
Add trailing newlines to the following files to comply with POSIX
standards:
- llvm/lib/Target/RISCV/RISCVInstrInfoXSpacemiT.td
- llvm/test/MC/RISCV/xsmtvdot-invalid.s
- llvm/test/MC/RISCV/xsmtvdot-valid.s
Closes #151706
This is a cherry-pick of #154053 with a fix for bad handling of
endianness when loading float and double literals from the binary.
---------
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
In streaming mode, the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in
instCombineIntrinsic to @llvm.vscale(). This patch lowers the SME
intrinsics similarly when in streaming mode.
This patch handles structs of fixed vectors and structs of arrays of
fixed vectors correctly for the VLS calling convention in
EmitFunctionProlog, EmitFunctionEpilog, and EmitCall.
Stacked on: https://github.com/llvm/llvm-project/pull/147173
I was looking over this pass and noticed it was using shared pointers
for CompositeNodes. However, all nodes are owned by the deinterleaving
graph and are not released until the graph is destroyed. This means a
bump allocator and raw pointers can be used, which have a simpler
ownership model and less overhead than shared pointers.
The changes in this PR are to:
- Add a `SpecificBumpPtrAllocator<CompositeNode>` to the
  `ComplexDeinterleavingGraph` (see the sketch below)
  - This allocates new nodes and deallocates them when the graph is
    destroyed
- Replace `NodePtr` and `RawNodePtr` with `CompositeNode *`
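A minimal sketch of the resulting ownership model (node fields elided):
```
#include "llvm/Support/Allocator.h"

struct CompositeNode {
  // ... operands, deinterleaving role, etc. ...
};

class ComplexDeinterleavingGraph {
  // Owns every node; destroying the graph frees (and destructs) them all.
  llvm::SpecificBumpPtrAllocator<CompositeNode> Allocator;

public:
  CompositeNode *createNode() {
    // Placement-new into bump-allocated storage; no per-node refcounting.
    return new (Allocator.Allocate()) CompositeNode();
  }
};
```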
This updates the fp<->int tests to include some store(fptoi) and
itofp(load) test cases. It also cuts down on the number of large vector
cases that are not testing anything new.