llvm-project

Author	SHA1	Message	Date
Jakub Kuderski	1633e0ba8b	[ADT] Add `from_range` constructor for (Small)DenseMap (#153515 ) This follows how we support range construction for (Small)DenseSet.	2025-08-14 08:53:52 -04:00
Ritanya-B-Bharadwaj	e3dcdb64ee	Claiming support for groupprivate and variable-category (#153553 )	2025-08-14 18:15:46 +05:30
Jaden Angella	bfda0e777d	[mlir][EmitC] Expand the MemRefToEmitC pass - Lowering `CopyOp` (#151206 ) This patch lowers `memref.copy` to `emitc.call_opaque "memcpy"`. From: ``` func.func @copying(%arg0 : memref<9x4x5x7xf32>, %arg1 : memref<9x4x5x7xf32>) { memref.copy %arg0, %arg1 : memref<9x4x5x7xf32> to memref<9x4x5x7xf32> return } ``` To: ```cpp #include <cstring> void copying(float v1[9][4][5][7], float v2[9][4][5][7]) { size_t v3 = 0; float* v4 = &v2[v3][v3][v3][v3]; float* v5 = &v1[v3][v3][v3][v3]; size_t v6 = sizeof(float); size_t v7 = 1260; size_t v8 = v6 * v7; memcpy(v5, v4, v8); return; } ```	2025-08-14 05:25:55 -07:00
lonely eagle	6d08a39eeb	[mlir][nvgpu] Add tma last dim bytes check (#153451 ) Add a check that the number of bytes in the last TMA dimension is a multiple of 16.	2025-08-14 20:14:20 +08:00
Igor Wodiany	87de48d11f	[mlir][spirv] Add spirv validation for module.mlir target test (#153227 ) Creating this patch as an example on using the new `mlir-translate` flag. Eventually all tests will be updated to validate SPIR-V modules.	2025-08-14 12:45:55 +01:00
Vincent	d3bbdc7bde	[clang] constexpr `__builtin_elementwise_abs` support (#152497 ) Added constant evaluation support for `__builtin_elementwise_abs` on integer, float and vector types. fixes #152276 --------- Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-14 12:34:23 +01:00
Lang Hames	3bc3b4cf5f	[ORC] Add cloneExternalModuleToContext API. cloneExternalModuleToContext can be used to clone an LLVM module onto a given ThreadSafeContext. Callers of this function are responsible for ensuring exclusive access to the source module and its LLVMContext.	2025-08-14 21:21:17 +10:00
mdenson	f5b36eb3a4	[clang] fix comment lexing of command names with underscore (#152943 ) Comment lexer fails to parse non-alphanumeric names. fixes #33296 --------- Co-authored-by: Brock Denson <brock.denson@virscient.com>	2025-08-14 13:03:55 +02:00
Theodoros Theodoridis	d15b7a83a7	[llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829 ) Limit the re-association of BOps with multiple users to FP and Vector arithmetic.	2025-08-14 11:56:55 +01:00
Corentin Jabot	186176de45	[Clang] Do not consider a variadic function ellipsis part of a default arg (#153496 ) When stashing the tokens of a parameter of a member function, we would munch an ellipsis, as the only considered terminal conditions were `,` and `)`. Fixes #153445	2025-08-14 12:51:58 +02:00
Andrzej Warzyński	8d4f3171fa	[mlir][linalg] Fix UnPackOp::getTiledOuterDims (#152960 ) Fixes `getTiledOuterDims` by making sure that the `outer_dims_perm` attribute from `linalg.unpack` is taken into account. Fixes #152037	2025-08-14 11:39:50 +01:00
Michael Kruse	38853a0146	[flang][OpenMP] MSVC buildbot fix PR #153488 caused the msvc build (https://lab.llvm.org/buildbot/#/builders/166/builds/1397) to fail: ``` ..\llvm-project\flang\include\flang/Evaluate/rewrite.h(78): error C2668: 'Fortran::evaluate::rewrite::Identity::operator ()': ambiguous call to overloaded function ..\llvm-project\flang\include\flang/Evaluate/rewrite.h(43): note: could be 'Fortran::evaluate::Expr<Fortran::evaluate::SomeType> Fortran::evaluate::rewrite::Identity::operator ()<Fortran::evaluate::SomeType,S>(Fortran::evaluate::Expr<Fortran::evaluate::SomeType> &&,const U &)' with [ S=Fortran::evaluate::value::Integer<128,true,32,unsigned int,unsigned __int64,128>, U=Fortran::evaluate::value::Integer<128,true,32,unsigned int,unsigned __int64,128> ] ..\llvm-project\flang\lib\Semantics\check-omp-atomic.cpp(174): note: or 'Fortran::evaluate::Expr<Fortran::evaluate::SomeType> Fortran::semantics::ReassocRewriter::operator ()<Fortran::evaluate::SomeType,S,void>(Fortran::evaluate::Expr<Fortran::evaluate::SomeType> &&,const U &,Fortran::semantics::ReassocRewriter::NonIntegralTag)' with [ S=Fortran::evaluate::value::Integer<128,true,32,unsigned int,unsigned __int64,128>, U=Fortran::evaluate::value::Integer<128,true,32,unsigned int,unsigned __int64,128> ] ..\llvm-project\flang\include\flang/Evaluate/rewrite.h(78): note: while trying to match the argument list '(Fortran::evaluate::Expr<Fortran::evaluate::SomeType>, const S)' with [ S=Fortran::evaluate::value::Integer<128,true,32,unsigned int,unsigned __int64,128> ] ..\llvm-project\flang\include\flang/Evaluate/rewrite.h(78): note: the template instantiation context (the oldest one first) is ..\llvm-project\flang\lib\Semantics\check-omp-atomic.cpp(814): note: see reference to function template instantiation 'U Fortran::evaluate::rewrite::Mutator<Fortran::semantics::ReassocRewriter>::operator ()<const 
Fortran::evaluate::Expr<Fortran::evaluate::SomeType>&,Fortran::evaluate::Expr<Fortran::evaluate::SomeType>>(T)' being compiled with [ U=Fortran::evaluate::Expr<Fortran::evaluate::SomeType>, T=const Fortran::evaluate::Expr<Fortran::evaluate::SomeType> & ] ``` The reason is that there is an ambiguity between operator() of ReassocRewriter itself and operator() of the base class `Identity` through `using Id::operator();`. By the C++ specification, method declarations in ReassocRewriter hide methods with the same signature from a using declaration, but this does not apply to ``` evaluate::Expr<T> operator()(..., NonIntegralTag = {}) ``` which has a different signature due to an additional tag parameter. Since it has a default value, it is ambiguous with operator() without tag parameter. GCC and Clang both accept this, but in my understanding MSVC is correct here. Since the overloads of ReassocRewriter cover all cases (integral and non-integral), removing the using declaration to avoid the ambiguity.	2025-08-14 12:30:59 +02:00
Florian Hahn	d92671cf7d	[PhaseOrdering] Add tests for optimizing std::find for AArch64.	2025-08-14 11:25:55 +01:00
Ege Beysel	8de85e753f	[mlir][linalg] Add support for scalable vectorization of `linalg.batch_mmt4d` (#152984 ) This PR builds upon the previous #146531 and enables scalable vectorization for `batch_mmt4d` as well. --------- Signed-off-by: Ege Beysel <beyselege@gmail.com>	2025-08-14 11:47:51 +02:00
Simon Pilgrim	c96d0da62b	[X86] lowerShuffleAsLanePermuteAndPermute - ensure we've simplified the demanded shuffle mask elts before testing for a matching shuffle (#153554 ) When lowering using sublane shuffles, we can sometimes end up with the same mask as we started with. We already bail in these occasions, but we weren't fully simplifying the new shuffle mask before testing if it matched. Fixes #153457	2025-08-14 10:47:11 +01:00
Matheus Izvekov	9255580a3a	[clang] fix skipped parsing of late parsed attributes (#153558 )	2025-08-14 06:42:55 -03:00
tangaac	9315d701eb	[LoongArch] Optimize inserting extracted element for v4i64/v8i32 (#152629 )	2025-08-14 17:06:50 +08:00
Björn Pettersson	5e7924a3cb	[SelectionDAG] Handle more opcodes in isGuaranteedNotToBeUndefOrPoison (#147019 ) Add special handling of EXTRACT_SUBVECTOR, INSERT_SUBVECTOR, EXTRACT_VECTOR_ELT, INSERT_VECTOR_ELT and SCALAR_TO_VECTOR in isGuaranteedNotToBeUndefOrPoison. Make use of DemandedElts to improve the analysis and only check relevant elements for each operand. Also start using DemandedElts in the recursive calls that check isGuaranteedNotToBeUndefOrPoison for all operands for operations that do not create undef/poison. We can do that for a number of elementwise operations for which the DemandedElts can be applied to every operand (e.g. ADD, OR, BITREVERSE, TRUNCATE).	2025-08-14 09:05:15 +00:00
Jan Patrick Lehr	cd8c3bdf14	[ARM] Fix after #153394 (#153561 ) This removes two double definitions.	2025-08-14 11:00:19 +02:00
TianYe	44e6bc6fc0	[Headers][X86] Allow AVX2/AVX512 broadcast intrinsics to be used in Constexpr (#153363 ) Fix [issue](https://github.com/llvm/llvm-project/issues/152499) This patch adds support for the following broadcast intrinsics by wrapping them around existing generic shuffle implementations: ``` _mm_broadcastb_epi8 _mm_broadcastw_epi16 _mm_broadcastd_epi32 _mm_broadcastq_epi64 _mm_broadcastss_ps _mm_broadcastsd_pd _mm256_broadcastb_epi8 _mm256_broadcastw_epi16 _mm256_broadcastd_epi32 _mm256_broadcastq_epi64 _mm256_broadcastss_ps _mm256_broadcastsd_pd _mm256_broadcastsi128_si256 _mm512_broadcastb_epi8 _mm512_broadcastw_epi16 _mm512_broadcastd_epi32 _mm512_broadcastq_epi64 _mm512_broadcastss_ps _mm512_broadcastsd_pd _mm512_broadcast_f32x2 _mm256_broadcast_f32x2 _mm512_broadcast_i32x2 _mm256_broadcast_i32x2 _mm_broadcast_i32x2 _mm512_broadcast_f32x4 _mm256_broadcast_f32x4 _mm512_broadcast_i32x4 _mm256_broadcast_i32x4 _mm512_broadcast_f32x8 _mm512_broadcast_i32x8 _mm512_broadcast_f64x2 _mm256_broadcast_f64x2 _mm512_broadcast_i64x2 _mm256_broadcast_i64x2 _mm512_broadcast_f64x4 _mm512_broadcast_i64x4 ``` Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-14 09:40:11 +01:00
mcbarton	b24b8a5bb4	Enable running ClangReplInterpreterTests in an Emscripten environment (#150977 ) @vgvassilev @anutosh491 This is what it took for me to enable running ClangReplInterpreterTests in an Emscripten environment. When I ran this patch for llvm 20 we could run InterpreterTest.InstantiateTemplate , but now it crashes gtest when running in node. Let me know what you think.	2025-08-14 14:07:13 +05:30
Matt Arsenault	ddb2dc50af	ARM: Move gnu half convert calling conv config into tablegen (#153394 )	2025-08-14 17:36:29 +09:00
Matt Arsenault	4aae7bc625	ARM: Move half convert libcall config to tablegen (#153389 )	2025-08-14 17:35:58 +09:00
Shoreshen	04aebbfbe2	[AMDGPU] Delete AMDGPU Unify Metadata pass (#153548 ) Fixes #153150	2025-08-14 16:16:32 +08:00
David Spickett	b0151cb91d	[compiler-rt][hwasan][test] Tweak check in release-shadow.c (#153181 ) Since we (Linaro) moved our bots to a new machine, this test has been failing: https://lab.llvm.org/buildbot/#/builders/121/builds/1566 Most of the time, the rss difference is greater than 512 on the first iteration then settles down to 512 for all the rest. ``` starting rss 512 shadow pages: 1024 p = 0xe083e0800000 1536 -> 740 diff 796 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 1252 -> 740 diff 512 p = 0xe083e0800000 passed 1 out of 10 release-shadow.c.tmp: /home/tcwg-buildbot/worker/clang-aarch64-lld-2stage/llvm/compiler-rt/test/hwasan/TestCases/Linux/release-shadow.c:81: int main(): Assertion `success_count > total_count * 0.8' failed. ``` Given that the test was looking for a diff of at least 513, I guess that 512 is ok too. For future reference, the original bot host was running this kernel: Linux 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:51:36 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux And the new host: Linux 6.8.0-64-generic #67-Ubuntu SMP PREEMPT_DYNAMIC Sun Jun 15 20:23:40 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux Though the new host also has more RAM, so the kernel may be less aggressive with memory management.	2025-08-14 09:13:53 +01:00
Nikita Popov	d1952baa5d	[CodeGen] Remove unnecessary setTypeListBeforeSoften() parameter (NFC) It does not make sense to set the softening type list without setting IsSoften=true.	2025-08-14 10:04:56 +02:00
Elvis Wang	01fac67e2a	[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342 ) This patch adds a cost kind to `getAddressComputationCost()` for #149955. Note that this patch also removes all the default values in `getAddressComputationCost()`.	2025-08-14 16:01:44 +08:00
Piotr Fusik	18782db4c9	[RISCV] Improve instruction selection for most significant bit extraction (#151687 ) (seteq (and X, 1<<XLEN-1), 0) -> (xori (srli X, XLEN-1), 1) (seteq (and X, 1<<31), 0) -> (xori (srliw X, 31), 1) // RV64 (setlt X, 0) -> (srli X, XLEN-1) // SRLI is compressible (setlt (sext X), 0) -> (srliw X, 31) // RV64	2025-08-14 09:59:43 +02:00
Nikolas Klauser	7b904b09eb	[libc++] Remove assertions from <string_view> that are unreachable (#148598 ) When assertions are enabled it is impossible to construct a `string_view` which contains a null pointer and a non-zero size, so assertions where we check for that on an already constructed `string_view` are unreachable.	2025-08-14 09:24:20 +02:00
Nikolas Klauser	5b258884db	[libc++] Document how __tree is laid out and how we iterate through it (#152453 )	2025-08-14 09:23:23 +02:00
Pavel Skripkin	30144226a4	[llvm] [InstCombine] fold "icmp eq (X + (V - 1)) & -V, X" to "icmp eq (and X, V - 1), 0" (#152851 ) This fold optimizes ```llvm define i1 @src(i32 %num, i32 %val) { %mask = add i32 %val, -1 %neg = sub nsw i32 0, %val %num.biased = add i32 %num, %mask %_2.sroa.0.0 = and i32 %num.biased, %neg %_0 = icmp eq i32 %_2.sroa.0.0, %num ret i1 %_0 } ``` to ```llvm define i1 @tgt(i32 %num, i32 %val) { %mask = add i32 %val, -1 %tmp = and i32 %num, %mask %ret = icmp eq i32 %tmp, 0 ret i1 %ret } ``` For power-of-two `val`. Observed in real life for following code ```rust pub fn is_aligned(num: usize) -> bool { num.next_multiple_of(1 << 12) == num } ``` which verifies that num is aligned to 4096. Alive2 proof https://alive2.llvm.org/ce/z/QisECm	2025-08-14 10:23:03 +03:00
Carl Ritson	f92afe7171	[AMDGPU] Preserve post dominator tree through SILowerControlFlow (#153528 ) Change dominator tree updates to also handle post dominator tree.	2025-08-14 16:19:46 +09:00
XChy	f393f2a61e	[BranchFolding] Avoid moving blocks to fall through to an indirect target (#152916 ) Depends on #152591 to fix https://github.com/llvm/llvm-project/issues/149023. Similar to an EH pad, there is no real advantage in "falling through" to an indirect target of an INLINEASM_BR. And multiple indirect targets of inline asm at the end of a function may be rotated infinitely. Therefore, this patch avoids making an indirect target of inline asm the fall-through block.	2025-08-14 16:18:36 +09:00
David Green	4c28bbf5b8	[AArch64] Fix ‘>= 0’ is always true warning. NFC	2025-08-14 08:17:10 +01:00
Matt Arsenault	bbcac029db	ARM: Move more aeabi libcall config into tablegen (#152109 )	2025-08-14 15:43:15 +09:00
quic_hchandel	71b066e3a2	[RISCV] Add CodeGen support for qc.insbi and qc.insb insert instructions (#152447 ) This patch adds CodeGen support for the qc.insbi and qc.insb instructions defined in the Qualcomm uC Xqcibm extension. qc.insbi and qc.insb insert bits into the destination register from an immediate and a register operand, respectively. A sequence of `xor`, `and` & `xor` that meets the appropriate conditions is converted to `qc.insbi` or `qc.insb`, depending on the immediate's value.	2025-08-14 12:08:28 +05:30
Chuanqi Xu	ab5a5a90c0	[C++20] [Modules] Fix incorrect diagnostic for using befriend target Close https://github.com/llvm/llvm-project/issues/138558 The compiler failed to understand the redeclaration relationship when performing checks in MergeFunctionDecl. This seemed to be a complex circular problem (how can we know the redeclaration relationship before performing merging?). But the fix seems to be easy and safe. It is fine to perform the check only if the using decl is a local decl.	2025-08-14 14:23:14 +08:00
Stanislav Mekhanoshin	23b65edfbc	[AMDGPU] Add NV bit to CPol::ALL mask. NFCI. (#153487 )	2025-08-13 23:02:50 -07:00
Stanislav Mekhanoshin	1216152f30	[AMDGPU] Fix the comment for OperandType. NFC. (#153489 )	2025-08-13 23:02:28 -07:00
Stanislav Mekhanoshin	80d430df5d	[AMDGPU] Add MSG_SAVEWAVE_HAS_TDM on gfx1250 (#153483 )	2025-08-13 23:01:50 -07:00
Stanislav Mekhanoshin	fc911fe928	[AMDGPU] Add HW_REG_IB_STS2 on gfx1250 (#153479 )	2025-08-13 23:01:28 -07:00
Stanislav Mekhanoshin	cc0d227154	[AMDGPU] Disable s_setkill on gfx1250 (#153471 )	2025-08-13 23:01:04 -07:00
Stanislav Mekhanoshin	742bcee2a0	[AMDGPU] Drop duplicated field HasMatrixReuse. NFCI. (#153467 )	2025-08-13 23:00:30 -07:00
David Green	d9d9d9ad19	[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304 ) LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as store(interleave-shuffle). Whilst the shuffle would be expensive in general for MVE (it does not have zip/uzp instructions), it should be treated as cheap when part of the LD2/ST2 pattern. This borrows some code from the AArch64 backend to produce lower costs. (Some of which still shows as higher than it should - that just shows how broken the generic shuffle costs are at the moment, they would be lower if getShuffleCost was called directly as opposed to going through getInstructionCost).	2025-08-14 06:59:37 +01:00
Carlos Galvez	3b6d8798ba	[clang-tidy][doc] Improve documentation of the -line-filter flag (#153372 ) Fixes #25589 Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>	2025-08-14 07:55:20 +02:00
Terapines MLIR	c164e6309b	[flang][fir] Add conversion of `fir.iterate_while` to `scf.while`. (#152439 ) This commit is a supplement to https://github.com/llvm/llvm-project/pull/140374. RFC: https://discourse.llvm.org/t/rfc-add-fir-affine-optimization-fir-pass-pipeline/86190/6	2025-08-14 13:39:55 +08:00
Aleksei Babushkin	aa503f6572	[compiler-rt][libFuzzer] Add %run directives to focus-function.test (#153185 ) Contrary to most testcases in the libFuzzer test suite, `focus-function.test` seems to lack the `%run` directives, which is an inconvenience in cases when `%run` actually gets substituted for something. This PR adds said directives.	2025-08-14 08:36:25 +03:00
Craig Topper	ace08d5ccf	[RISCV] Add MC support for more P extension instructions. (#153458 ) These instructions are the shift by immediate and saturate by immediate instructions from the top half of page 9 of https://jhauser.us/RISCV/ext-P/RVP-instrEncodings-015.pdf I've also improved the CHECK lines in the invalid tests to check line and column number from the diagnostic. Co-authored-by: realqhc <caiqihan021@hotmail.com>	2025-08-13 22:07:03 -07:00
Oliver Hunt	d8850ee6c0	[clang][Obj-C][PAC] Add support for authenticating block metadata (#152978 ) Introduces the use of pointer authentication to protect the invocation, copy and dispose, reference, and descriptor pointers in Objective-C block objects. Resolves #141176	2025-08-13 22:01:24 -07:00
Craig Topper	9f96e3f80f	[SelectionDAG] Pass SDValue to InstrEmitter::EmitCopyFromReg. NFC (#153485 ) Instead of passing SDNode and ResNo separately. This allows us to use SDValue::operator== and avoid creating SDValue from the operands inside the function.	2025-08-13 21:46:48 -07:00
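
The InstCombine entry above (30144226a4, #152851) folds "icmp eq (X + (V - 1)) & -V, X" to "icmp eq (and X, V - 1), 0" for power-of-two V and links an Alive2 proof. As an informal cross-check only, the sketch below compares the two forms exhaustively at a small bit width using wrapping arithmetic. The function names `src`, `tgt` and `fold_holds` and the default 8-bit width are illustrative assumptions; none of this is code from the patch, and it does not model the `nsw` flag in the original IR.

```python
# Informal brute-force check of the fold described in commit 30144226a4:
#   icmp eq ((X + (V - 1)) & -V), X   ->   icmp eq (X & (V - 1)), 0
# Names and the default bit width are illustrative assumptions.

def src(x, v, bits=8):
    """Original form: round x up to a multiple of v, then compare with x."""
    mask = (1 << bits) - 1
    biased = (x + (v - 1)) & mask   # wrapping add, as for iN in LLVM IR
    neg = (-v) & mask               # two's-complement negation of v
    return (biased & neg) == x

def tgt(x, v, bits=8):
    """Folded form: test whether the low bits of x selected by v - 1 are zero."""
    mask = (1 << bits) - 1
    return (x & ((v - 1) & mask)) == 0

def fold_holds(bits=8):
    """Compare both forms for every x and every power-of-two v at this width."""
    return all(
        src(x, 1 << k, bits) == tgt(x, 1 << k, bits)
        for k in range(bits)
        for x in range(1 << bits)
    )

if __name__ == "__main__":
    print(fold_holds())          # exhaustive at 8 bits
    print(src(4096, 4096, 16))   # the is_aligned case from the commit message
```

The exhaustive loop is only practical at small widths; the Alive2 link in the commit message remains the authoritative proof for the general case.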

548550 Commits