llvm-project

Author	SHA1	Message	Date
Aiden Grossman	5b2c3aac90	[MCA][X86] Pretend To Have a Stack Engine (#153348 ) This patch removes RSP dependencies from push and pop instructions to pretend that we have a stack engine. This does not model details like sync uops that are relevant implementation details due to complexity. This is just enabled on all X86 CPUs given LLVM does not have a scheduling model for any X86 CPU that does not have a stack engine. This fixes #152008.	2025-08-18 13:44:43 +00:00
Shilei Tian	e37eff5dcd	[AMDGPU] Add an option to completely disable kernel argument preload (#153975 ) The existing `amdgpu-kernarg-preload-count` can't be used as a switch to turn it off if it is set to 0. This PR adds an extra option to turn it off. Fixes SWDEV-550147.	2025-08-18 09:44:20 -04:00
Jonathan Thackray	f38c83c582	[AArch64][llvm] Disassemble instructions in `SYS` alias encoding space more correctly (#153905 ) For instructions in the `SYS` alias encoding space which take no register operands, and where the unused 5 register bits are not all set (0x1f, 0b11111), then disassemble to a `SYS` alias and not the instruction, since it is not considered valid. This is because it is specified in the Arm ARM in text similar to this (e.g. page C5-1037 of DDI0487L.b for `TLBI ALLE1`, or page C5-1585 for `GCSPOPX`): ``` Rt should be encoded as 0b11111. If the Rt field is not set to 0b11111, it is CONSTRAINED UNPREDICTABLE whether: * The instruction is UNDEFINED. * The instruction behaves as if the Rt field is set to 0b11111. ``` Since we want to follow "should" directives, and not encourage undefined behaviour, only assemble or disassemble instructions considered valid. Add an extra test-case for this, and all existing test-cases are continuing to pass.	2025-08-18 14:41:41 +01:00
Timm Baeder	31d2db2a68	[clang][bytecode][NFC] Use UnsignedOrNone for Block::DeclID (#154104 )	2025-08-18 15:40:44 +02:00
Erich Keane	340fa3e1bb	[OpenACC] Implement firstprivate lowering except init. (#153847 ) This patch implements the basic lowering infrastructure, but does not quite implement the copy initialization, which requires #153622. It does however pass verification for the 'copy' section, which just contains a yield.	2025-08-18 06:33:40 -07:00
Aiden Grossman	1650e4a73c	[X86] Remove TuningPOPCNTFalseDeps from AlderLake (#154004 ) This false dependency issue was fixed in CannonLake looking at the data from uops.info. This is confirmed not to be an issue based on benchmarking data in #153983. Setting this can potentially lead to extra xor instructions which could consume extra frontend/renaming resources. None of the other CPUs that have had this fixed have the tuning flag. Fixes #153983.	2025-08-18 06:31:16 -07:00
Matthias Springer	f84aaa6eaa	[mlir][Transforms] Dialect conversion: Add flag to dump materialization kind (#119532 ) Add a debugging flag to the dialect conversion to dump the materialization kind. This flag is useful to find out whether a missing materialization rule is for source or target materializations. Also add missing test coverage for the `buildMaterializations` flag.	2025-08-18 13:25:18 +00:00
Nikita Popov	ba45ac61b6	[CAS] Temporarily disable broken test This test hangs forever if executed with less than three cores available, see: https://github.com/llvm/llvm-project/pull/114096#issuecomment-3196698403	2025-08-18 15:09:08 +02:00
Chaitanya	4a3bf27c69	[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464 ) This PR introduces two new ops in omp dialect, omp.target_allocmem and omp.target_freemem. omp.target_allocmem: Allocates heap memory on device. Will be lowered to omp_target_alloc call in llvm. omp.target_freemem: Deallocates heap memory on device. Will be lowered to omp_target_free call in llvm. Example: %1 = omp.target_allocmem %device : i32, i64 omp.target_freemem %device, %1 : i32, i64 The work in this PR is C-P/inspired from @ivanradanov commit from coexecute implementation: [Add fir omp target alloc and free ops](`be860ac8ba`) [Lower omp_target_{alloc,free} to llvm](`6e2d584dc9`)	2025-08-18 18:15:11 +05:30
jofrn	e8e3e6e893	[LiveVariables] Mark use without implicit if defined at instr (#119446 ) LiveVariables will mark instructions with their implicit subregister uses. However, it will also mark the subregister as an implicit if its own definition is a subregister of it, i.e. `$r3 = OP val, implicit-def $r0_r1_r2_r3, ..., implicit $r2_r3`, even if it is otherwise unused, which defines $r3 on the same line it is used. This change ensures such uses are marked without implicit, i.e. `$r3 = OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-18 08:34:59 -04:00
Akash Banerjee	6aafe6582d	Fix test added in 1fd1d634630754cc9b9c4b5526961d5856f64ff9	2025-08-18 13:29:23 +01:00
ZhaoQi	8f671a675f	[LoongArch] Always emit symbol-based relocations regardless of relaxation (#153943 ) This commit changes all relocations to be relocated with symbols. Without this commit, errors may occur in some cases, such as when using `llc/lto+relax`, or combining relaxed and norelaxed object files using `ld -r`. Some tests updated.	2025-08-18 20:15:49 +08:00
Jonathan Cohen	c6fe567064	[AArch64][MachineCombiner] Combine sequences of gather patterns (#152979 ) Reland of #142941 Squashed with fixes for #150004, #149585 This pattern matches gather-like patterns where values are loaded per lane into neon registers, and replaces it with loads into 2 separate registers, which will be combined with a zip instruction. This decreases the critical path length and improves Memory Level Parallelism. rdar://151851094	2025-08-18 15:10:59 +03:00
Utkarsh Saxena	673750feea	[LifetimeSafety] Implement a basic use-after-free diagnostic (#149731 ) Implement use-after-free detection in the lifetime safety analysis with two warning levels. - Added a `LifetimeSafetyReporter` interface for reporting lifetime safety issues - Created two warning levels: - Definite errors (reported with `-Wexperimental-lifetime-safety-permissive`) - Potential errors (reported with `-Wexperimental-lifetime-safety-strict`) - Implemented a `LifetimeChecker` class that analyzes loan propagation and expired loans to detect use-after-free issues. - Added tracking of use sites through a new `UseFact` class. - Enhanced the `ExpireFact` to track the expressions where objects are destroyed. - Added test cases for both definite and potential use-after-free scenarios. The implementation now tracks pointer uses and can determine when a pointer is dereferenced after its loan has been expired, with appropriate diagnostics. The two warning levels provide flexibility - definite errors for high-confidence issues and potential errors for cases that depend on control flow.	2025-08-18 13:46:43 +02:00
Kareem Ergawy	c1e2a9c66d	[flang][OpenMP] Only privatize pre-determined symbols when defined by the evaluation. (#154070 ) Fixes a regression uncovered by Fujitsu test 0686_0024.f90. In particular, verifies that a pre-determined symbol is only privatized by its defining evaluation (e.g. the loop for which the symbol was marked as pre-determined).	2025-08-18 13:36:08 +02:00
Mehdi Amini	cfe5975eaf	[MLIR] Fix SCF verifier crash (#153974 ) An operand of the nested yield op can be null and hasn't been verified yet when processing the enclosing operation. Using `getResultTypes()` will dereference this null Value and crash in the verifier.	2025-08-18 12:48:55 +02:00
Simon Pilgrim	681ecae913	[DAG] visitTRUNCATE - test abd legality early to avoid unnecessary computeKnownBits/ComputeNumSignBits calls. NFC. (#154085 ) isOperationLegal is much cheaper than value tracking	2025-08-18 11:06:29 +01:00
林克	6842cc5562	[RISCV] Add SpacemiT XSMTVDot (SpacemiT Vector Dot Product) extension. (#151706 ) The full spec can be found at spacemit-x60 processor support scope: Section 2.1.2.2 (Features): https://developer.spacemit.com/documentation?token=BWbGwbx7liGW21kq9lucSA6Vnpb#2.1 This patch only supports assembler.	2025-08-18 18:03:17 +08:00
Arne Stenkrona	ea2f5395b1	[SimplifyCFG] Avoid threading for loop headers (#151142 ) Updates SimplifyCFG to avoid jump threading through loop headers if -keep-loops is requested. Canonical loop form requires a loop header that dominates all blocks in the loop. If we thread through a header, we risk breaking its domination of the loop. This change avoids this issue by conservatively avoiding threading through headers entirely. Fixes: https://github.com/llvm/llvm-project/issues/151144	2025-08-18 09:46:55 +00:00
Simon Pilgrim	169b43d4b8	Remove unused variable introduced in #152705	2025-08-18 10:45:09 +01:00
ZhaoQi	76fb1619f0	[LoongArch] Reduce number of reserved relocations when relax enabled (#153769 )	2025-08-18 17:42:43 +08:00
Andrzej Warzyński	51b5a3e1a6	[MLIR] Add Egress dialects maintainers (#151721 ) As per https://discourse.llvm.org/t/mlir-project-maintainers/87189, this PR adds maintainers for the "egress" dialects. Compared to the original proposal, two changes are included: * The "mesh" dialect has been renamed to "shard" (https://discourse.llvm.org/t/mlir-mesh-cleanup-mesh/). * The "XeVM" dialect has been added (https://discourse.llvm.org/t/rfc-proposal-for-new-xevm-dialect/).	2025-08-18 10:34:44 +01:00
Simon Pilgrim	36f911173a	[X86] avx512vlbw-builtins.c - add C/C++ test coverage	2025-08-18 10:30:15 +01:00
Simon Pilgrim	6036e5d0d7	[X86] avx512vlbw-reduceIntrin.c - add C/C++ and -fno-signed-char test coverage	2025-08-18 10:30:14 +01:00
Timm Baeder	0d05c42b6a	[clang][bytecode] Improve __builtin_{,dynamic_}object_size implementation (#153601 )	2025-08-18 11:12:33 +02:00
Oliver Hunt	bcab8ac126	[clang] return type not correctly deduced for discarded lambdas (#153921 ) The early return for lambda expressions with deduced return types in Sema::ActOnCapScopeReturnStmt meant that we were not actually performing the required return type deduction for such lambdas when in a discarded context. This PR removes that early return, allowing the existing return type deduction steps to be performed. Fixes #153884 Fix developed by, and Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>	2025-08-18 02:07:27 -07:00
Mehdi Amini	16aa283344	[MLIR] Refactor the walkAndApplyPatterns driver to remove the recursion (#154037 ) This is in preparation of a follow-up change to stop traversing unreachable blocks. This is not NFC because of a subtlety of the early_inc. On a test case like: ``` scf.if %cond { "test.move_after_parent_op"() ({ "test.any_attr_of_i32_str"() {attr = 0 : i32} : () -> () }) : () -> () } ``` We recursively traverse the nested regions, and process an op when the region is done (post-order). We need to pre-increment the iterator before processing an operation in case it gets deleted. However we can do this before or after processing the nested region. This implementation does the latter.	2025-08-18 09:07:19 +00:00
Balázs Kéri	a0f325bd41	[clang-tidy] Added check 'misc-override-with-different-visibility' (#140086 )	2025-08-18 11:00:42 +02:00
Mehdi Amini	87e6fd161a	[MLIR] Erase unreachable blocks before applying patterns in the greedy rewriter (#153957 ) Operations like: %add = arith.addi %add, %add : i64 are legal in unreachable code. Unfortunately many patterns would be unsafe to apply on such IR and can lead to crashes or infinite loops. To avoid this we can remove unreachable blocks before attempting to apply patterns. We may have to do this also whenever the CFG is changed by a pattern, it is left up for future work right now. Fixes #153732	2025-08-18 10:59:43 +02:00
David Sherwood	7ee6cf06c8	[LV] Fix incorrect cost kind in VPReplicateRecipe::computeCost (#153216 ) We were incorrectly using the TTI::TCK_RecipThroughput cost kind and ignoring the kind set in the context.	2025-08-18 09:52:31 +01:00
hstk30-hw	c99cbc880f	[llvm] Fix typo for CGProfile (NFC) (#153370 )	2025-08-18 16:46:27 +08:00
Joachim	98dd1888bf	[OpenMP][Test][NFC] output tool data as hex to improve readability (#152757 ) Using hex format makes IDs easier to interpret: the first digits represent the thread number, and the last digits represent the ID within a thread. The main change is in callback.h: PRIu64 -> PRIx64. The patch also guards RUN/CHECK lines in openmp/runtime/tests/ompt with clang-format on/off comments and clang-formats the directory. --------- Co-authored-by: Kaloyan Ignatov <kaloyan.ignatov@rwth-aachen.de>	2025-08-18 10:42:33 +02:00
Simon Pilgrim	ce5276f61c	[Clang][X86] Add avx512 __builtin_ia32_select* constexpr handling (#152705 ) This should allow us to constexpr many avx512 predicated intrinsics where they wrap basic intrinsics that are already constexpr Fixes #152321	2025-08-18 09:37:20 +01:00
Matthias Springer	ff68f7115c	[mlir][builtin] Make `unrealized_conversion_cast` inlineable (#139722 ) Until now, `builtin.unrealized_conversion_cast` ops could not be inlined by the Inliner pass.	2025-08-18 10:23:26 +02:00
Matt Arsenault	53e9d3247e	DAG: Remove unnecessary getPointerTy call (#154055 ) getValueType already did this	2025-08-18 17:12:16 +09:00
David Green	8f98529209	[AArch64] Remove SIMDLongThreeVectorTiedBHSabal tablegen class. Similar to #152987 this removes SIMDLongThreeVectorTiedBHSabal as it is equivalent to SIMDLongThreeVectorTiedBHS with a better TriOpFrag pattern.	2025-08-18 09:11:13 +01:00
ZhaoQi	8181c76bca	[LoongArch][NFC] More tests to ensure branch relocs reserved when relax enabled (#153768 )	2025-08-18 16:07:36 +08:00
Ahmad Yasin	1b0bce972b	Reorder checks to speed up getAppleRuntimeUnrollPreferences() (#154010 ) - Delay load/store values calculation unless a best unroll-count is found - Remove extra getLoopLatch() invocation	2025-08-18 11:06:37 +03:00
Matthias Springer	f7b09ad700	[mlir][LLVM] `ArithToLLVM`: Add 1:N support for `arith.select` lowering (#153944 ) Add 1:N support for the `arith.select` lowering. Only cases where the entire true/false value is selected are supported.	2025-08-18 09:42:37 +02:00
Jim Lin	127ba533bd	[RISCV] Remove ST->hasVInstructions() from getIntrinsicInstrCost for cttz/ctlz/ctpop. NFC. (#154064 ) That isn't necessary if we've checked ST->hasStdExtZvbb().	2025-08-18 15:24:25 +08:00
Nikita Popov	246a64a12e	[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163 ) This option is confusingly named. What it actually controls is whether, under the default of `-ffloat16-excess-precision=standard`, it is beneficial for performance to perform calculations on float (without intermediate rounding) or not. For `-ffloat16-excess-precision=none` the LLVM `half` type will always be used, and all backends are expected to legalize it correctly.	2025-08-18 09:23:48 +02:00
Nikita Popov	238c3dcd0d	[CodeGen][Mips] Remove fp128 libcall list (#153798 ) Mips requires fp128 args/returns to be passed differently than i128. It handles this by inspecting the pre-legalization type. However, for soft float libcalls, the original type is currently not provided (it will look like a i128 call). To work around that, MIPS maintains a list of libcalls working on fp128. This patch removes that list by providing the original, pre-softening type to calling convention lowering. This is done by carrying additional information in CallLoweringInfo, as we unfortunately do need both types (we want the un-softened type for OrigTy, but we need the softened type for the actual register assignment etc.) This is in preparation for completely removing all the custom pre-analysis code in the Mips backend and replacing it with use of OrigTy.	2025-08-18 09:22:41 +02:00
David Green	790bee99de	[VectorCombine] Remove dead node immediately in VectorCombine (#149047 ) The vector combiner will process all instructions as it first loops through the function, adding any newly added and deleted instructions to a worklist which is then processed when all nodes are done. This leaves extra uses in the graph as the initial processing is performed, leading to sub-optimal decisions being made for other combines. This changes it so that trivially dead instructions are removed immediately. The main change that this requires is to make sure iterator invalidation does not occur.	2025-08-18 07:55:21 +01:00
ZhaoQi	6957e44d8e	[LoongArch][MC] Refine conditions for emitting ALIGN relocations (#153365 ) According to the suggestions in https://github.com/llvm/llvm-project/pull/150816, this commit refines the conditions for emitting R_LARCH_ALIGN relocations. Some existing tests are updated to avoid being affected by this optimization. New tests are added to verify: removal of redundant ALIGN relocations, ALIGN emitted after the first linker-relaxable instruction, and conservatively emitted ALIGN in lower-numbered subsections.	2025-08-18 14:54:27 +08:00
Kazu Hirata	b6a62a496f	[ADT] Use range-based for loops in SetVector (NFC) (#154058 )	2025-08-17 23:46:43 -07:00
Kazu Hirata	cbf5af9668	[llvm] Remove unused includes (NFC) (#154051 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-08-17 23:46:35 -07:00
Kazu Hirata	400dde6ca8	[RISCV] Remove an unnecessary cast (NFC) (#154049 ) &UncompressedMI is already of MCInst *.	2025-08-17 23:46:27 -07:00
Kazu Hirata	1f3c38f125	[Support] Remove an unnecessary cast (NFC) (#154048 ) qp is already of uint64_t.	2025-08-17 23:46:20 -07:00
Guray Ozen	5d300afa80	[MLIR][NVVM] Add support for multiple return values in `inline_ptx` (#153774 ) This PR adds the ability for `nvvm.inline_ptx` to return multiple values, matching the expected semantics in PTX while respecting LLVM’s constraints. LLVM’s `inline_asm` op does not natively support multiple returns — instead, it requires packing results into an LLVM `struct` and then extracting them. This PR implements automatic packing/unpacking so that multiple return values can be expressed naturally in MLIR without extra user boilerplate. Example MLIR: ``` %r1, %r2 = nvvm.inline_ptx "{ .reg .pred p; setp.ge.s32 p, $2, $3; selp.s32 $0, $2, $3, p; selp.s32 $1, $2, $3, !p; }" (%a, %b) : i32, i32 -> i32, i32 %r3 = llvm.add %r1, %r2 : i32 ``` Lowered LLVM IR: ``` %1 = llvm.inline_asm has_side_effects asm_dialect = att "{\0A\09 .reg .pred p;\0A\09 setp.ge.s32 p, $2, $3;\0A\09 selp.s32 $0, $2, $3, p;\0A\09 selp.s32 $1, $2, $3, !p;\0A\09}\0A", "=r,=r,r,r" %a, %b : (i32, i32) -> !llvm.struct<(i32, i32)> %2 = llvm.extractvalue %1[0] : !llvm.struct<(i32, i32)> %3 = llvm.extractvalue %1[1] : !llvm.struct<(i32, i32)> %4 = llvm.add %2, %3 : i32 ```	2025-08-18 08:37:55 +02:00
yronglin	e6e874ce8f	[clang] Allow trivial pp-directives before C++ module directive (#153641 ) Consider the following code: ```cpp # 1 __FILE__ 1 3 export module a; ``` According to the wording in [P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html): ``` A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.) ``` and the wording in [[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file) ``` module-file: pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt] ``` `#` is the first pp-token in the translation unit, and it was rejected by clang, but such directives really should be exempted from this rule. The goal is to not allow any preprocessor conditionals or most state changes, but these don't fit that. State change would mean most semantically observable preprocessor state, particularly anything that is order dependent. Global flags like being a system header/module shouldn't matter. We should exempt a bunch of directives, even though it violates the current standard wording. In this patch, we introduce a `TrivialDirectiveTracer` to trace the state changes described above and propose to exempt the following kinds of directives: `#line`, GNU line marker, `#ident`, `#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC error`, `#pragma gcc diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma warning`, `#pragma execution_character_set`, `#pragma clang assume_nonnull` and builtin macro expansion. Fixes https://github.com/llvm/llvm-project/issues/145274 --------- Signed-off-by: yronglin <yronglin777@gmail.com>	2025-08-18 14:17:35 +08:00

548981 Commits