llvm-project

Author	SHA1	Message	Date
mitchell	150d9b7a77	[clang-tidy][NFC] Add clang-tidy formatting commit to `.git-blame-ignore-revs` (#167126 ) Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>	2025-11-20 21:42:10 +08:00
Andrzej Warzyński	cfda27d0fb	[mlir][Vector] Add support for scalable vectors to `ScanToArithOps` (#123117 ) Note, scalable reductions dims are left as a TODO.	2025-11-20 13:39:52 +00:00
Alexander Johnston	76f1949cfa	[HLSL] Implement the `fwidth` intrinsic for DXIL and SPIR-V target (#161378 ) Adds the fwidth intrinsic for HLSL. The DXIL path only requires modification to the hlsl headers. The SPIRV path implements the OpFwidth builtin in Clang and instruction selection for the OpFwidth instruction in LLVM. Also adds shader stage tests to the ddx_coarse and ddy_coarse instructions used by fwidth. Closes #99120 --------- Co-authored-by: Alexander Johnston <alexander.johnston@amd.com>	2025-11-20 07:38:32 -05:00
Paul Walker	21c4c1502e	[LLVM][CodeGen][SVE] Only use unpredicated bfloat instructions when all lanes are in use. (#168387 ) While SVE support for exception safe floating point code generation is bare bones we try to ensure inactive lanes remain inert. I mistakenly broke this rule when adding support for SVE-B16B16 by lowering some bfloat operations of unpacked vectors to unpredicated instructions.	2025-11-20 12:01:04 +00:00
Mehdi Amini	3da82af83f	[MLIR] Apply clang-tidy fixes for bugprone-argument-comment in SparseBufferRewriting.cpp (NFC)	2025-11-20 03:35:18 -08:00
Mehdi Amini	9e86c0d5da	[MLIR] Apply clang-tidy fixes for readability-container-size-empty in LinalgOps.cpp (NFC)	2025-11-20 03:35:18 -08:00
Mehdi Amini	c6a79a55ff	[MLIR] Apply clang-tidy fixes for readability-identifier-naming in LLVMToLLVMIRTranslation.cpp (NFC)	2025-11-20 03:35:18 -08:00
sskzakaria	a2b4c0fbe0	[X86][Clang] VectorExprEvaluator::VisitCallExpr / InterpretBuiltin - allow AVX512 mask predicate intrinsics to be used in constexpr (#165054 ) Enables constexpr evaluation for the following AVX512 Intrinsics: ``` _mm_movepi8_mask _mm256_movepi8_mask _mm512_movepi8_mask _mm_movepi16_mask _mm256_movepi16_mask _mm512_movepi16_mask _mm_movepi32_mask _mm256_movepi32_mask _mm512_movepi32_mask _mm_movepi64_mask _mm256_movepi64_mask _mm512_movepi64_mask ``` Part of #162072	2025-11-20 11:25:23 +00:00
Benjamin Maxwell	02db2de905	[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp (#168714 ) This allows DemandedBits to see that the SVE CNTP intrinsic will only ever produce small positive integers. The maximum value you could get here is 256, which is CNTP on a nxv16i1 on a machine with a 2048bit vector size (the maximum for SVE). Using this various redundant operations (zexts, sexts, ands, ors, etc) can be eliminated.	2025-11-20 11:23:05 +00:00
Zichen Lu	0a88e96228	[MLIR][LLVM] Extend DIScopeForLLVMFuncOp to handle cross-file operatio… (#167844 ) The current `DIScopeForLLVMFuncOp` pass handles debug information for inlined code by processing `CallSiteLoc` attributes. However, some compilation scenarios compose code from multiple source files directly into a single function without generating `CallSiteLoc`. Scenario: ```python # a.py def kernel_a(tensor): print("a: {}", tensor) # a.py:3 jit_func_b(tensor) # Calls b.py code # b.py def func_b(tensor): print("b: {}", tensor) # b.py:7 ``` The scenario executes Python at compile-time and directly inserts operations from `b.py` into the kernel function, resulting in MLIR like: ```mlir @kernel_a(...) { print("a: {}", %arg0) loc(#loc_a) // a.py:3 print("b: {}", %arg0) loc(#loc_b) // b.py:7 <- FileLineColLoc, not CallSiteLoc } loc(#loc_kernel) // a.py:1 #loc1 = loc("a.py":3:.) #loc2 = loc("b.py":7:.) #loc_a = loc("print"(#loc1)) #loc_b = loc("print"(#loc2)) ``` ```llvm !6 = !DIFile(filename: "a.py", directory: "...") !9 = distinct !DISubprogram(name: "...", linkageName: "...", scope: !6, file: !6, line: 13, ...) !10 = !DILocation(line: 7, column: ., scope: !9) // Points to kernel's DISubprogram, not correct ```	2025-11-20 12:14:14 +01:00
Simon Pilgrim	53dfdf7ffd	[X86] BuiltinsX86.td - attempt to pack the builtins for each SSE level close together. NFC. (#168844 ) Avoid some repeated feature blocks - we should have a single place in each file that we can find most builtins for a particular ISA level. Also, avoid some of the 80col wrapping that just makes it harder to find anything at all. There's a lot more we can do - but I don't want to completely refactor this while we still have so much work to do for #30794	2025-11-20 10:34:51 +00:00
Matthias Springer	95d788c761	Revert "[mlir][Pass] Fix crash when applying a pass to an optional interface" (#168847 ) Reverts llvm/llvm-project#168499	2025-11-20 18:31:51 +08:00
Sam Tebbs	3396b4654b	[LV] Allow partial reductions with an extended bin op (#165536 ) A pattern of the form reduce.add(ext(mul)) is valid for a partial reduction as long as the mul and its operands fulfill the requirements of a normal partial reduction. The mul's extend operands will be optimised to the wider extend, and we already have oneUse checks in place to make sure the mul and operands can be modified safely. 1. -> https://github.com/llvm/llvm-project/pull/165536 2. https://github.com/llvm/llvm-project/pull/165543	2025-11-20 10:22:11 +00:00
Jeremy Morse	2cf550a040	[DebugInfo] Force early line-zero calls to have meaningful locations (#156850 ) In functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to place prologue_end and the function scope-line on the line zero call: this isn't false (it's the first meaningful instruction of the function) and is approximately true. Given a less than ideal function, this is an OK solution.	2025-11-20 10:20:47 +00:00
Aaditya	74cebce264	Revert "[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161… (#168845 ) …815)" This reverts commit dcab4cb49bfb0aa17df3d3fabe582696100e0d35.	2025-11-20 15:44:57 +05:30
Matthias Springer	54f69caf1f	[mlir][Pass] Fix crash when applying a pass to an optional interface (#168499 ) Interfaces can be optional: whether an op implements an interface or not can depend on the state of the operation. ``` // An optional code block for adding additional "classof" logic. This can // be used to better enable "optional" interfaces, where an entity only // implements the interface if some dynamic characteristic holds. // `$_attr`/`$_op`/`$_type` may be used to refer to an instance of the // interface instance being checked. code extraClassOf = ""; ``` The current `Pass::canScheduleOn(RegisteredOperationName)` is insufficient. This commit adds an additional overload to inspect `Operation *`. This commit fixes a crash when scheduling an `InterfacePass` for an optional interface on an operation that does not actually implement the interface.	2025-11-20 17:51:44 +08:00
Aleksandr Nogikh	131cf7d5b2	[AllocToken] Enable alloc token instrumentation for size-returning functions (#168840 ) Consider a newly added "malloc_span" attribute in the allocation token instrumentation to ensure that allocation functions with the "malloc_span" attribute are processed similarly to other memory allocation functions. Update the tests to demonstrate applicability to __size_returning_new.	2025-11-20 10:33:24 +01:00
Kiran Kumar T P	dc343d2f05	[NFC][flang] Replace use of flang -fc1 with %flang_fc1 in few test case (#168830 ) Replace use of flang -fc1 with %flang_fc1 in few test case	2025-11-20 15:00:15 +05:30
Jim Lin	bdf598f8dd	CodeGen: Add missing subtarget to TargetLoweringBase constructor for ARC, CSKY and M68K (#168811 ) Those were missing in https://github.com/llvm/llvm-project/pull/168620.	2025-11-20 17:11:28 +08:00
Simon Pilgrim	07a31adf28	[X86] EltsFromConsecutiveLoads - recognise reverse load patterns. (#168706 ) See if we can create a vector load from the src elements in reverse and then shuffle these back into place. SLP will (usually) catch this in the middle-end, but there are a few BUILD_VECTOR scalarizations etc. that appear during DAG legalization. I did start looking at a more general permute fold, but I haven't found any good test examples for this yet - happy to take another look if somebody has examples.	2025-11-20 09:08:39 +00:00
Sam Parker	e44646b795	[WebAssembly] Lower ANY_EXTEND_VECTOR_INREG (#167529 ) Treat it in the same manner as zero_extend_vector_inreg and generate an extend_low_u if possible. This is to try to prevent expensive shuffles from being generated instead. computeKnownBitsForTargetNode has also been updated to specify known zeros on extend_low_u.	2025-11-20 08:57:08 +00:00
Aaditya	dcab4cb49b	[AMDGPU] Add wave reduce intrinsics for float types - 2 (#161815 ) Supported Ops: `fadd`, `fsub`	2025-11-20 14:21:54 +05:30
Aaditya	dbf4525351	[AMDGPU] Add wave reduce intrinsics for float types - 1 (#161814 ) Supported Ops: `fmin`, `fmax`	2025-11-20 13:23:02 +05:30
Brandon Wu	3e5fafdc22	[RISCV][llvm] Select splat_vector(constant) with PLI (#168204 ) By default, the DAG combiner combines a BUILD_VECTOR with identical elements into a SPLAT_VECTOR, so we can map a constant splat to PLI where possible.	2025-11-20 15:02:40 +08:00
zhangtianhao6	fde2aadb80	[CodeGen] update code generation optimization level(nfc) (#168190 )	2025-11-20 14:54:42 +08:00
Craig Topper	8608344778	[CFIInserter] Turn a reachable llvm_unreachable into a report_fatal_error. (#168777 ) This prevents it from being optimized out in non-asserts builds. Update X86 test to remove REQUIRES: asserts and check for LLVM ERROR. Add FileCheck to RISC-V test and remove UNSUPPORTED. This is the more complete fix for #168772 and #168525.	2025-11-19 22:32:26 -08:00
Matt Arsenault	db20a7f2bc	DAG: Fix constructing a temporary TargetTransformInfo instance (#168480 )	2025-11-20 01:19:23 -05:00
Jinjie Huang	7f0dbf049a	[NFC] Reduce the size of test input in incompatible_dwarf_version.test (#168825 ) Use smaller test inputs in incompatible_dwarf_version.test to reduce disk usage and execution time.	2025-11-20 13:56:10 +08:00
lonely eagle	765208b313	[mlir] Make remove-dead-values remove block and successorOperands before delete ops (#166766 ) Reland https://github.com/llvm/llvm-project/pull/165725, fixing the failed test by removing successor operands before deleting operations. Following the deletion of cond.branch, its successor operands will subsequently be removed.	2025-11-20 13:55:09 +08:00
Volodymyr Sapsai	b39a9db3ab	[clang][deps] Add module map describing compiled module to file dependencies. (#160226 ) When we add the module map describing the compiled module to the command line, add it to the file dependencies as well. Discovered while working on reproducers where a command line input was missing in the captured files as it wasn't considered a dependency.	2025-11-19 20:17:43 -08:00
Nicolai Hähnle	13ed14f47e	AMDGPU: Autogenerate checks in a test (#168815 )	2025-11-20 03:51:32 +00:00
marius doerner	7198279707	[clang][bytecode] Implement case ranges (#168418 ) Fixes #165969 Implement GNU case ranges for constexpr bytecode interpreter.	2025-11-20 04:50:32 +01:00
Luke Lau	47b756a5a6	[RISCV] Only reduce VLs of instructions with demanded VLs (#168693 ) In RISCVVLOptimizer we first compute all the demanded VLs, then we walk backwards through the function and try to reduce any VLs. We don't actually need to walk backwards anymore since after #124530 the order in which we modify the instructions doesn't matter. This patch changes it to just iterate over the instructions with a demanded VL computed, which means we don't iterate over scalar instructions etc. This also fixes #168665, where we triggered an assert on instructions with a dead $vxsat implicit-def: dead %x:vr = PseudoVSADDU_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */, implicit-def dead $vxsat Because $vxsat is a reserved register, DeadMachineInstructionElim won't remove it and the instruction makes it to RISCVVLOptimizer. And because the def of %x is dead, we don't reach this instruction in the dataflow analysis. This instruction returns true for isCandidate, so we would try to look up its demanded VL which doesn't exist and assert. But with this patch we don't try to reduce instructions that aren't in DemandedVLs, which fixes the crash.	2025-11-20 03:49:59 +00:00
Hristo Hristov	3f151a3fa6	[libc++][memory] Applied `[[nodiscard]]` to smart pointers (#168483 ) Applied `[[nodiscard]]` where relevant to smart pointers and related functions. - [x] - `std::unique_ptr` - [x] - `std::shared_ptr` - [x] - `std::weak_ptr` See guidelines: - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant - `[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. For example a locking constructor in unique_lock. --------- Co-authored-by: Hristo Hristov <zingam@outlook.com>	2025-11-20 04:19:15 +02:00
Pranav Kant	fda20d99ae	[bazel] Fix #165009 (#168804 )	2025-11-19 18:17:35 -08:00
Jinjie Huang	79fffed60a	[llvm-dwp] Give more information when incompatible version found (#168511 ) Provide more information when detecting a DWARF version mismatch in .dwo files to help locate the issue and align with other similar errors.	2025-11-20 10:17:13 +08:00
Paddy McDonald	beac880da5	Better fix for the stack_container_dynamic_lib test (#168798 ) Add the missing %libdl to the link command	2025-11-19 18:06:11 -08:00
Gang Chen	9e9fe08b16	Re-land [Transform][LoadStoreVectorizer] allow redundant in Chain (#168135 ) This is the fixed version of https://github.com/llvm/llvm-project/pull/163019	2025-11-19 17:39:10 -08:00
Yaxun (Sam) Liu	f9696949c3	[ClangLinkerWrapper] Refactor target ID sanitization for Windows file… (#168744 ) … names Fix non-RDC mode HIP compilation for the new driver on Windows due to invalid temporary file names when offload arch is a target ID containing ':', which is invalid in file names on Windows. Refactor the existing handling of ':' in file names on Windows from clang driver into a shared function sanitizeTargetIDInFileName in clang/Basic/TargetID.h. This function replaces ':' with '@' on Windows only, preserving the original behavior. Update both clang/lib/Driver/Driver.cpp and clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp to use this shared function, ensuring consistent handling across both tools.	2025-11-19 20:22:22 -05:00
Sayan Saha	def8ecbda9	[tosa] : Relax dynamic dimension checks for batch for conv decompositions (#168764 ) This PR relaxes the validation checks to allow input/output data to have dynamic batch dimensions.	2025-11-19 20:17:01 -05:00
Alexey Bataev	2c3aa92089	[SLP]Fix insertion point for setting for the nodes Many of the def-use chain problems in the SLP vectorizer are related to the fact that some nodes reuse the same instruction as an insertion point. An insertion point is not an instruction, but the place between instructions. To set it correctly, it is better to generate a pseudo instruction immediately after the last instruction and use it as the insertion point. This resolves the issues in most cases. Fixes #168512 #168576	2025-11-19 17:15:24 -08:00
Eli Friedman	4e275f7274	[Arm64EC][clang] Implement varargs support in clang. (#152411 ) The clang side of the calling convention code for arm64 vs. arm64ec is close enough that this isn't really noticeable in most cases, but the rule for choosing whether to pass a struct directly or indirectly is significantly different. (Adapted from my old patch https://reviews.llvm.org/D125419 .) Fixes #89615.	2025-11-19 16:45:08 -08:00
Carl Ritson	b1c4b55118	RenameIndependentSubregs: try to only implicit def used subregs (#167486 ) Attempt to only define used subregisters when creating IMPLICIT_DEF fix ups for live interval subranges. This avoids the appearance at the MIR level of entire (wide) registers becoming live rather than relying only on transient LiveIntervals dead definitions for unused subregisters.	2025-11-20 09:28:34 +09:00
Dhruva Chakrabarti	94e4ee38aa	[AMDGPU] Fixed crash in getLastMIForRegion when the region is empty. (#168653 ) PreRARematStage builds region live-outs if GCN trackers are enabled. If rematerialization leads to empty regions, this can cause a crash because of dereference of an invalid iterator in getLastMIForRegion. The fix is to skip calling getLastMIForRegion for empty regions. This patch fixes another bug in the same code region. getLastMIForRegion calls skipDebugInstructionsBackward which may immediately return the RegionEnd if it is not the begin instruction and it is a non-debug instruction. That would imply considering an instruction that is outside the relevant region. The fix is to always pass the previous of RegionEnd to skipDebugInstructionsBackward. This bug was found while using GCN trackers on the existing LIT test machine-scheduler-sink-trivial-remats.mir. Here's the assertion failure. llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:168: llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::reference llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::operator*() const [with OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void, false, void>; bool IsReverse = false; bool IsConst = false; llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::reference = llvm::MachineInstr&]: Assertion `!NodePtr->isKnownSentinel()' failed.	2025-11-19 16:19:20 -08:00
Nishant Patel	af73aeaa19	[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738 ) This PR adds a pattern for unrolling shape_cast given a targetShape. This PR is a follow-up to #164010, which was very general and used inserts and extracts on each element (which is also what LowerVectorShapeCast.cpp does). After doing some more research on use cases, we (me and @Jianhui-Li ) realized that the previous version in #164010 is unnecessarily generic and doesn't fit our performance needs. Our use case requires that targetShape is contiguous in both source and result vector. This pattern only applies when contiguous slices can be extracted from the source vector and inserted into the result vector such that each slice remains in vector form with targetShape (and is not decomposed to scalars). In these cases, the unrolling proceeds as: vector.extract_strided_slice -> vector.shape_cast (on the slice unrolled) -> vector.insert_strided_slice	2025-11-19 16:16:44 -08:00
Sang Ik Lee	7de59f0b24	[MLIR][Conversion] XeGPU to XeVM: Use adaptor for getting base address from memref. (#168610 ) adaptor already lowers memref to base address. Conversion patterns should use it instead of generating code to get base address from memref.	2025-11-19 16:15:27 -08:00
Andy Kaylor	ef0cd1dae3	[CIR][NFC] Fix warnings in release builds (#168791 ) This fixes several warnings that occur in CIR release builds.	2025-11-19 16:12:17 -08:00
Paddy McDonald	ff39d59000	Disable test under GCC (#168792 ) New test stack_container_dynamic_lib.cpp has errors under gcc. Require clang while better fix is investigated	2025-11-19 16:08:34 -08:00
Jan Svoboda	835951325e	[clang][deps] Enable calling `DepScanFile::getBuffer()` repeatedly (#168789 ) This PR makes it possible to call `getBuffer()` on `DepScanFile` (a `llvm::vfs::File`) repeatedly. Previously, this function would return a moved-from `unique_ptr`. This doesn't fix any existing bugs, I discovered this while experimenting with the VFSs in the scanner. Note that the returned instances of `llvm::MemoryBuffer` are non-owning and share the underlying buffer storage.	2025-11-19 16:03:30 -08:00
Haocong Lu	80f862b692	[CIR] Upstream CIR codegen for `lzcnt` and `tzcnt` x86 builtins (#168479 ) Support CIR codegen for x86 builtins `__builtin_ia32_lzcnt` and `__builtin_ia32_tzcnt`.	2025-11-19 16:01:56 -08:00

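The bound quoted in commit 02db2de905 (a maximum CNTP result of 256 for an nxv16i1 predicate at the 2048-bit SVE maximum) can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical and do not come from the LLVM sources, and the 64-bit result width is an assumption made for the example.

```python
# Illustrative arithmetic only; the constant and function names are
# hypothetical and do not come from the LLVM sources.

SVE_MAX_VECTOR_BITS = 2048  # largest vector length SVE permits
SVE_GRANULE_BITS = 128      # one vscale unit is 128 bits
LANES_PER_GRANULE = 16      # nxv16i1: 16 predicate lanes per granule


def max_cntp_result(vector_bits: int = SVE_MAX_VECTOR_BITS) -> int:
    """Upper bound on the active-lane count for an nxv16i1 predicate."""
    max_vscale = vector_bits // SVE_GRANULE_BITS
    return max_vscale * LANES_PER_GRANULE


def known_zero_high_bits(result_width: int, max_value: int) -> int:
    """Number of leading bits that must be zero for values in [0, max_value]."""
    return result_width - max_value.bit_length()


if __name__ == "__main__":
    bound = max_cntp_result()
    # The 64-bit result width is an assumption of this sketch.
    print(bound, known_zero_high_bits(64, bound))
```

Under these assumptions the count fits in 9 bits, so every higher bit of the result is known to be zero. That is the kind of fact that could let redundant zexts, sexts and masks be folded away, as the commit message describes.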

560143 Commits