llvm-project

Author	SHA1	Message	Date
Wenju He	7be9972cb2	[libclc] Fix llvm-spirv dependency when llvm-spirv is built in-tree (#188896 ) When SPIRV-LLVM-Translator is built in-tree (i.e., placed in llvm/projects folder), llvm-spirv target exists. Drop legacy llvm-spirv_target dependency (was for non-runtime build) and add llvm-spirv to runtimes dependencies.	2026-03-28 07:06:23 +08:00
Farzon Lotfi	89ae675f59	[SPIRV][Matrix] Legalize store of matrix to array of vector memory layout (#188139 ) fixes #188131 This change addresses stylistic changes @bogners requested in https://github.com/llvm/llvm-project/pull/186215/ It also adds `storeMatrixArrayFromVector` to SPIRVLegalizePointerCast.cpp when we detect the matrix array of vector memory layout. Changes to storeArrayFromVector were cleanup. Assisted-by Github Copilot for test case check lines	2026-03-27 19:01:56 -04:00
Ilija Tovilo	1128d74438	[LLD][skip ci] Fix typo in linker_script.rst (#148867 )	2026-03-27 15:50:25 -07:00
Luke Wren	efba01ae12	[RISCV] Allocate feature bits for Zifencei and Zmmul (#143306 ) As proposed in https://github.com/riscv-non-isa/riscv-c-api-doc/pull/110. No real compiler-rt implementation as Linux does not list these extensions in hwprobe. Signed-off-by: Luke Wren <wren6991@gmail.com>	2026-03-27 15:47:57 -07:00
Stanislav Mekhanoshin	a2d84b5d8d	[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115 ) These are previously covered by AMDGPUWmmaIntrinsicModsAllReuse.	2026-03-27 15:20:14 -07:00
Cristian Assaiante	044876423b	[ProfInfo] Fix integer overflow in getDisjunctionWeights (#189079 ) This PR fixes an integer overflow in [`getDisjunctionWeights`](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ProfDataUtils.h#L241) and adds a regression test to cover the failing case. Casting branch weights before the computations solved the issue. Issue https://github.com/llvm/llvm-project/issues/189021	2026-03-27 15:19:53 -07:00
Jeff Bailey	d6ff5e7778	[libc][docs] Parse inline macro_value from YAML in docgen (#189118 ) The docgen script was previously hardcoded to assume all implemented macros must be placed in a *-macros.h header. This updates docgen to read inline macro_value properties directly from the source YAML files, correctly recognizing them as implemented.	2026-03-27 22:19:29 +00:00
Jeff Bailey	0aba82eb70	[libc] Add missing POSIX macros to cpio.h (#188840 ) Define the POSIX cpio.h header and its standard macros in the libc build system. Configure the macros directly in the YAML specification to allow automated header generation without a custom definition template.	2026-03-27 22:08:13 +00:00
fineg74	563d3f6865	[OFFLOAD] Disable tests that may cause hangs in CI (#189116 )	2026-03-27 21:32:25 +00:00
Sergei Barannikov	01768d3b95	[lldb] Fix the order of arguments in the StackFrame constructor call (#189108 ) `pc` and `cfa` arguments were swapped.	2026-03-28 00:28:25 +03:00
Matt Arsenault	e825f42427	AMDGPU: Improve fsqrt f64 expansion with ninf (#183695 )	2026-03-27 22:25:32 +01:00
Pau Sum	0f81923735	[CIR][AArch64] Upstream vmull_/vmull_high_ and vmul_p8/vmul_high_p8 Neon builtins (#188371 ) Add CIR generation for AArch64 NEON builtins `vmull_` and `vmull_high_`. The accompanying tests from [AArch64/neon-intrinsics](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-intrinsics.c) were integrated with new checks for CIR codegen. Part of #185382	2026-03-27 21:18:24 +00:00
Sergei Barannikov	eea4af4c29	[lldb] Use Address(section, offset) constructor in more places (NFC) (#189101 ) After this change, Address::SetSection() had only one use left (in a unit test) and was removed. Address::ClearSection() had no uses, now also removed. (It is unlikely that someone needs to change the section without simultaneously changing the section offset, and for that we have a constructor.)	2026-03-28 00:14:15 +03:00
Sergei Barannikov	26af10f837	[lldb] Use Address::Slide() to simplify code (NFC) (#189097 )	2026-03-28 00:12:46 +03:00
Justin Stitt	f9ad232421	[Clang] Show inlining hints for __attribute__((warning/error)) (#174892 ) When functions marked with `[[gnu::warning/error]]` are called through inlined functions, we now show the inlining chain that led to the call when ``-fdiagnostics-show-inlining-chain`` is enabled. With this flag, two modes are possible: - heuristic mode: Uses `srcloc` and `inlined.from` metadata to reconstruct the inlining chain. Functions that are `inline`, `static`, `always_inline`, or in anonymous namespaces get `srcloc` metadata attached. This mode emits a note suggesting `-gline-directives-only` for more accurate locations. - debug mode: Automatically used instead of heuristic when building with at least `-gline-directives-only` (implied by `-g1` or higher). Leverages `DILocation` debug info for reliable source locations. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571	2026-03-27 13:52:31 -07:00
Justin Stitt	8b395a7755	[Clang] Ensure pattern exclusion priority over OBT (#188390 ) Make sure pattern exclusions have priority over the overflow behavior types when deciding whether or not to emit truncation checks. Accomplish this by carrying an extra field through `ScalarConversionOpts` which we later check before emitting instrumentation.	2026-03-27 13:51:21 -07:00
Jeffrey Byrnes	1c3018b3d6	Revert "[AMDGPU] Add HWUI pressure heuristics to coexec strategy" (#189107 ) Seems to be triggering some issues with the buildbots https://lab.llvm.org/buildbot/#/builders/159/builds/44122 Unused variable + bad debug build.	2026-03-27 13:48:49 -07:00
Jonas Devlieghere	ce1b12ee08	[lldb] Iterate over a copy of the ModuleList in SearchFilter (#189009 ) Avoid a potential deadlock caused by the search filter callback acquiring the target's module lock by iterating over a copy of the list. Fixes #188766	2026-03-27 15:41:14 -05:00
Walter Lee	eb2ff71013	[DA] Mark variable only used in assert as maybe_unused (#189100 ) Fix 00aebbff71ff4e348538708064ba2e033ccd6b2a.	2026-03-27 20:38:07 +00:00
Jeffrey Byrnes	a9f5f93440	[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929 ) Adds basic support for new heuristics for the CoExecSchedStrategy. InstructionFlavor provides a way to map instructions to different "Flavors". These "Flavors" all have special scheduling considerations -- either they map to different HardwareUnits, or have unique scheduling properties like fences. HardwareUnitInfo provides a way to track and analyze the usage of some hardware resource across the current scheduling region. CandidateHeuristics holds the state for new heuristics, as well as the implementations. In addition, this adds new heuristics to use the various support pieces listed above. tryCriticalResource attempts to schedule instructions that use the most demanded HardwareUnit. If no such instructions are ready to be scheduled, tryCriticalResourceDependency attempts to schedule instructions which enable instructions that use demanded HardwareUnits. We are incrementally adding the new heuristics. While in the process of this, the state of tryCandidateCoexec may not be great - as is the case after this PR.	2026-03-27 13:34:03 -07:00
Aiden Grossman	560b8c9afd	[CI] Make AArch64 Premerge Job Fail on Errors (#188801 ) Right now we report the errors, but the job does not actually fail. This patch fixes that.	2026-03-27 13:32:15 -07:00
Kazu Hirata	502b5e0bea	[MemProf] Dump inline call stacks as optimization remarks (#188678 ) This patch teaches the MemProf matching pass to dump inline call stacks as analysis remarks like so: frame: 704e4117e6a62739 main:10:5 frame: 273929e54b9f1234 foo:2:12 inline call stack: 704e4117e6a62739,273929e54b9f1234 The output consists of two types of remarks: - "frame": Acts as a dictionary mapping a unique MD5-based FrameID to source information (function name, line offset, and column). - "inline call stack": Provides the full call stack for a call site as a sequence of FrameIDs. Both types of remarks are deduplicated to reduce the output size. This patch is intended to be a debugging aid.	2026-03-27 12:47:16 -07:00
Matt Arsenault	28f24b5029	AMDGPU: Add baseline tests for more fract patterns (#189092 )	2026-03-27 19:38:54 +00:00
Joseph Huber	871d675c52	[compiler-rt] Add PTX feature specifically when CUDA is not available (#189083 ) Summary: People need to be able to build this without a CUDA installation. Long term we should bump up the minimum version as I'm pretty sure every architecture before this has been deprecated by NVIDIA.	2026-03-27 14:28:25 -05:00
Aiden Grossman	df6d6c9cd1	[Scudo] Disable ScudoCombinedTests.NewType (#189070 ) This is failing in some configurations on AArch64 Linux. Given there are a lot of follow-up commits that make this hard to revert, just disable it for now pending future investigation.	2026-03-27 12:15:46 -07:00
Aditya Goyal	ba44df4b88	[clang-format] Add pre-commit CI env var support to git-clang-format (#188816 ) When git-clang-format is invoked with no explicit commit arguments and both PRE_COMMIT_FROM_REF and PRE_COMMIT_TO_REF are set, the script automatically uses those refs as the diff range and implies --diff. If the variables are absent, existing behavior is fully preserved. This allows projects to use `git-clang-format` directly inside CI pipelines via the [pre-commit](https://pre-commit.com/) framework without any wrapper scripts or extra configuration. Closes: #188813 No existing lit test suite for this script. Verified manually that env vars activate two-commit diff mode, existing behavior is preserved without them, and explicit CLI args always override them.	2026-03-27 20:15:32 +01:00
fineg74	1611a23a5b	[OFFLOAD] Add spirv implementation for named barrier (#180393 ) This change adds implementation for named barriers for SPIRV backend. Since there is no built in API/intrinsics for named barrier in SPIRV, the implementation loosely follows implementation for AMD	2026-03-27 20:14:09 +01:00
Jun Wang	3c625a179f	[AMDGPU][MC] Improving assembler error message for unsupported instructions (#185778 ) The updated error message shows both the instruction name and the GPU target name.	2026-03-27 12:04:58 -07:00
Mehdi Amini	509f181f40	[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065 ) When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a non-last position within a struct() assembly format directive, the printed output is ambiguous: the comma-separated array elements are indistinguishable from the struct-level commas separating key-value pairs. Fix this by wrapping such parameters in square brackets in both the generated printer and parser. The printer emits '[' before and ']' after the array value; the parser calls parseLSquare()/parseRSquare() around the FieldParser call. Parameters with a custom printer or parser are unaffected (the user controls the format in that case). Fixes #156623 Assisted-by: Claude Code	2026-03-27 18:41:46 +00:00
Kewen Meng	a996f2a8db	Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32" (#189074 ) Reverts llvm/llvm-project#102345 unblock bot: https://lab.llvm.org/buildbot/#/builders/10/builds/25403	2026-03-27 18:33:01 +00:00
Md Abdullah Shahneous Bari	88bc265295	[XeVM] Use `ocloc` for binary generation. (#188331 ) XeVM currently doesn't support native binary generation. This PR enables Ahead of Time (AOT) compilation of gpu module to native binary using `ocloc`. Currently, only works with LevelZeroRuntimeWrappers.	2026-03-27 13:29:33 -05:00
fineg74	34a4fe5bc9	[OFFLOAD] Fix a build break (#189076 ) This PR fixes a build break reported after introduction of spirv function declarations	2026-03-27 18:28:51 +00:00
NeKon69	c703ea52be	[HLSL][DirectX][SPIRV] Implement the `fma` API (#185304 ) This PR adds `fma` HLSL intrinsic (with support for matrices) It follows all of the steps from #99117. Closes #99117.	2026-03-27 14:12:48 -04:00
Thurston Dang	3d5a2552c5	[msan] Disambiguate "Strict" vs. "Heuristic" when dumping instructions (#188873 ) When -msan-dump-strict-instructions and -msan-dump-heuristic-instructions are simultaneously enabled, it is unclear from the output whether each instruction is strictly vs. heuristically handled. This patch fixes the issue by tagging the output. The actual instrumentation of the code is unaffected by this change. A workaround is to compile the code once with only -msan-dump-strict-instructions, and a second time with -msan-dump-heuristic-instructions, but this unnecessarily doubles the compilation time.	2026-03-27 11:00:59 -07:00
Ehsan Amiri	00aebbff71	[DA] Refactor signature of weakCrossingSIVtest and check inputs (NFCI) (#187117 ) Passing SCEVAddRecExpr objects directly to weakCrossingSIVtest and checking the validity of the input operands	2026-03-27 13:57:08 -04:00
Alexey Samsonov	ead9ac8331	[libc] Remove header templates from several C standard headers. (#188878 ) Switches the following headers to hdrgen-produced ones by referencing some macro from C standard and the file containing the declarations in corresponding YAML files: * limits.h (referenced _WIDTH / _MAX / _MIN families). * locale.h (referenced LC_ family). * time.h (referenced CLOCKS_PER_SEC). * wchar.h (referenced WEOF).	2026-03-27 17:55:37 +00:00
Ben Dunbobbin	80b304d14b	[DTLTO] Improve performance of adding files to the link (#186366 ) The in-process ThinLTO backend typically generates object files in memory and adds them directly to the link, except when the ThinLTO cache is in use. DTLTO is unusual in that it adds files to the link from disk in all cases. When the ThinLTO cache is not in use, ThinLTO adds files via an `AddStreamFn` callback provided by the linker, which ultimately appends to a `SmallVector` in LLD. When the cache is in use, the linker supplies an `AddBufferFn` callback that adds files more efficiently (by moving `MemoryBuffer` ownership). This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend. The backend uses this to add files to the link more efficiently. Additionally: - Move AddStream from CGThinBackend to InProcessThinBackend, for reader clarity. - Modify linker comments that implied the AddBuffer path is cache-specific. For a Clang link (Debug build with sanitizers and instrumentation) using an optimized toolchain (PGO non-LTO, llvmorg-22.1.0), measuring the mean `Add DTLTO files to the link` time trace scope duration: - On Windows (Windows 11 Pro Build 26200, AMD Family 25 @ ~4.5 GHz, 16 cores/32 threads, 64 GB RAM), this patch reduces the mean from 2799.148 ms to 157.972 ms. - On Linux (Ubuntu 24.04.3 LTS Kernel 6.14, Ryzen 9 5950X, 16 cores/32 threads, boost up to 5.09 GHz, 64 GB RAM), this patch reduces the mean from 255.291 ms to 41.630 ms. Based on work by @romanova-ekaterina and @kbelochapka.	2026-03-27 17:51:49 +00:00
Ben Dunbobbin	d271bd37ce	Revert "[DTLTO] Speed up temporary file removal in the ThinLTO backed (#189043 ) This reverts commit 11b439c5c5a07c95d30ce25abd6adf7f5fbb7105. timeTraceProfilerCleanup() can be called before the temporary file deletion has completed in LLD. This causes memory leaks that were flagged up by sanitizer builds, e.g.: https://lab.llvm.org/buildbot/#/builders/24/builds/18840/steps/11/logs/stdio	2026-03-27 17:48:57 +00:00
Pengcheng Wang	7e2f78923c	[RISCV][NFC] Use enum types to improve debuggability (#188418 ) So that we can see the enum values instead of integral values when dumping in debuggers.	2026-03-28 01:42:49 +08:00
Jeff Bailey	030ef70908	[libc][docs] Document libc-shared-tests ninja target (#189062 ) Added a brief description of the libc-shared-tests target to the Building and Testing page. This target allows running tests for shared standalone components like math primitives without the full libc runtime.	2026-03-27 17:39:38 +00:00
Sirraide	bd947ea6fd	[Clang] [Sema] Don't diagnose multidimensional subscript operators on dependent types (#188910 ) I forgot to check for dependent types in #187828; we somehow didn’t have tests for this so CI didn’t catch this...	2026-03-27 18:39:22 +01:00
Mehdi Amini	cb58fe9df5	[MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (#189001 ) `loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds, which sign-extends the constant to `int64_t`. For unsigned `scf.for` loops with narrow integer types (e.g. i1, i2, i3), this produces wrong results: a bound such as `1 : i1` has `getSExtValue() == -1` but should be treated as `1` (unsigned). Two bugs were introduced by this: 1. Wrong epilogue detection: the comparison `upperBoundUnrolledCst < ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the sign-extended i1 value 1) evaluated to false, suppressing the epilogue that should execute the remaining iterations. 2. Zero step after overflow: when `tripCountEvenMultiple == 0` (all iterations go to the epilogue), `stepUnrolledCst = stepCst * unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A zero step causes `constantTripCount` to return `nullopt`, preventing the zero-trip main loop from being elided. Fix: - Use zero-extension (`getZExtValue`) instead of sign-extension when reading bounds for unsigned loops. - When `tripCountEvenMultiple == 0`, keep the original step for the main loop to avoid the zero-step issue (the step value is irrelevant for a zero-trip loop anyway). Fixes #163743 Assisted-by: Claude Code	2026-03-27 18:36:51 +01:00
Jianhui Li	28e2fa3247	[MLIR][XeGPU] Extend convert_layout op to support scalar type (#188874 ) This PR adds scalar type to convert_layout op's result and operand. It also enhances the convert_layout pattern in wg-to-sg, unrolling, and sg-to-lane distribution. This is to support reduction to scalar, since currently the layout propagation doesn't support a scalar carrying any layout. The design choice is to insert a convert_layout op after the reduction-to-scalar op to record the layout information permanently across the passes.	2026-03-27 10:36:35 -07:00
Petter Berntsson	2af95b2fa2	[libc][docs] Fix POSIX basedefs links for nested headers (#188738 ) Fix broken POSIX basedefs links for nested headers in llvm-libc docs. The docgen script currently emits paths like `sys/wait.h.html`, but the Open Group uses `sys_wait.h.html`, for example: - https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_wait.h.html This updates nested-header link generation while leaving flat headers unchanged.	2026-03-27 17:30:43 +00:00
Sergei Barannikov	22cfe6f39d	[lldb] Make single-argument Address constructor explicit (NFC) (#189035 ) This is to highlight places where we (probably unintentionally) construct an `Address` object from an already resolved address, making it unresolved again. See the changes in `DynamicLoaderDarwin.cpp` for a quick example. Also, use this constructor instead of `Address(lldb::addr_t file_addr, const SectionList *section_list)` when `section_list` is `nullptr`.	2026-03-27 20:22:48 +03:00
Han-Chung Wang	9e44babdaf	[mlir][vector] Add support for dropping inner unit dims for transfer_read/write with masks. (#188841 ) The revision clears a long-due TODO, which supports the lowering when transfer_read/write ops have mask via inserting a vector.shape_cast op for the masked value. --------- Signed-off-by: hanhanW <hanhan0912@gmail.com>	2026-03-27 10:21:20 -07:00
Ryan Buchner	a125d9b5ef	[SLP][NFC] Reapply "Refactor to prepare for constant stride stores" (#188689 ) Refactor to proceed #185964. Much of this is a refactor to address this issue. Instead of iterating over one chain at a time, attempting all VFs for that given chain, we now iterate over VFs, trying each chain for the current VF. Includes fix for use after free bug.	2026-03-27 10:11:49 -07:00
vangthao95	87bec47152	AMDGPU/GlobalISel: RegBankLegalize rules for div_fmas/fixup/scale (#188305 )	2026-03-27 10:10:09 -07:00
Joseph Huber	f52797c54d	[compiler-rt] Fix irrelevant warning on the builtins target (#189055 ) Summary: Currently, building through runtimes will yield this warning: ``` CMake Warning at compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message): LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS Call Stack (most recent call first) ``` This is due to the fact that the builtins target does not go through the standard runtimes patch and sets them as BUILDTREE_ONLY so they do not show up. These are not used in this case, so just guard the condition to suppress the warning.	2026-03-27 12:07:42 -05:00
Joseph Huber	15bfc06b6b	[Offload][NFC] Various minor changes to Offload CMake (#189029 ) Summary: Most of these just remove some redundancy or rename `openmp` -> `offload` where the variable is purely internal.	2026-03-27 12:06:37 -05:00
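
The sign-extension problem described in cb58fe9df5 ([MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types) can be reproduced outside MLIR. The following is a minimal Python sketch, not the MLIR code: `sign_extend` and `zero_extend` are hypothetical helpers standing in for `getSExtValue` and `getZExtValue`, and the bit widths and unroll factor are chosen only for illustration.

```python
def sign_extend(bits: int, width: int) -> int:
    """Read the low `width` bits as a two's-complement signed value."""
    bits &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return bits - (1 << width) if bits & sign_bit else bits


def zero_extend(bits: int, width: int) -> int:
    """Read the low `width` bits as an unsigned value."""
    return bits & ((1 << width) - 1)


# Bug 1: an unsigned upper bound `1 : i1` read with sign-extension becomes -1,
# so the epilogue check `upperBoundUnrolled < ub` is wrongly false.
upper_bound_unrolled = 0
ub_signed = sign_extend(1, 1)
ub_unsigned = zero_extend(1, 1)
print(upper_bound_unrolled < ub_signed)    # signed read: epilogue suppressed
print(upper_bound_unrolled < ub_unsigned)  # unsigned read: epilogue kept

# Bug 2: step * unrollFactor can wrap to 0 in the bound type's bitwidth.
# Illustrative values (assumed): step 1 in i2, unroll factor 4.
wrapped_step = zero_extend(1 * 4, 2)
print(wrapped_step)
```

With the signed read the comparison is `0 < -1`, which is the case the commit message calls out; the unsigned read gives `0 < 1`.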

574694 Commits
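
The link fix in 2af95b2fa2 ([libc][docs] Fix POSIX basedefs links for nested headers) amounts to replacing `/` with `_` in nested header names. A minimal Python sketch of that mapping follows; `basedefs_url` is a hypothetical helper, not the docgen script's actual function, and the base URL is taken from the example link in the commit message.

```python
# Base URL copied from the example link in the commit message.
BASEDEFS = "https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/"


def basedefs_url(header: str) -> str:
    """Hypothetical helper: map a header name to its basedefs page URL."""
    # Nested headers use '_' where the header path has '/';
    # flat headers contain no '/', so they pass through unchanged.
    return BASEDEFS + header.replace("/", "_") + ".html"


print(basedefs_url("sys/wait.h"))
print(basedefs_url("cpio.h"))
```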