574694 Commits

Author SHA1 Message Date
Wenju He
7be9972cb2
[libclc] Fix llvm-spirv dependency when llvm-spirv is built in-tree (#188896)
When SPIRV-LLVM-Translator is built in-tree (i.e., placed in the
llvm/projects folder), the llvm-spirv target exists.

Drop the legacy llvm-spirv_target dependency (which was for the
non-runtime build) and add llvm-spirv to the runtimes dependencies.
2026-03-28 07:06:23 +08:00
Farzon Lotfi
89ae675f59
[SPIRV][Matrix] Legalize store of matrix to array of vector memory layout (#188139)
fixes #188131

This change addresses the stylistic changes that @bogners requested in
https://github.com/llvm/llvm-project/pull/186215/ and adds
`storeMatrixArrayFromVector` to SPIRVLegalizePointerCast.cpp for use
when we detect the matrix array-of-vector memory layout. The changes to
storeArrayFromVector are cleanup.

Assisted-by: GitHub Copilot for test case check lines
2026-03-27 19:01:56 -04:00
Ilija Tovilo
1128d74438
[LLD][skip ci] Fix typo in linker_script.rst (#148867) 2026-03-27 15:50:25 -07:00
Luke Wren
efba01ae12
[RISCV] Allocate feature bits for Zifencei and Zmmul (#143306)
As proposed in
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/110.

There is no real compiler-rt implementation, as Linux does not list
these extensions in hwprobe.

Signed-off-by: Luke Wren <wren6991@gmail.com>
2026-03-27 15:47:57 -07:00
Stanislav Mekhanoshin
a2d84b5d8d
[AMDGPU] Remove neg support from 4 more gfx1250 WMMA (#189115)
These were previously covered by AMDGPUWmmaIntrinsicModsAllReuse.
2026-03-27 15:20:14 -07:00
Cristian Assaiante
044876423b
[ProfInfo] Fix integer overflow in getDisjunctionWeights (#189079)
This PR fixes an integer overflow in
[`getDisjunctionWeights`](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ProfDataUtils.h#L241)
and adds a regression test to cover the failing case. Casting branch
weights before the computations solved the issue.
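
A minimal sketch of this class of bug follows. It assumes 32-bit unsigned weights that are combined through products and a sum. The formula `t1*t2 + f1*t2` and the helper names are hypothetical stand-ins, not the actual `getDisjunctionWeights` computation. Python integers never overflow, so the sketch masks each intermediate result to emulate fixed-width unsigned arithmetic.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits, as fixed-width unsigned C++ arithmetic does."""
    return value & ((1 << bits) - 1)


def combine_weights(t1: int, f1: int, t2: int, bits: int) -> int:
    # Hypothetical combination of branch weights: t1*t2 + f1*t2, with every
    # intermediate result truncated to `bits` bits.
    lhs = wrap(t1 * t2, bits)
    rhs = wrap(f1 * t2, bits)
    return wrap(lhs + rhs, bits)


w = 100_000  # 100000 * 100000 = 10**10, which exceeds UINT32_MAX (4294967295)
print(combine_weights(w, 0, w, bits=32))  # 1410065408: silently wrapped
print(combine_weights(w, 0, w, bits=64))  # 10000000000: widened before multiplying
```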

Issue https://github.com/llvm/llvm-project/issues/189021
2026-03-27 15:19:53 -07:00
Jeff Bailey
d6ff5e7778
[libc][docs] Parse inline macro_value from YAML in docgen (#189118)
The docgen script was previously hardcoded to assume all implemented
macros must be placed in a *-macros.h header. This updates docgen to
read inline macro_value properties directly from the source YAML files,
correctly recognizing them as implemented.
2026-03-27 22:19:29 +00:00
Jeff Bailey
0aba82eb70
[libc] Add missing POSIX macros to cpio.h (#188840)
Define the POSIX cpio.h header and its standard macros in the libc build
system. Configure the macros directly in the YAML specification to allow
automated header generation without a custom definition template.
2026-03-27 22:08:13 +00:00
fineg74
563d3f6865
[OFFLOAD] Disable tests that may cause hangs in CI (#189116) 2026-03-27 21:32:25 +00:00
Sergei Barannikov
01768d3b95
[lldb] Fix the order of arguments in the StackFrame constructor call (#189108)
`pc` and `cfa` arguments were swapped.
2026-03-28 00:28:25 +03:00
Matt Arsenault
e825f42427
AMDGPU: Improve fsqrt f64 expansion with ninf (#183695) 2026-03-27 22:25:32 +01:00
Pau Sum
0f81923735
[CIR][AArch64] Upstream vmull_*/vmull_high_* and vmul_p8/vmul_high_p8 Neon builtins (#188371)
Add CIR generation for the AArch64 NEON builtins `vmull_*` and
`vmull_high_*`.

The accompanying tests from
[AArch64/neon-intrinsics](https://github.com/llvm/llvm-project/blob/main/clang/test/CodeGen/AArch64/neon-intrinsics.c)
were integrated with new checks for CIR codegen.

Part of #185382
2026-03-27 21:18:24 +00:00
Sergei Barannikov
eea4af4c29
[lldb] Use Address(section, offset) constructor in more places (NFC) (#189101)
After this change, Address::SetSection() had only one use left (in a
unit test) and was removed. Address::ClearSection() had no uses, now
also removed. (It is unlikely that someone needs to change the section
without simultaneously changing the section offset, and for that we have
a constructor.)
2026-03-28 00:14:15 +03:00
Sergei Barannikov
26af10f837
[lldb] Use Address::Slide() to simplify code (NFC) (#189097) 2026-03-28 00:12:46 +03:00
Justin Stitt
f9ad232421
[Clang] Show inlining hints for __attribute__((warning/error)) (#174892)
When functions marked with `[[gnu::warning/error]]` are called through inlined functions, we now show the inlining chain that led to the call when `-fdiagnostics-show-inlining-chain` is enabled.

With this flag, two modes are possible:

- **heuristic** mode: Uses `srcloc` and `inlined.from` metadata to reconstruct the inlining chain. Functions that are `inline`, `static`, `always_inline`, or in anonymous namespaces get `srcloc` metadata attached. This mode emits a note suggesting `-gline-directives-only` for more accurate locations.

- **debug** mode: Automatically used instead of heuristic when building with at least `-gline-directives-only` (implied by `-g1` or higher). Leverages `DILocation` debug info for reliable source locations.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571
2026-03-27 13:52:31 -07:00
Justin Stitt
8b395a7755
[Clang] Ensure pattern exclusion priority over OBT (#188390)
Make sure pattern exclusions have priority over the overflow behavior types when deciding whether or not to emit truncation checks.

Accomplish this by carrying an extra field through `ScalarConversionOpts` which we later check before emitting instrumentation.
2026-03-27 13:51:21 -07:00
Jeffrey Byrnes
1c3018b3d6
Revert "[AMDGPU] Add HWUI pressure heuristics to coexec strategy" (#189107)
Seems to be triggering some issues with the buildbots

https://lab.llvm.org/buildbot/#/builders/159/builds/44122

Unused variable + bad debug build.
2026-03-27 13:48:49 -07:00
Jonas Devlieghere
ce1b12ee08
[lldb] Iterate over a copy of the ModuleList in SearchFilter (#189009)
Avoid a potential deadlock caused by the search filter callback
acquiring the target's module lock by iterating over a copy of the list.
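
The locking pattern can be sketched in Python as follows. `ModuleList`, `search`, and the callback are hypothetical stand-ins for the LLDB classes. A non-reentrant `threading.Lock` plays the role of the module lock, and the callback takes that same lock to mimic the acquisition that could deadlock.

```python
import threading


class ModuleList:
    def __init__(self, modules):
        self._lock = threading.Lock()  # non-reentrant, like a plain mutex
        self._modules = list(modules)

    def snapshot(self):
        # Hold the lock only long enough to copy the list.
        with self._lock:
            return list(self._modules)

    def append(self, module):
        with self._lock:
            self._modules.append(module)


def search(module_list, callback):
    # Iterate over a copy, so the lock is NOT held while callbacks run.
    # Iterating while holding the lock would block forever as soon as a
    # callback tried to take it again.
    for module in module_list.snapshot():
        callback(module)


modules = ModuleList(["a.out", "libc.so"])
seen = []


def callback(module):
    seen.append(module)
    if module == "a.out":
        modules.append("libextra.so")  # acquires the same lock


search(modules, callback)
print(seen)  # ['a.out', 'libc.so']; the module added mid-search is not visited
```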

Fixes #188766
2026-03-27 15:41:14 -05:00
Walter Lee
eb2ff71013
[DA] Mark variable only used in assert as maybe_unused (#189100)
Fix 00aebbff71ff4e348538708064ba2e033ccd6b2a.
2026-03-27 20:38:07 +00:00
Jeffrey Byrnes
a9f5f93440
[AMDGPU] Add HWUI pressure heuristics to coexec strategy (#184929)
Adds basic support for new heuristics for the CoExecSchedStrategy.

InstructionFlavor provides a way to map instructions to different
"Flavors". These "Flavors" all have special scheduling considerations --
either they map to different HardwareUnits, or have unique scheduling
properties like fences.

HardwareUnitInfo provides a way to track and analyze the usage of some
hardware resource across the current scheduling region.

CandidateHeuristics holds the state for new heuristics, as well as the
implementations.

In addition, this adds new heuristics to use the various support pieces
listed above. tryCriticalResource attempts to schedule instructions that
use the most demanded HardwareUnit. If no such instructions are ready to
be scheduled, tryCriticalResourceDependency attempts to schedule
instructions which enable instructions that use demanded HardwareUnits.
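
Here is a rough sketch of how the two heuristics compose. It makes simplifying assumptions: each instruction uses exactly one hardware unit, and demand is a plain count per unit. `Instr`, `pick_candidate`, and the unit names are illustrative, not the CoExecSchedStrategy API.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    unit: str                                   # hardware unit this instruction uses
    enables: set = field(default_factory=set)   # units used by instructions it unblocks


def pick_candidate(ready, demand):
    """Pick a ready instruction, preferring the most demanded hardware unit."""
    if not ready or not demand:
        return None
    critical = max(demand, key=demand.get)
    # Like tryCriticalResource: a ready instruction that uses the critical unit.
    for instr in ready:
        if instr.unit == critical:
            return instr
    # Like tryCriticalResourceDependency: one that unblocks critical-unit work.
    for instr in ready:
        if critical in instr.enables:
            return instr
    return None  # leave the decision to the remaining heuristics


demand = {"VALU": 3, "XDL": 10}
ready = [Instr("v_add", "VALU"), Instr("load_b", "VMEM", {"XDL"})]
print(pick_candidate(ready, demand).name)  # load_b
```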

We are adding the new heuristics incrementally. While that work is in
progress, tryCandidateCoexec may be in a rough state, as it is after
this PR.
2026-03-27 13:34:03 -07:00
Aiden Grossman
560b8c9afd
[CI] Make AArch64 Premerge Job Fail on Errors (#188801)
Right now we report the errors, but the job does not actually fail. This
patch fixes that.
2026-03-27 13:32:15 -07:00
Kazu Hirata
502b5e0bea
[MemProf] Dump inline call stacks as optimization remarks (#188678)
This patch teaches the MemProf matching pass to dump inline call
stacks as analysis remarks like so:

frame: 704e4117e6a62739 main:10:5
frame: 273929e54b9f1234 foo:2:12
inline call stack: 704e4117e6a62739,273929e54b9f1234

The output consists of two types of remarks:

- "frame": Acts as a dictionary mapping a unique MD5-based FrameID
  to source information (function name, line offset, and column).

- "inline call stack": Provides the full call stack for a call site
  as a sequence of FrameIDs.

Both types of remarks are deduplicated to reduce the output size.
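
The deduplication scheme can be sketched as below. The FrameID here is a hypothetical stand-in for the pass's MD5-based ID: it takes the first 8 bytes of an MD5 over `name:line:column`. The hex values will therefore not match real output. Only the remark shapes and the deduplication follow the description above.

```python
import hashlib


def frame_id(name: str, line_offset: int, column: int) -> str:
    # Hypothetical 64-bit ID: the first 8 bytes of MD5("name:line:column").
    data = f"{name}:{line_offset}:{column}".encode()
    return hashlib.md5(data).digest()[:8].hex()


def emit_remarks(call_stacks):
    """call_stacks: lists of (function, line_offset, column) frames."""
    seen_frames, seen_stacks, remarks = set(), set(), []
    for stack in call_stacks:
        ids = []
        for name, line, col in stack:
            fid = frame_id(name, line, col)
            if fid not in seen_frames:  # print each frame's source info once
                seen_frames.add(fid)
                remarks.append(f"frame: {fid} {name}:{line}:{col}")
            ids.append(fid)
        key = ",".join(ids)
        if key not in seen_stacks:  # print each distinct call stack once
            seen_stacks.add(key)
            remarks.append(f"inline call stack: {key}")
    return remarks


stack = [("main", 10, 5), ("foo", 2, 12)]
for remark in emit_remarks([stack, stack]):
    print(remark)  # two "frame" remarks, then one "inline call stack"
```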

This patch is intended to be a debugging aid.
2026-03-27 12:47:16 -07:00
Matt Arsenault
28f24b5029
AMDGPU: Add baseline tests for more fract patterns (#189092) 2026-03-27 19:38:54 +00:00
Joseph Huber
871d675c52
[compiler-rt] Add PTX feature specifically when CUDA is not available (#189083)
Summary:
People need to be able to build this without a CUDA installation.

Long term we should bump up the minimum version as I'm pretty sure every
architecture before this has been deprecated by NVIDIA.
2026-03-27 14:28:25 -05:00
Aiden Grossman
df6d6c9cd1
[Scudo] Disable ScudoCombinedTests.NewType (#189070)
This is failing in some configurations on AArch64 Linux. Given that
there are a lot of follow-up commits that make this hard to revert, just
disable it for now pending future investigation.
2026-03-27 12:15:46 -07:00
Aditya Goyal
ba44df4b88
[clang-format] Add pre-commit CI env var support to git-clang-format (#188816)
When git-clang-format is invoked with no explicit commit arguments and
both PRE_COMMIT_FROM_REF and PRE_COMMIT_TO_REF are set, the script
automatically uses those refs as the diff range and implies --diff. If
the variables are absent, existing behavior is fully preserved.
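
The selection rule can be sketched as a small function. `resolve_commits` is a hypothetical helper, not the script's actual code. It treats an empty variable the same as an unset one, which is an assumption. The returned flag reports only whether `--diff` is implied by the environment.

```python
import os


def resolve_commits(cli_commits, environ=os.environ):
    """Return (commits, implies_diff) for a git-clang-format invocation."""
    if cli_commits:
        # Explicit commit arguments always take precedence.
        return list(cli_commits), False
    from_ref = environ.get("PRE_COMMIT_FROM_REF")
    to_ref = environ.get("PRE_COMMIT_TO_REF")
    if from_ref and to_ref:
        # Both set: diff the two refs, as if --diff had been passed.
        return [from_ref, to_ref], True
    # Otherwise fall back to the existing behavior.
    return [], False


env = {"PRE_COMMIT_FROM_REF": "origin/main", "PRE_COMMIT_TO_REF": "HEAD"}
print(resolve_commits([], env))  # (['origin/main', 'HEAD'], True)
```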

This allows projects to use `git-clang-format` directly inside CI
pipelines via the [pre-commit](https://pre-commit.com/) framework
without any wrapper scripts or extra configuration.

Closes: #188813

No existing lit test suite for this script. Verified manually that env
vars activate two-commit diff mode, existing behavior is preserved
without them, and explicit CLI args always override them.
2026-03-27 20:15:32 +01:00
fineg74
1611a23a5b
[OFFLOAD] Add spirv implementation for named barrier (#180393)
This change adds an implementation of named barriers for the SPIRV
backend. Since SPIRV has no built-in API or intrinsics for named
barriers, the implementation loosely follows the AMD implementation.
2026-03-27 20:14:09 +01:00
Jun Wang
3c625a179f
[AMDGPU][MC] Improving assembler error message for unsupported instructions (#185778)
The updated error message shows both the instruction name and the GPU
target name.
2026-03-27 12:04:58 -07:00
Mehdi Amini
509f181f40
[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065)
When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.

Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
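
A toy printer shows the ambiguity and the fix. `print_struct` is a hypothetical Python model of the generated printer, with list values standing in for ArrayRefParameter. Following the description, it brackets only arrays that are not in the last position.

```python
def format_value(value, bracket: bool) -> str:
    if isinstance(value, list):  # models an ArrayRefParameter
        text = ", ".join(str(v) for v in value)
        return f"[{text}]" if bracket else text
    return str(value)


def print_struct(fields, fixed: bool = True) -> str:
    """Print (key, value) pairs as `key = value`, separated by commas."""
    last = len(fields) - 1
    return ", ".join(
        f"{key} = {format_value(value, fixed and i != last)}"
        for i, (key, value) in enumerate(fields)
    )


fields = [("dims", [1, 2]), ("rank", 3)]
print(print_struct(fields, fixed=False))  # dims = 1, 2, rank = 3  (is "2" a field?)
print(print_struct(fields))               # dims = [1, 2], rank = 3
```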

Fixes #156623

Assisted-by: Claude Code
2026-03-27 18:41:46 +00:00
Kewen Meng
a996f2a8db
Revert "AMDGPU: Fold frame indexes into disjoint s_or_b32" (#189074)
Reverts llvm/llvm-project#102345

unblock bot: https://lab.llvm.org/buildbot/#/builders/10/builds/25403
2026-03-27 18:33:01 +00:00
Md Abdullah Shahneous Bari
88bc265295
[XeVM] Use ocloc for binary generation. (#188331)
XeVM currently doesn't support native binary generation. This PR enables
Ahead of Time (AOT) compilation of GPU modules to native binaries using
`ocloc`.

This currently works only with LevelZeroRuntimeWrappers.
2026-03-27 13:29:33 -05:00
fineg74
34a4fe5bc9
[OFFLOAD] Fix a build break (#189076)
This PR fixes a build break reported after introduction of spirv
function declarations
2026-03-27 18:28:51 +00:00
NeKon69
c703ea52be
[HLSL][DirectX][SPIRV] Implement the fma API (#185304)
This PR adds the `fma` HLSL intrinsic (with support for matrices).
It follows all of the steps from #99117.
Closes #99117.
2026-03-27 14:12:48 -04:00
Thurston Dang
3d5a2552c5
[msan] Disambiguate "Strict" vs. "Heuristic" when dumping instructions (#188873)
When -msan-dump-strict-instructions and
-msan-dump-heuristic-instructions are simultaneously enabled, it is
unclear from the output whether each instruction is strictly vs.
heuristically handled. [*] This patch fixes the issue by tagging the
output.

The actual instrumentation of the code is unaffected by this change.

[*] A workaround is to compile the code once with only
-msan-dump-strict-instructions, and a second time with
-msan-dump-heuristic-instructions, but this unnecessarily doubles the
compilation time.
2026-03-27 11:00:59 -07:00
Ehsan Amiri
00aebbff71
[DA] Refactor signature of weakCrossingSIVtest and check inputs (NFCI) (#187117)
Pass SCEVAddRecExpr objects directly to weakCrossingSIVtest and check
the validity of the input operands.
2026-03-27 13:57:08 -04:00
Alexey Samsonov
ead9ac8331
[libc] Remove header templates from several C standard headers. (#188878)
Switches the following headers to hdrgen-produced ones by referencing
some macros from the C standard, and the file containing their
declarations, in the corresponding YAML files:

* limits.h (referenced _WIDTH / _MAX / _MIN families).
* locale.h (referenced LC_ family).
* time.h (referenced CLOCKS_PER_SEC).
* wchar.h (referenced WEOF).
2026-03-27 17:55:37 +00:00
Ben Dunbobbin
80b304d14b
[DTLTO] Improve performance of adding files to the link (#186366)
The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.

When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).

This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
  clarity.
- Modify linker comments that implied the AddBuffer path is
  cache-specific.

For a Clang link (Debug build with sanitizers and instrumentation) using
an optimized toolchain (PGO non-LTO, llvmorg-22.1.0), measuring the mean
`Add DTLTO files to the link` time trace scope duration:
- On Windows (Windows 11 Pro Build 26200, AMD Family 25 @ ~4.5 GHz, 16
  cores/32 threads, 64 GB RAM), this patch reduces the mean from
  2799.148 ms to 157.972 ms.
- On Linux (Ubuntu 24.04.3 LTS Kernel 6.14, Ryzen 9 5950X, 16
  cores/32 threads, boost up to 5.09 GHz, 64 GB RAM), this patch reduces
  the mean from 255.291 ms to 41.630 ms.
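
Expressed as speedups, the quoted means work out as follows. This only recomputes ratios from the numbers above.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old mean duration to the new one."""
    return before_ms / after_ms


means_ms = {"Windows": (2799.148, 157.972), "Linux": (255.291, 41.630)}
for platform, (before, after) in means_ms.items():
    print(f"{platform}: {speedup(before, after):.1f}x")
# Windows: 17.7x
# Linux: 6.1x
```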

Based on work by @romanova-ekaterina and @kbelochapka.
2026-03-27 17:51:49 +00:00
Ben Dunbobbin
d271bd37ce
Revert "[DTLTO] Speed up temporary file removal in the ThinLTO backed (#189043)
This reverts commit 11b439c5c5a07c95d30ce25abd6adf7f5fbb7105.

timeTraceProfilerCleanup() can be called before the temporary file
deletion has completed in LLD. This causes memory leaks that were
flagged up by sanitizer builds, e.g.:

https://lab.llvm.org/buildbot/#/builders/24/builds/18840/steps/11/logs/stdio
2026-03-27 17:48:57 +00:00
Pengcheng Wang
7e2f78923c
[RISCV][NFC] Use enum types to improve debuggability (#188418)
This lets debuggers show enum values instead of integral values when
dumping.
2026-03-28 01:42:49 +08:00
Jeff Bailey
030ef70908
[libc][docs] Document libc-shared-tests ninja target (#189062)
Added a brief description of the libc-shared-tests target to the
Building and Testing page.

This target allows running tests for shared standalone components like
math primitives without the full libc runtime.
2026-03-27 17:39:38 +00:00
Sirraide
bd947ea6fd
[Clang] [Sema] Don't diagnose multidimensional subscript operators on dependent types (#188910)
I forgot to check for dependent types in #187828; we somehow didn’t have
tests for this so CI didn’t catch this...
2026-03-27 18:39:22 +01:00
Mehdi Amini
cb58fe9df5
[MLIR][SCF] Fix loopUnrollByFactor for unsigned loops with narrow integer types (#189001)
`loopUnrollByFactor` used `getConstantIntValue()` to read loop bounds,
which sign-extends the constant to `int64_t`. For unsigned `scf.for`
loops with narrow integer types (e.g. i1, i2, i3), this produces wrong
results: a bound such as `1 : i1` has `getSExtValue() == -1` but should
be treated as `1` (unsigned).

Two bugs were introduced by this:

1. **Wrong epilogue detection**: the comparison `upperBoundUnrolledCst <
ubCst` used signed int64, so e.g. `0 < -1` (where ubCst is the
sign-extended i1 value 1) evaluated to false, suppressing the epilogue
that should execute the remaining iterations.

2. **Zero step after overflow**: when `tripCountEvenMultiple == 0` (all
iterations go to the epilogue), `stepUnrolledCst = stepCst *
unrollFactor` can overflow the bound type's bitwidth and wrap to 0. A
zero step causes `constantTripCount` to return `nullopt`, preventing the
zero-trip main loop from being elided.

Fix:
- Use zero-extension (`getZExtValue`) instead of sign-extension when
reading bounds for unsigned loops.
- When `tripCountEvenMultiple == 0`, keep the original step for the main
loop to avoid the zero-step issue (the step value is irrelevant for a
zero-trip loop anyway).
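
The two failure modes can be reproduced with plain integer arithmetic. This sketch models an unsigned `i1` loop with lower bound 0, upper bound `1 : i1`, step 1, and unroll factor 2. The variable names mirror the description above, but the trip-count computation is a simplified assumption rather than the actual `loopUnrollByFactor` code.

```python
def sext(value: int, bits: int) -> int:
    """Read the low `bits` bits of value as a signed (two's complement) integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zext(value: int, bits: int) -> int:
    """Read the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


width, lb, ub_bits, step, unroll_factor = 1, 0, 1, 1, 2

ub_cst = zext(ub_bits, width)                     # 1, the correct unsigned bound
trip_count = (ub_cst - lb) // step                # 1 iteration in total
trip_count_even_multiple = trip_count - trip_count % unroll_factor  # 0
upper_bound_unrolled_cst = lb + trip_count_even_multiple * step     # 0

# Bug 1: with sign extension the bound reads as -1, so 0 < -1 is False and
# the epilogue that should run the single iteration is dropped.
print(upper_bound_unrolled_cst < sext(ub_bits, width))  # False
print(upper_bound_unrolled_cst < zext(ub_bits, width))  # True

# Bug 2: step * unroll_factor = 2 does not fit in i1 and wraps to 0.
print(zext(step * unroll_factor, width))  # 0
```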

Fixes #163743

Assisted-by: Claude Code
2026-03-27 18:36:51 +01:00
Jianhui Li
28e2fa3247
[MLIR][XeGPU] Extend convert_layout op to support scalar type (#188874)
This PR adds scalar types to the convert_layout op's result and operand.
It also enhances the convert_layout patterns in wg-to-sg, unrolling, and
sg-to-lane distribution.

This is to support reduction to scalar, whereas the layout propagation
currently doesn't allow a scalar to carry any layout. The design choice
is to insert a convert_layout op after the reduction-to-scalar op to
record the layout information permanently across the passes.
2026-03-27 10:36:35 -07:00
Petter Berntsson
2af95b2fa2
[libc][docs] Fix POSIX basedefs links for nested headers (#188738)
Fix broken POSIX basedefs links for nested headers in llvm-libc docs.

The docgen script currently emits paths like `sys/wait.h.html`, but the
Open Group uses `sys_wait.h.html`, for example:
- https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_wait.h.html

This updates nested-header link generation while leaving flat headers
unchanged.
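
The link rule can be sketched in a few lines. `posix_basedefs_url` is a hypothetical helper, not the docgen script's actual function. The base URL is taken from the example link above.

```python
POSIX_BASEDEFS = "https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/"


def posix_basedefs_url(header: str) -> str:
    """Build the Open Group basedefs URL for a header such as 'sys/wait.h'."""
    # Nested headers replace '/' with '_'; flat headers contain no '/',
    # so they pass through unchanged.
    return POSIX_BASEDEFS + header.replace("/", "_") + ".html"


print(posix_basedefs_url("sys/wait.h"))
```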
2026-03-27 17:30:43 +00:00
Sergei Barannikov
22cfe6f39d
[lldb] Make single-argument Address constructor explicit (NFC) (#189035)
This is to highlight places where we (probably unintentionally)
construct an `Address` object from an already resolved address, making
it unresolved again.
See the changes in `DynamicLoaderDarwin.cpp` for a quick example.

Also, use this constructor instead of `Address(lldb::addr_t file_addr,
const SectionList *section_list)` when `section_list` is `nullptr`.
2026-03-27 20:22:48 +03:00
Han-Chung Wang
9e44babdaf
[mlir][vector] Add support for dropping inner unit dims for transfer_read/write with masks. (#188841)
This revision resolves a long-standing TODO: it supports the lowering
when transfer_read/write ops have a mask, by inserting a
vector.shape_cast op for the masked value.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2026-03-27 10:21:20 -07:00
Ryan Buchner
a125d9b5ef
[SLP][NFC] Reapply "Refactor to prepare for constant stride stores" (#188689)
Refactor in preparation for #185964.

Much of this is a refactor to address these issues. Instead of iterating over one chain at a time and attempting all VFs for that chain, we now iterate over VFs, trying each chain for the current VF.
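
The change in loop order can be sketched as two list comprehensions. The function names and the example chains and VFs are hypothetical. This sketch records only the order of attempts, not what happens when an attempt succeeds.

```python
def attempts_per_chain(chains, vfs):
    # Before: exhaust every VF for one chain, then move to the next chain.
    return [(chain, vf) for chain in chains for vf in vfs]


def attempts_per_vf(chains, vfs):
    # After: try every chain at the current VF, then move to the next VF.
    return [(chain, vf) for vf in vfs for chain in chains]


print(attempts_per_chain(["A", "B"], [8, 4]))  # [('A', 8), ('A', 4), ('B', 8), ('B', 4)]
print(attempts_per_vf(["A", "B"], [8, 4]))     # [('A', 8), ('B', 8), ('A', 4), ('B', 4)]
```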

Includes a fix for a use-after-free bug.
2026-03-27 10:11:49 -07:00
vangthao95
87bec47152
AMDGPU/GlobalISel: RegBankLegalize rules for div_fmas/fixup/scale (#188305) 2026-03-27 10:10:09 -07:00
Joseph Huber
f52797c54d
[compiler-rt] Fix irrelevant warning on the builtins target (#189055)
Summary:
Currently, building through runtimes will yield this warning:
```
  CMake Warning at compiler-rt/cmake/Modules/CompilerRTUtils.cmake:335 (message):
    LLVMTestingSupport not found in LLVM_AVAILABLE_LIBS
  Call Stack (most recent call first)
```

This is due to the fact that the builtins target does not go through the
standard runtimes path and sets them as BUILDTREE_ONLY so they do not
show up. These are not used in this case, so just guard the condition to
suppress the warning.
2026-03-27 12:07:42 -05:00
Joseph Huber
15bfc06b6b
[Offload][NFC] Various minor changes to Offload CMake (#189029)
Summary:
Most of these just remove some redundancy or rename `openmp` ->
`offload` where the variable is purely internal.
2026-03-27 12:06:37 -05:00