575286 Commits

Author SHA1 Message Date
Michael Kruse
afb80bddf1
[Runtimes] Introduce variables containing resource dir paths (#177953)
Introduce common infrastructure for runtimes that determines compiler
resource path locations. The variables introduced are:

 * RUNTIMES_OUTPUT_RESOURCE_DIR
 * RUNTIMES_INSTALL_RESOURCE_PATH
 
These contain the location of the compiler resource path (typically
`lib/clang/<version>`) in the build tree and in the install tree,
respectively (the latter relative to CMAKE_INSTALL_PREFIX).

Additionally, define

 * RUNTIMES_OUTPUT_RESOURCE_LIB_DIR
 * RUNTIMES_INSTALL_RESOURCE_LIB_PATH

for the location of clang/flang version-locked libraries (typically
`lib${LLVM_LIBDIR_SUFFIX}/<target-triple>`, though this also depends on `APPLE`
and `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR`). This code is moved from
flang-rt, which initially becomes its only user.

Refactored out of #171610 as requested
[here](https://github.com/llvm/llvm-project/pull/171610#discussion_r2687382481).

Extracted `get_runtimes_target_libdir_common` from compiler-rt as
requested
[here](https://github.com/llvm/llvm-project/pull/171610#discussion_r2689565634).
 
Added TODO comments to all runtimes as requested
[here](https://github.com/llvm/llvm-project/pull/171610#issuecomment-3789598635).
2026-04-02 10:32:14 +00:00
Henrich Lauko
57ee29a2a1
[CIR] Implement isMemcpyEquivalentSpecialMember for trivial copy/move ctors (#186700)
Implements isMemcpyEquivalentSpecialMember in CIR codegen so that
trivial copy/move constructors and defaulted union copy/move ops emit a
cir.copy directly instead of making a real constructor call. The logic
is shared with OG codegen by moving the implementation into ASTContext,
where it also gains the pointer field protection (PFP) check that was
previously missing in CIR.
2026-04-02 12:31:53 +02:00
Nerixyz
91b90652bb
Reland "[CodeView] Generate S_DEFRANGE_REGISTER_REL_INDIR" (#189401)
Initially added in #187709. It was reverted in #188833, because
[llvm-clang-x86_64-sie-win](https://lab.llvm.org/buildbot/#/builders/46/builds/32873)
was failing in
`cross-project-tests/debuginfo-tests/dexter-tests/nrvo.cpp`.

The test passed for me locally. After checking on another machine, I
found that `S_DEFRANGE_REGISTER_REL_INDIR` is only supported by
dbgeng/WinDbg from Windows 10.0 Build 19041 (released 2020) onwards.
SDKs before this will fail to read the value. That buildbot is on
Windows 10.0 Build 17763.

I'm not sure if we should make the generation of that record
conditional. Debuggers that can't read the record will skip it. They'll
still see that there's some local variable, but won't be able to display
the value.

As far as I know, users of older Windows 10 builds should be able to
install a newer Windows SDK and use the WinDbg from that version. But I
haven't tested that.
2026-04-02 12:15:11 +02:00
David Spickett
c329cc59d9
[lldb][test][NFC] Move register command tests (#190144)
For whatever reason, we ended up with register/register, where the outer
register folder contained nothing but the inner register folder.

Move the files up one level so we have register/<test files>.
2026-04-02 11:13:44 +01:00
Ricardo Jesus
9ff2ef9711
[AArch64][SVE] Define pseudos for arithmetic immediate instructions. (#188579)
This patch uses DestructiveBinaryShImmUnpred (which was previously
unused as far as I could tell) to define pseudos for arithmetic
immediate instructions such as ADD (immediate), which allows using
MOVPRFX with these instructions.
2026-04-02 11:07:46 +01:00
Jiachen Yuan
d0bf354828
[ADT] Reinstate "Refactor Bitset to Be More Constexpr-Usable" (#189497)
Reland of #172062 (a71b1d2), which was reverted in b0234d1.

This patch makes essential Bitset member functions constexpr (`set()`,
`any()`, `none()`, `count()`, `operator==`, `!=`, `<`, `~`) and adds a
new `all()` method. It also introduces a `maskLastWord()` invariant to
ensure unused high bits in the last word are always zero, which is
required for correctness of `operator~`, `set()`, `all()`, and
comparisons on non-word-aligned sizes (e.g., `Bitset<33>`).
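A minimal sketch of why the `maskLastWord()` invariant matters, assuming a simplified 64-bit-word bitset; `SketchBitset` is a hypothetical stand-in, not `llvm::Bitset`. Because padding bits in the last word are always cleared after `set()` and `operator~`, `all()` can be a plain popcount comparison and `operator==` can compare whole words:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical simplified bitset (not llvm::Bitset) that keeps the unused
// high bits of its last word at zero.
template <std::size_t NumBits> class SketchBitset {
  static constexpr std::size_t WordBits = 64;
  static constexpr std::size_t NumWords = (NumBits + WordBits - 1) / WordBits;
  std::array<std::uint64_t, NumWords> Words{};

  // Clear every bit at position >= NumBits in the last word.
  constexpr void maskLastWord() {
    constexpr std::size_t Rem = NumBits % WordBits;
    if constexpr (Rem != 0)
      Words[NumWords - 1] &= (std::uint64_t{1} << Rem) - 1;
  }

public:
  constexpr SketchBitset &set() {
    for (auto &W : Words)
      W = ~std::uint64_t{0};
    maskLastWord(); // without this, padding bits would also be set
    return *this;
  }
  constexpr SketchBitset &set(std::size_t I) {
    Words[I / WordBits] |= std::uint64_t{1} << (I % WordBits);
    return *this;
  }
  constexpr SketchBitset operator~() const {
    SketchBitset R = *this;
    for (auto &W : R.Words)
      W = ~W;
    R.maskLastWord(); // flipping would otherwise turn padding bits on
    return R;
  }
  constexpr std::size_t count() const {
    std::size_t C = 0;
    for (auto W : Words)
      for (; W; W &= W - 1) // clear the lowest set bit each iteration
        ++C;
    return C;
  }
  constexpr bool any() const {
    for (auto W : Words)
      if (W)
        return true;
    return false;
  }
  constexpr bool none() const { return !any(); }
  // Valid only because padding bits are guaranteed to be zero.
  constexpr bool all() const { return count() == NumBits; }
  constexpr bool operator==(const SketchBitset &O) const {
    for (std::size_t I = 0; I < NumWords; ++I)
      if (Words[I] != O.Words[I])
        return false;
    return true;
  }
  constexpr bool operator!=(const SketchBitset &O) const {
    return !(*this == O);
  }
};

// Evaluated at compile time, which requires the members to be constexpr.
static_assert(SketchBitset<33>().set().count() == 33);
static_assert((~SketchBitset<33>()).all());
```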

Changes from the original reverted PR:
- Replaced `llvm::any_of` with an inline loop to avoid depending on
constexpr `any_of`/`none_of` from `STLExtras` (#172536), which was also
reverted due to a GCC 15.2.1 bootstrap miscompile.
- The patch is now fully self-contained with no prerequisite changes.

Motivation: This is a prerequisite for making `LaneBitmask` a wrapper
around `Bitset`, enabling scalable lane bitmasks beyond 64 bits
(https://discourse.llvm.org/t/rfc-out-of-lanebitmask-bits-again/88613).
2026-04-02 11:50:10 +02:00
Simi Pallipurath
dc9be4ee30
[LLD][ELF] Skip non-inputsections to avoid invalid cast in Arm BE8 handling (#188154)
This patch fixes https://github.com/llvm/llvm-project/issues/187033

In BE8 mode, instruction bytes are reversed for sections containing
code. This logic currently assumes that Arm mapping symbols (e.g. $a,
$t, $d) are always associated with InputSections.

However, mapping symbols can also be defined in other section types such
as mergeable sections (SHF_MERGE). These are not represented as
InputSection, and attempting to cast them using
cast_if_present<InputSection> results in an assertion failure.
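A minimal sketch of the "skip instead of assert" pattern, assuming a toy class hierarchy in place of lld's section classes; `SectionBase`, `InputSection`, `MergeableSection`, and `reverseMappedCode` are hypothetical, and `dynamic_cast` stands in for LLVM's checked casts. The checked cast yields null for any non-InputSection, and that section is simply skipped:

```cpp
#include <cassert>
#include <vector>

// Hypothetical hierarchy standing in for lld's section classes.
struct SectionBase {
  virtual ~SectionBase() = default;
};
struct InputSection : SectionBase {
  bool BytesReversed = false;
};
struct MergeableSection : SectionBase {}; // e.g. an SHF_MERGE section

// Reverse code bytes only for sections that really are InputSections.
// An unchecked cast on a MergeableSection would be invalid; the checked
// cast returns nullptr (also for a null Sec), so those are skipped.
int reverseMappedCode(const std::vector<SectionBase *> &MappedSections) {
  int Reversed = 0;
  for (SectionBase *Sec : MappedSections) {
    auto *IS = dynamic_cast<InputSection *>(Sec);
    if (!IS)
      continue;
    IS->BytesReversed = true;
    ++Reversed;
  }
  return Reversed;
}
```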
2026-04-02 10:16:54 +01:00
Alexandros Lamprineas
4c9a739c5e
[BOLT][AArch64] Strip unneeded labels from FEAT_CMPBR tests. (#189931)
Eliminates the temporary labels so that BOLT does not recognize them as
secondary entry points.
2026-04-02 10:16:41 +01:00
Ramkumar Ramachandra
d835dd2b43
[LV] Strip createStepForVF (NFC) (#185668)
The mul -> shl simplification is already done in VPlan.
2026-04-02 10:04:37 +01:00
Julian Oppermann
018e048daf
[MLIR][Linalg] Generic to category specialization for unary elementwise ops (#187217)
Handle specialization of `linalg.generic` ops representing a unary
elementwise computation to the `linalg.elementwise` category op. This
implements a previously absent path in the linalg morphism.
2026-04-02 10:50:21 +02:00
Elvis Wang
81691d23cd
[RISCV][TTI] Update cost and prevent exceed m8 for vector.extract.last.active (#188160)
This patch contains two parts.
1. Update the costs to reflect the codegen changes. This is not fully
accurate, since the step vector can use a smaller type if there is a
vscale_range attribute, but that information is not available in the
type-based query in TTI.
2. Return an invalid cost for vector.extract.last.active when the step
vector needs to be split. This case is currently not handled correctly
and hits an assertion.

To avoid blocking the FindLast reduction in LV
(https://github.com/llvm/llvm-project/pull/184931), we should land this
first and then fix the SelectionDAG lowering of vector.extract.last.active.
2026-04-02 16:49:05 +08:00
Sander de Smalen
703d43ca3b
[CostModel] Move default expand cost for partial reductions to BasicTTIImpl (#189905)
This is a follow-up of the suggestion left here:

https://github.com/llvm/llvm-project/pull/181707#discussion_r2995733831

The override functions in AMDGPU/ARM/SystemZ/X86 are required to avoid
enabling partial reductions where they were previously disabled (I've
added this for all targets that implement getArithmeticReductionCost).
2026-04-02 09:42:53 +01:00
David Spickett
5f6835daf4
[lldb][AArch64][Linux] Qualify uses of user_sve_header (#190130)
Fixes #165413, where a build failure was reported:
```
/b/s/w/ir/x/w/llvm-llvm-project/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp:1182:9: error: unknown type name 'user_sve_header'; did you mean 'sve::user_sve_header'?
 1182 |         user_sve_header *header =
      |         ^~~~~~~~~~~~~~~
      |         sve::user_sve_header
```
To fix this, add sve:: as we do for all other uses of this.

This is LLDB's copy of a structure that Linux also defines. I think the
build worked on some machines because that version ended up being
included, but with a more isolated build, it may not.

We have our own definition of it so we can be sure what we're using in
case Linux extends it later.
2026-04-02 08:29:34 +00:00
wanglei
76fc936175
[Clang][LoongArch] Align LSX/LASX built-in signatures with intrinsic types to avoid lax conversions (#189900)
Update the built-in signatures in BuiltinsLoongArchLSX.def and
BuiltinsLoongArchLASX.def to precisely match the vector types used in
the corresponding intrinsic headers (lsxintrin.h and lasxintrin.h).

This alignment ensures that these intrinsics can be compiled
successfully even when -flax-vector-conversions=none is specified, since
the built-in arguments no longer rely on implicit vector type
conversions.

Added new test cases to verify the macro-defined LSX/LASX
intrinsic interfaces under -flax-vector-conversions=none.

Fixes #189898
2026-04-02 16:11:22 +08:00
Arseniy Zaostrovnykh
e3cfcf48d0
[clang][analyzer] Forward CTU-import failure conditions
Forward all CTU-import failures as diagnostics (remarks, warnings,
errors), except for `index_error_code::missing_definition` which has the
potential of generating too many diagnostics.

--
CPP-7804
2026-04-02 07:59:52 +00:00
Gabriel Baraldi
5e0a06b34d
Move ExpandMemCmp and MergeIcmp to the middle end (#77370)
Moving these into the middle-end pipeline will allow for additional
optimization of the expansion result, such as CSE of redundant loads
(cf. https://godbolt.org/z/bEna4Md9r). For now, we conservatively place
the passes at the end of the middle-end pipeline, so we mostly don't
benefit from additional optimizations yet. The pipeline position will be
moved in a future change.

This builds on work done by legrosbuffle in
https://reviews.llvm.org/D60318.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 09:57:00 +02:00
Zorojuro
a599a06e7c
[libc] Indentation consistency in CMake (#190120)
This PR just fixes the indentation/style for the whole CMake file for
consistency.
No other changes.
c698f55b0245ffbaae55c7f854fadba33df16e9d
2026-04-02 08:51:52 +01:00
Weibo He
7ccd1cb9a4
Reland "[CoroSplit] Erase trivially dead allocas after spilling (#189295)" (#190124)
The original PR contained a use-after-delete issue, which has been
resolved in #189521.

Reland #189295, which was reverted in #189311.
2026-04-02 07:45:13 +00:00
Nikita Popov
1662c200a5
[Passes][LoopRotate] Move minsize handling fully into pass (#189956)
Make this dependent only on the minsize attribute and drop the pipeline
handling.

Rename the enable-loop-header-duplication option to
enable-loop-header-duplication-at-minsize to clarify that it controls
header duplication at minsize only (in other cases it is enabled by
default, independently of this option).
2026-04-02 09:32:56 +02:00
Nikita Popov
40e7fa632d
[Passes][FuncSpec] Move optsize/minsize handling into pass (#189952)
Instead of using the Os/Oz level during pass pipeline construction,
query the optsize/minsize attribute on the function to determine whether
specialization is allowed to take place. This ensures consistent
behavior for per-function attributes.

It's worth noting that FuncSpec *already* checks for minsize, but at the
call-site level.
2026-04-02 09:32:39 +02:00
Hans Wennborg
3b81be803f
WholeProgramDevirt: Import/export the CVP byte directly in the summary (#188979)
rather than using absolute symbol constants on ELF/x86.

This leads to better codegen as the absolute symbol constants were not
resolved until link time (see bug for example).

Fixes #188470
2026-04-02 09:28:32 +02:00
David Rivera
e3cbd9984a
[CIR][AMDGPU] Lower Language specific address spaces and implement AMDGPU target (#179084) 2026-04-02 03:00:14 -04:00
Fangrui Song
6f9646a598
[ELF] Parallelize --gc-sections mark phase (#189321)
Add `markParallel` using level-synchronized `parallelFor`. Each BFS
level is processed in parallel; newly discovered sections are collected
in per-thread queues and merged for the next level.

The parallel path is used when `!TrackWhyLive && partitions.size()==1`.
`parallelFor` naturally degrades to serial when `--threads=1`.

Uses depth-limited inline recursion (depth<3) and optimistic
load-then-exchange dedup for best performance.
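A minimal sketch of the level-synchronized mark described above, assuming a toy graph where `Graph[i]` lists the sections referenced by section `i`; `Graph`, `tryMark`, `visit`, and `markParallel` here are hypothetical and use `std::thread` instead of lld's `parallelFor`. Each level is split across threads, each thread appends newly marked nodes to its own queue, and the queues are merged to form the next level:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical model: Graph[i] lists the nodes referenced by node i.
using Graph = std::vector<std::vector<int>>;

// Returns true only for the caller that actually marks Node live.
static bool tryMark(std::vector<std::atomic<bool>> &Live, int Node) {
  // Optimistic dedup: a cheap load filters already-live nodes before the
  // atomic exchange decides which thread wins.
  if (Live[Node].load(std::memory_order_relaxed))
    return false;
  return !Live[Node].exchange(true, std::memory_order_relaxed);
}

// Depth-limited inline recursion: follow a few edges immediately and defer
// deeper nodes to the next BFS level.
static void visit(const Graph &G, std::vector<std::atomic<bool>> &Live,
                  int Node, int Depth, std::vector<int> &NextQueue) {
  for (int Succ : G[Node]) {
    if (!tryMark(Live, Succ))
      continue;
    if (Depth < 3)
      visit(G, Live, Succ, Depth + 1, NextQueue);
    else
      NextQueue.push_back(Succ);
  }
}

std::vector<bool> markParallel(const Graph &G, const std::vector<int> &Roots,
                               unsigned NumThreads) {
  assert(NumThreads > 0);
  std::vector<std::atomic<bool>> Live(G.size());
  for (auto &L : Live)
    L.store(false);
  std::vector<int> Level;
  for (int R : Roots)
    if (tryMark(Live, R))
      Level.push_back(R);

  while (!Level.empty()) {
    // One queue per thread: no locking while a level is processed.
    std::vector<std::vector<int>> Queues(NumThreads);
    std::vector<std::thread> Workers;
    for (unsigned T = 0; T < NumThreads; ++T)
      Workers.emplace_back([&, T] {
        for (std::size_t I = T; I < Level.size(); I += NumThreads)
          visit(G, Live, Level[I], 0, Queues[T]);
      });
    for (auto &W : Workers)
      W.join();
    // Merge the per-thread queues into the next level.
    Level.clear();
    for (auto &Q : Queues)
      Level.insert(Level.end(), Q.begin(), Q.end());
  }

  std::vector<bool> Result(G.size());
  for (std::size_t I = 0; I < G.size(); ++I)
    Result[I] = Live[I].load();
  return Result;
}
```

With `NumThreads == 1` the same code runs each level serially, loosely mirroring how `parallelFor` degrades under `--threads=1`.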

Linking a Release+Asserts clang (--gc-sections, --time-trace) on an old
x86-64:

8 threads: markLive 315ms -> 82ms (-234ms). Total 1562ms -> 1350ms
(1.16x).
16 threads: markLive 199ms -> 50ms (-149ms). Total 1017ms -> 862ms
(1.18x).

and on Apple M4: markLive 61ms -> 13ms. Total 317.3ms -> 272.7ms
(1.16x).
2026-04-02 06:42:00 +00:00
David Green
083f9c158a
[AArch64][GISel] Widen non-power2 element sizes for ctlz. (#189371)
This addresses an illegal mutation kind, where GISel would hit an
assert. It widens non-power-of-2 vector elements, and elements narrower
than i8, to a power of 2.

A fix to handle vector types correctly was needed in LegalizerHandler.

Fixes #185411
2026-04-02 07:27:12 +01:00
Fangrui Song
6a87416162
[ELF] Move Symbol::used to atomic flags field (#190117)
Move the `used` bitfield into the existing `std::atomic<uint16_t> flags`,
making it safe for concurrent access from parallel GC mark (#189321).
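A minimal sketch of keeping a boolean inside an atomic flags word, assuming illustrative bit values (`FlagNeedsPlt`, `FlagUsed`) and a hypothetical `SketchSymbol`; these are not lld's actual flag assignments. Adjacent bitfields share a memory location, so concurrent plain writes race, whereas `fetch_or` atomically sets only the requested bit:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag layout; bit values are illustrative only.
enum : std::uint16_t {
  FlagNeedsPlt = 1 << 0,
  FlagUsed = 1 << 1,
};

struct SketchSymbol {
  std::atomic<std::uint16_t> flags{0};

  // Atomic read-modify-write of one bit; other bits are left untouched,
  // so concurrent setUsed() calls from several threads do not race.
  void setUsed() { flags.fetch_or(FlagUsed, std::memory_order_relaxed); }
  bool isUsed() const {
    return flags.load(std::memory_order_relaxed) & FlagUsed;
  }
};
```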
2026-04-01 23:21:13 -07:00
Paul Kirth
802d4631e0
[clang-doc] Update lookup routines for consistency (#190043)
When filtering is enabled, it's possible that an Info doesn't have a
Parent USR. Use `find()` to safely handle that case.
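A minimal sketch of the `find()`-based lookup, assuming a `std::map` from USR to name as a hypothetical stand-in for clang-doc's index types. Unlike `operator[]` (which inserts an empty entry) or `at()` (which throws), `find()` lets the caller handle a missing parent explicitly:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical index from USR to display name (not clang-doc's types).
using Index = std::map<std::string, std::string>;

// Returns nullptr when the parent USR is absent, e.g. filtered out.
const std::string *lookupParent(const Index &Idx,
                                const std::string &ParentUSR) {
  auto It = Idx.find(ParentUSR);
  return It == Idx.end() ? nullptr : &It->second;
}
```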

Additionally, I noticed the comparison code for the index
poorly reimplemented the existing comparison from StringRef.
We can just use the one from ADT.
2026-04-01 23:17:42 -07:00
Craig Topper
68cbcf7ec2
[RISCV] Check EnsureWholeVectorRegisterMoveValidVTYPE in RISCVInsertVSETVLI::transferBefore. (#190022)
Fixes #189786
2026-04-01 23:14:38 -07:00
Fangrui Song
2118499a89
[ELF] Decouple SharedFile::isNeeded from GC mark. NFC (#190112)
... out of the per-relocation resolveReloc and into a post-GC scan of
global symbols. This decouples the --as-needed logic from the mark
algorithm, simplifying the imminent parallel GC mark.
2026-04-01 22:42:51 -07:00
Luke Lau
2a7ca3a3fa
[RISCV] Remove codegen for vp_ctlz, vp_cttz, vp_ctpop (#189904)
Part of the work to remove trivial VP intrinsics from the RISC-V
backend, see
https://discourse.llvm.org/t/rfc-remove-codegen-support-for-trivial-vp-intrinsics-in-the-risc-v-backend/87999

This splits off 3 intrinsics from #179622.

Note that vp.cttz is the elementwise version, not vp.cttz.elts.
2026-04-02 05:26:41 +00:00
Fangrui Song
0bde74ab04
[ELF] Pass SectionPiece by reference in getSectionPiece. NFC (#190110)
The generated assembly looks more optimized. In addition, this avoids a
widened load, which would cause a TSan-detected data race with parallel
--gc-sections (#189321).
2026-04-01 22:07:42 -07:00
Lang Hames
3346a76d32
[JITLink] Remove unnecessary SymbolStringPtr copy. (#190101)
This was probably intended to be a `const SymbolStringPtr&` originally,
but if we were going to copy it anyway it's better to just take the
argument by value and std::move it.
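A minimal sketch of the take-by-value-and-move idiom, with `std::string` standing in for `SymbolStringPtr` (whose copies adjust a reference count); `Holder` is hypothetical. An lvalue argument is copied once into the parameter, an rvalue argument is moved, and the member is then move-initialized in both cases:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical holder; std::string stands in for SymbolStringPtr.
class Holder {
  std::string Name;

public:
  // By-value parameter: at most one copy (for lvalue callers), then a move.
  explicit Holder(std::string Name) : Name(std::move(Name)) {}
  const std::string &name() const { return Name; }
};
```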
2026-04-02 15:53:42 +11:00
zGoldthorpe
9a354fc5a1
[SelectionDAG] Use KnownBits to determine if an operand may be NaN. (#188606)
Given a bitcast into a fp type, use the known bits of the operand to
infer whether the resulting value can never be NaN.
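A minimal sketch of this kind of reasoning for an IEEE-754 binary32 bitcast, assuming a hypothetical `Known32` record (a simplified stand-in for `llvm::KnownBits`). A value is NaN only if every exponent bit is 1 and the mantissa is non-zero, so either a known-zero exponent bit or a fully known-zero mantissa rules NaN out. This illustrates the idea and is not the SelectionDAG code itself:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical known-bits record for a 32-bit value: bits set in Zero are
// known to be 0, bits set in One are known to be 1.
struct Known32 {
  std::uint32_t Zero = 0;
  std::uint32_t One = 0;
};

// Returns true only if the known bits prove the bitcast float is not NaN.
constexpr bool bitcastToFloatIsNeverNaN(Known32 K) {
  constexpr std::uint32_t ExpMask = 0x7F800000u;  // bits 30..23
  constexpr std::uint32_t MantMask = 0x007FFFFFu; // bits 22..0
  // Some exponent bit is known 0: the exponent cannot be all ones.
  if (K.Zero & ExpMask)
    return true;
  // Whole mantissa known 0: at worst the value is an infinity.
  if ((K.Zero & MantMask) == MantMask)
    return true;
  return false;
}
```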
2026-04-01 22:47:01 -06:00
Chaitanya
dbc206f35d
[CIR][CIRGen] Support for section attribute on cir.global (#188200)
Upstreaming clangIR PR: https://github.com/llvm/clangir/pull/422

This PR implements support for `__attribute__((section("name")))` on
global variables in ClangIR, matching OGCG behavior.
2026-04-02 09:58:17 +05:30
Diego Novillo
06aae40c6d
[HLSL][SPIRV] Restore support for -g to generate NSDI (#190007)
The original attempt (#187051) produced a regression for
`intel-sycl-gpu` because `SPIRVEmitNonSemanticDI` will now self-activate
whenever `llvm.dbg.cu` is present. This removed the need for the
explicit `--spv-emit-nonsemantic-debug-info` flag.

The pass is now entered unconditionally for all SPIR-V targets, but
`NonSemantic.Shader.DebugInfo.100` requires the
`SPV_KHR_non_semantic_info` extension. Targets like `spirv64-intel` do not enable
that extension by default. When `checkSatisfiable()` ran on those
targets, it issued a fatal error rather than silently skipping.

Adds an early-out from `emitGlobalDI()`: if
`SPV_KHR_non_semantic_info` is not available for the current target, the
pass returns without emitting anything.
2026-04-01 21:00:36 -07:00
Sudharsan Veeravalli
18a065763d
[RISCV] Move unpaired instruction back in RISCVLoadStoreOptimizer (#189912)
There are cases when the `Xqcilsm` vendor extension is enabled that we
are unable to pair non-adjacent load/store instructions. The
`RISCVLoadStoreOptimizer` moves the instruction adjacent to the other
before attempting to pair them but does not move them back when it
fails. This can sometimes prevent the generation of the `Xqcilsm`
load/store multiple instructions. This patch ensures that we move the
unpaired instruction back to its original location.
2026-04-02 09:18:58 +05:30
wangjue
8c2feea2f7
[BOLT] Delete unnecessary instructions (#189297) 2026-04-02 06:48:38 +03:00
yebinchon
495e1a4257
[mlir] added a check in the walk to prevent catching a cos in a nested region (#190064)
The walk in SincosFusion may detect a cos within a nested region of the
sin block. This triggers an assertion in `isBeforeInBlock` later on.
Added a check within the walk so it filters operations in nested
regions, which are not in the same block and should not be fused anyway.

---------

Co-authored-by: Yebin Chon <ychon@nvidia.com>
2026-04-01 20:10:56 -07:00
lntue
d52daeac79
[libc] Fix the remaining long double issue in shared_math_test.cpp. (#190098) 2026-04-01 22:47:29 -04:00
Simon Pilgrim
c8c7186b46
[X86] LowerRotate - expand vXi8 non-uniform variable rotates using uniform constant rotates (#189986)
We expand vXi8 non-uniform variable rotates as a sequence of uniform
constant rotates along with a SELECT depending on whether the original
rotate amount requires each step.

This patch removes the premature expansion of uniform constant rotates to
OR(SHL,SRL) sequences, allowing GFNI targets to use single VGF2P8AFFINEQB
instructions.
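A scalar model of the decomposition described above, assuming 8 lanes and the step order 4, 2, 1 (one natural choice); `rotl8` and `rotlVarByUniformSteps` are hypothetical helpers, not the X86 lowering code. Every lane is rotated by the same constant, and a per-lane select driven by one bit of that lane's amount keeps either the rotated or the original value:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Rotate an 8-bit lane left by a constant amount C, with 0 < C < 8.
constexpr std::uint8_t rotl8(std::uint8_t X, unsigned C) {
  return static_cast<std::uint8_t>((X << C) | (X >> (8 - C)));
}

template <std::size_t N> using V8 = std::array<std::uint8_t, N>;

// Rotate each lane of X left by Amt[i] (mod 8) using only uniform constant
// rotates plus per-lane selects.
template <std::size_t N>
V8<N> rotlVarByUniformSteps(V8<N> X, const V8<N> &Amt) {
  for (unsigned Step : {4u, 2u, 1u}) {
    for (std::size_t I = 0; I < N; ++I) {
      std::uint8_t Rotated = rotl8(X[I], Step); // same constant in every lane
      bool Take = (Amt[I] & Step) != 0;         // per-lane select mask
      X[I] = Take ? Rotated : X[I];
    }
  }
  return X;
}
```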
2026-04-02 02:30:59 +00:00
Fangrui Song
8daaa26efd
[Support] Support nested parallel TaskGroup via work-stealing (#189293)
Nested TaskGroups run serially to prevent deadlock, as documented by
https://reviews.llvm.org/D61115 and refined by
https://reviews.llvm.org/D148984 to use threadIndex.

Enable nested parallelism by having worker threads actively execute
tasks from the work queue while waiting (work-stealing), instead of
just blocking. Root-level TaskGroups (main thread) keep the efficient
blocking Latch::sync(), so there is no overhead for the common
non-nested case.

In lld, https://reviews.llvm.org/D131247 worked around the limitation
by passing a single root TaskGroup into OutputSection::writeTo and
spawning 4MB-chunked tasks into it. However, SyntheticSection::writeTo
calls with internal parallelism (e.g. GdbIndexSection,
MergeNoTailSection) still ran serially on worker threads. With this
change, their internal parallelFor/parallelForEach calls parallelize
automatically via helpSync work-stealing.

The increased parallelism can reorder error messages from parallel
phases (e.g. relocation processing during section writes), so one lld
test is updated to use --threads=1 for deterministic output.
2026-04-01 19:20:16 -07:00
Anshul Nigham
dee982d6c8
[NewPM] Adds a port for AArch64PostCoalescerPass (#189520)
Adds a standard port of AArch64PostCoalescer to the NewPM.
2026-04-01 19:18:18 -07:00
Anshul Nigham
e27e7e4339
[NFC][AArch64] Remove PreLegalizerCombiner pass dependency on TargetPassConfig (#190073)
This will enable NewPM porting.

Replaced with the definition in
[AArch64PassConfig::getCSEConfig](1d549d9a77/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (L614))
2026-04-01 19:09:37 -07:00
Chuanqi Xu
c97e08e331
[C++20] [Modules] Add VisiblePromoted module ownership kind (#189903)
This patch adds a new ModuleOwnershipKind::VisiblePromoted to handle
declarations that are not visible to the current TU but are promoted to
be visible to avoid re-parsing.

Originally, we set the visibility to visible directly in such cases. But
https://github.com/llvm/llvm-project/issues/188853 shows such decls may
be excluded later if we import #include and then import. So we have to
introduce a new visibility to express the intention that the visibility
of the decl is intentionally promoted.

Close https://github.com/llvm/llvm-project/issues/188853
2026-04-02 10:01:32 +08:00
lntue
096f9d0aa8
[libc] Initial support so that libc-shared-tests can be built with pp64le (#188882) 2026-04-01 20:55:44 -04:00
Zhaoxuan Jiang
fd609e5d33
[lld] Glob-based BP compression sort groups (#185661)
Add
--bp-compression-sort-section=<glob>[=<layout_priority>[=<match_priority>]]
to let users split input sections into multiple compression groups, run
balanced partitioning independently per group, and leave out sections
that are poor candidates for BP. This replaces the old coarse
--bp-compression-sort with a more explicit, user-controlled one.

In ELF, the glob matches input section names (.text.unlikely.cold1). In
Mach-O, it matches the concatenated segment+section name (__TEXT__text).

layout_priority controls group placement in the final layout.
match_priority resolves conflicts when multiple globs match the same
section: explicit priority beats positional matching, and among
positional specs the last match wins.
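A minimal sketch of the conflict-resolution rule, assuming a toy glob matcher that supports only `*` and `?` and a hypothetical `SortSpec`/`pickSpec` pair. The source does not say which explicit `match_priority` wins when several apply; this sketch assumes the larger value does. `LayoutPriority` is carried along but does not affect matching:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy glob matcher: only '*' and '?' (real glob syntax is richer).
bool globMatch(const char *P, const char *S) {
  if (*P == '\0')
    return *S == '\0';
  if (*P == '*')
    return globMatch(P + 1, S) || (*S != '\0' && globMatch(P, S + 1));
  if (*S != '\0' && (*P == '?' || *P == *S))
    return globMatch(P + 1, S + 1);
  return false;
}

// Hypothetical parsed form of <glob>[=<layout_priority>[=<match_priority>]].
struct SortSpec {
  std::string Glob;
  int LayoutPriority = 0;
  std::optional<int> MatchPriority; // set only when given explicitly
};

// Index of the spec that claims Section, or -1 if no glob matches.
int pickSpec(const std::vector<SortSpec> &Specs, const std::string &Section) {
  int BestExplicit = -1, LastPositional = -1;
  for (int I = 0; I < static_cast<int>(Specs.size()); ++I) {
    if (!globMatch(Specs[I].Glob.c_str(), Section.c_str()))
      continue;
    if (Specs[I].MatchPriority) {
      // Assumption: a larger explicit match priority wins.
      if (BestExplicit < 0 ||
          *Specs[I].MatchPriority > *Specs[BestExplicit].MatchPriority)
        BestExplicit = I;
    } else {
      LastPositional = I; // later positional specs override earlier ones
    }
  }
  // Any explicit priority beats positional matching.
  return BestExplicit >= 0 ? BestExplicit : LastPositional;
}
```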

A CRTP hook getCompressionSubgroupKey() allows backends to further
subdivide glob groups into independent BP instances. This will allow the
Mach-O backend to separate cold functions via N_COLD_FUNC in the future.

The deprecated --bp-compression-sort option keeps its existing
function/data behavior by assigning sections to fixed legacy groups.
2026-04-01 17:53:08 -07:00
Jim Lin
3d7eedce56
[RISCV] Fix stackmap shadow trimming NOP size for compressed targets (#189774)
The shadow trimming loop in LowerSTACKMAP hardcoded a 4-byte decrement
per instruction, but when Zca is enabled NOPs are 2 bytes. Use NOPBytes
instead of the hardcoded 4 so the shadow is correctly trimmed on
compressed targets.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 08:21:33 +08:00
Jim Lin
b9e01c26f0
[RISCV] Relax VL constraint in convertSameMaskVMergeToVMv (#189797)
When converting a PseudoVMERGE_VVM to PseudoVMV_V_V, we previously
required MIVL <= TrueVL to avoid losing False elements in the tail.

Relax this constraint when the vmerge's False operand equals its
Passthru operand and the True instruction's tail policy is TU
(tail undisturbed). In this case, True's tail lanes preserve its
passthru value (which equals False and Passthru), so the conversion
is safe even when MIVL > TrueVL.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 08:12:48 +08:00
Christopher Ferris
7c260d3966
[scudo] Fix reallocate for MTE. (#190086)
For MTE, we can't use the whole size or we might trigger a segfault.
Therefore, use the exact size when MTE is enabled or the exact usable
size parameter is true.

Also, optimize out the call to getUsableSize and use a simpler
calculation.
2026-04-01 16:44:31 -07:00
Demetrius Kanios
29391328ab
[WebAssembly][GlobalISel] CallLowering lowerFormalArguments (#180263)
Implements `WebAssemblyCallLowering::lowerFormalArguments`

Split from #157161
2026-04-01 16:12:38 -07:00
Zorojuro
52fb23eef8
[libc][math] Remove static from log1pf implementation (#190042)
Reflecting changes according to
823e3e0017
2026-04-01 19:01:44 -04:00