575643 Commits

Author SHA1 Message Date
Jonas Devlieghere
271a08889b
[lldb] Load scripts from code signed dSYM bundles (#189444)
LLDB automatically discovers, but doesn't automatically load, scripts in
the dSYM bundle. This is to prevent running untrusted code. Users can
choose to import the script manually or toggle a global setting to
override this policy. This isn't a great user experience: the former
quickly becomes tedious and the latter leads to decreased security.

This PR offers a middle ground that allows LLDB to automatically load
scripts from trusted dSYM bundles. Trusted here means that the bundle
was signed with a certificate trusted by the system. This can be a
locally created certificate (but not an ad-hoc certificate) or a
certificate from a trusted vendor.
2026-04-03 17:04:47 +00:00
Rafael Auler
7da3a66c06
[BOLT] Check for write errors before keeping output file (#190359)
Summary:
When the disk runs out of space during output file writing, BOLT would
crash with SIGSEGV/SIGABRT because raw_fd_ostream silently records write
errors and only reports them via abort() in its destructor. This made it
difficult to distinguish real BOLT bugs from infrastructure issues in
production monitoring.

Add an explicit error check on the output stream before calling
Out->keep(), so BOLT exits cleanly with exit code 1 and a clear error
message instead.

Test: manually verified with a full filesystem that BOLT now prints
"BOLT-ERROR: failed to write output file: No space left on device" and
exits with code 1.
2026-04-03 10:02:36 -07:00
Osman Yasar
150042141c
[GlobalISel] Add sub(-1, x) -> (xor x, -1) from SelectionDAG (#181014)
This PR adds the pattern `// (sub -1, x) -> (xor x, -1)` to GlobalISel
from SelectionDAG.
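
The identity behind this combine can be checked exhaustively at a small bit width. The sketch below is only an illustration in Python, not the GlobalISel implementation; the helper names and the 8-bit width are assumptions made for the example.

```python
def sub_from_all_ones(x: int, bits: int) -> int:
    """Compute sub(-1, x) in two's complement at the given width."""
    mask = (1 << bits) - 1
    all_ones = mask  # -1 is the all-ones bit pattern at this width
    return (all_ones - x) & mask


def xor_with_all_ones(x: int, bits: int) -> int:
    """Compute xor(x, -1), i.e. bitwise NOT, at the given width."""
    mask = (1 << bits) - 1
    return (x ^ mask) & mask


def identity_holds(bits: int) -> bool:
    """Check sub(-1, x) == xor(x, -1) for every value of the given width."""
    return all(
        sub_from_all_ones(x, bits) == xor_with_all_ones(x, bits)
        for x in range(1 << bits)
    )
```

Subtracting from an all-ones value never borrows, which is why the subtraction reduces to flipping every bit.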

Original SelectionDAG rewrite:
5b4811eddb/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L4305)

---------

Co-authored-by: Jay Foad <jay.foad@gmail.com>
2026-04-03 17:53:30 +01:00
vangthao95
df1e67b379
AMDGPU/GlobalISel: RegBankLegalize rules for s_memtime, s_get_waveid (#190268)
2026-04-03 09:46:56 -07:00
Sander de Smalen
730a07f225
[LV] Only create partial reductions when profitable. (#181706)
We want the LV cost-model to make the best possible decision of VF and
whether or not to use partial reductions. At the moment, when the LV can
use partial reductions for a given VF range, it assumes those are always
preferred. After transforming the plan to use partial reductions, it
then chooses the most profitable VF. It is possible for a different VF
to have been more profitable, if it wouldn't have chosen to use partial
reductions.

This PR changes that, to first decide whether partial reductions are
more profitable for a given chain. If not, then it won't do the
transform.

This causes some regressions for AArch64 which are addressed in a
follow-up PR to keep this one simple.
2026-04-03 17:42:51 +01:00
Ryotaro Kasuga
85e2a36501
[DA] Remove dead code from the Weak Crossing SIV test (#190355)
The ConstantRange intersection check can now handle cases where the
condition of this branch is satisfied. The check is performed before
entering this function, so this part is no longer necessary.
2026-04-03 16:30:42 +00:00
Jan Svoboda
7f9e4fe708
[clang] Extract in-memory module cache writes from ASTWriter (#190062)
This PR extracts the write to the in-memory module cache from within
`ASTWriter` into `CompilerInstance`. This brings it closer to other
module cache manipulations, making the ordering much clearer and more
explicit.
2026-04-03 09:18:33 -07:00
Artemiy
dc83ad2b37
[CIR] Fix incorrect CIR_GlobalOp.global_visibility assembly format (#189673)
Closes #189666.

Fix incorrect printing and parsing of `cir.global` if the
`global_visibility` attribute is present. The incorrect assembly format
```
(`` $global_visibility^)?
```

resulted in the keyword sticking to the previous word and producing
incorrect CIR like this:
```
cir.globalhidden external dso_local @hidden_var = #cir.int<10> : !s32i {alignment = 4 : i64} loc(#loc22)
cir.global "private"hidden internal dso_local @hidden_static_var = #cir.int<10> : !s32i {alignment = 4 : i64} loc(#loc24)
```

Using the custom parser/printer already used by the `cir.func` parser
fixes this issue and makes the printed/parsed attribute consistent for
functions and global values.

Also added tests for both global values and functions.
2026-04-03 09:17:44 -07:00
Jonas Devlieghere
fd68fa98fc
[lldb] Remove unnecessary calls to ConstString::AsCString (NFC) (#190298)
Replace calls to `ConstString::AsCString` with
`ConstString::GetString(Ref)` where appropriate.

Assisted-by: Claude Code
2026-04-03 16:11:23 +00:00
Florian Hahn
7edf8a7b51
[SCEV] Replace some hasFlags calls with hasNo(Un)SignedWrap (NFC). (#190352)
This is slightly more compact and reduces diff when switching to enum
class (https://github.com/llvm/llvm-project/pull/190199).

PR: https://github.com/llvm/llvm-project/pull/190352
2026-04-03 16:09:40 +00:00
Joseph Huber
d8ba56ce3f
[compiler-rt] Split the GPU.cmake cache file to AMDGPU and NVPTX (#190349)
Summary:
These will have different functionality going forward. They should be
split so we can more easily support things only feasible in AMDGPU.
2026-04-03 10:44:04 -05:00
Andy Kaylor
641276751d
[CIR] Fix mixing of catch-all and type-specific catch handlers (#190285)
If a try block has a catch-all handler and one or more type-specific
catch handlers, we were failing to generate the null type specifier when
lowering from CIR to LLVM IR. This change fixes that problem.

Assisted-by: Cursor / claude-4.6-opus-high
2026-04-03 08:38:55 -07:00
Andy Kaylor
5b56352757
[CIR] Implement cleanups for temporaries with automatic duration (#189754)
This implements handling for cleanup of temporary variables with
automatic storage duration. This is a simplified implementation that
doesn't yet handle the possibility of exceptions being thrown within
this cleanup scope or the cleanup scope being inside a conditional
operation. Support for those cases will be added later.
2026-04-03 08:38:06 -07:00
Sander de Smalen
62bbe3fffc
Fix buildbot failure by explicitly disabling partial reductions in TTI. (#190165)
Partial reductions were previously disabled by default, but by
implementing a generic cost-model in BasicTTIImpl (#189905) this now
accidentally enables the use of those when vectorising loops for targets
that may not support this yet.
2026-04-03 16:33:39 +01:00
Joseph Huber
1484e0f16a
[libc] Use CMAKE_CROSSCOMPILING_EMULATOR instead of searching for `llvm-gpu-loader` (#189417)
Summary:
We already handle this with other targets, we should be able to unify
the handling here.
2026-04-03 09:58:04 -05:00
Erich Keane
11d65dc8c2
Revert "[CIR][NFC] Add NYI for OMPSplitDirective stmt" (#190346)
Reverts llvm/llvm-project#190329

The patch this depends on got reverted.
2026-04-03 14:40:42 +00:00
Craig Topper
5d08beaec8
[TargetLowering] Remove NeedToApplyOffset from prepareSREMEqFold. NFC (#190256)
For a given element, I believe A is only 0 when the divisor is INT_MIN.
The only way for NeedToApplyOffset to be false after processing all
elements, is for all divisors to be INT_MIN. If all divisors are
INT_MIN, then all divisors are a power of 2 and we wouldn't do the
transform.
2026-04-03 07:32:13 -07:00
Matt Arsenault
34ec1870ae
clang/AMDGPU: Refactor triple adjustments (#190343)
Factor this similarly to the ARM case for future
expansion. The difference is that -march is treated as
an alias for -mcpu instead of something separately
useful.

I don't understand this mutation of the triple into
spirv64. The only test where this appears to matter
does not use -mcpu. Previously this would only match
for -mcpu, but this would change the behavior to prefer
-march before falling back to -mcpu.
2026-04-03 16:17:34 +02:00
Mehdi Amini
8c81064169
[MLIR][Arith] Fix index_cast/index_castui chain folding to check intermediate width (#189042)
The patterns `IndexCastOfIndexCast` and `IndexCastUIOfIndexCastUI` in
ArithCanonicalization.td incorrectly eliminated a pair of index casts
whenever the outer result type equalled the original source type,
without verifying that the intermediate cast was lossless.

For example, the following was wrongly folded to `%arg0`:
  %0 = index_castui %arg0 : i64 to index
  %1 = index_castui %0    : index to i8    ← truncates to 8 bits
  %2 = index_castui %1    : i8 to index    ← incorrectly removed

The pattern matched `%1`/`%2` because `i8.to(index)` has the same result
type as `i64.to(index)`, even though the i8 intermediate silently drops
56 bits. The same bug existed for the signed `index_cast` variant.

Fix: move the optimization into the `fold` methods of `IndexCastOp` and
`IndexCastUIOp` with an explicit check that the intermediate type is at
least as wide as the source type (using
`IndexType::kInternalStorageBitWidth` as the representative width for
`index`). Only then is the round-trip guaranteed lossless and the chain
can be collapsed.
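
The width check can be illustrated with a small model of unsigned casts. This is a hedged sketch in Python rather than the MLIR `fold` code; `INDEX_BITS = 64` stands in for `IndexType::kInternalStorageBitWidth`, and the function names are hypothetical.

```python
INDEX_BITS = 64  # assumed stand-in for the width used to represent `index`


def cast_unsigned(value: int, src_bits: int, dst_bits: int) -> int:
    """Zero-extend or truncate an unsigned value from src_bits to dst_bits."""
    return value & ((1 << min(src_bits, dst_bits)) - 1)


def round_trip(value: int, src_bits: int, mid_bits: int) -> int:
    """Cast src -> mid -> src, as a chain of two unsigned casts would."""
    mid = cast_unsigned(value, src_bits, mid_bits)
    return cast_unsigned(mid, mid_bits, src_bits)


def is_lossless_round_trip(src_bits: int, mid_bits: int) -> bool:
    """The chain may be collapsed only if the intermediate type is wide enough."""
    return mid_bits >= src_bits
```

With an `i8` intermediate, `round_trip(0x1FF, INDEX_BITS, 8)` loses the upper bits, so the chain must not be folded; with an intermediate at least as wide as the source, the value survives unchanged.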

Fixes #90238
Fixes #90296


Assisted-by: Claude Code
2026-04-03 16:05:08 +02:00
Willem Kaufmann
e1f6dc4b23
[clang-tidy] Add AllowExplicitObjectParameters option to avoid-capturing-lambda-coroutines (#182916)
Add an off-by-default `AllowExplicitObjectParameters` option to the
existing `cppcoreguidelines-avoid-capturing-lambda-coroutines` check.

When enabled, lambda coroutines that use C++23 "deducing this" (explicit
object parameter) are not flagged, since captures are moved into the
coroutine frame ([1], [2], [3]). In C++23 mode, the check also provides
fix-it hints to add `this auto` as the first parameter for lambdas that
don't use it.

The option is off by default to match the current C++ Core Guidelines,
which do not yet recognize explicit object parameters as a solution
([4]). Once the guidelines adopt the proposal, the default can be
flipped.

[1]:
https://github.com/scylladb/seastar/blob/master/doc/lambda-coroutine-fiasco.md#solution-c23-and-up

[2]: https://www.scs.stanford.edu/~dm/blog/vexing-capture.html

[3]: https://lists.isocpp.org/std-proposals/2020/05/1391.php

[4]:
https://github.com/isocpp/CppCoreGuidelines/pull/2289#issuecomment-3756500251
2026-04-03 16:56:08 +03:00
Yuta Saito
fd65b3ef77
[GlobalISel] Fix UMR in SwiftErrorValueTracking (#190273)
Fix issue reported on
https://github.com/llvm/llvm-project/pull/188296#issuecomment-4179103756

`SwiftErrorValueTracking` holds per-function state used by
`IRTranslator`.

On targets where `TargetLowering::supportSwiftError()` is false (e.g.
wasm), `SwiftErrorValueTracking::setFunction()` exits early.
Historically, that early return happened before clearing per-function
containers, and pointer members (including `SwiftErrorArg`) had no
in-class initialization.

The bad case is a function with a swifterror argument on such a target:
`IRTranslator` uses `SwiftError.getFunctionArg()` without checking
`supportSwiftError()` and this could read an uninitialized
`SwiftErrorArg` value. (SelectionDAG gates the `getFunctionArg` usages
behind `supportSwiftError()`, so it's specific to GlobalISel)

29391328ab66 added [a first test
case](llvm/test/CodeGen/WebAssembly/GlobalISel/irtranslator/args-swiftcc.ll)
that satisfies:
- the target is `supportSwiftError` = false
- use swiftcc
- use GlobalISel

and it made the issue observable with sanitizer builds. This commit
fixes the per-function container reinitialization and defensively adds
explicit pointer member initializations.
2026-04-03 14:33:35 +01:00
Ilia Kuklin
d8d2e3358c
[lldb] Make command-dil-diagnostics.test UNSUPPORTED on Windows (#190341)
The test from #187680 passes on some Windows buildbots, but fails on
others.
2026-04-03 18:27:57 +05:00
Simon Pilgrim
5674755cb6
[DAG] visitMUL - cleanup pattern matchers to use m_Shl and (commutative) m_Mul directly (#190339)
Based on feedback on #190215
2026-04-03 13:21:51 +00:00
Florian Hahn
c963092b0c
[VPlan] Mark VPCanonicalIVPHI as not reading memory (NFCI). (#190338)
The canonical IV does not access any memory. Mark accordingly. This
should be NFC end-to-end.

PR: https://github.com/llvm/llvm-project/pull/190338
2026-04-03 13:12:20 +00:00
Erich Keane
0a3fdd30e5
[CIR] Handle vtable-lowering-with-incomplete types (#190216)
The NYI diagnostic in getFunctionTypeForVTable showed up a few times in
testing, so this patch is attempting to fix that up.

The reproducer here is a function type for a vtable that has an
incomplete type in it (return or parameter). Classic codegen chooses to
represent this as an opaque type.

This patch instead removes the special v-table handling here, so that we
can instead just represent the types as incomplete record types.

At the moment, this patch ends up lowering incomplete types as 'empty'
types in LLVM-IR, which we may find we need to modify in the future.
For now, however, it seems to work.

This patch ALSO changes the definition of RecordType::isSized to only be
true for complete types, which prevents a number of other things from
attempting to add attributes/check the size of the type/etc, but those
are irrelevant for the purposes of vtable emission.
2026-04-03 05:59:46 -07:00
Erich Keane
2c734b3951
[CIR] Implement top level 'ExportDecl' emission (#190286)
This is a pretty simple one: it's just a type of decl-context. The actual
exporty-ness is handled on a per-declaration basis.

This patch just makes sure we emit them, as I suspect this will reveal
quite a few more issues in module code.
2026-04-03 05:59:25 -07:00
Amr Hesham
0932472f3b
[CIR][NFC] Add NYI for OMPSplitDirective stmt (#190329)
Fix the warning of missing OMPSplitDirective statement in the emitStmt
switch
2026-04-03 14:45:48 +02:00
alexpaniman
b9924c76da
[clang] Make -dump-tokens option align tokens (#164894)
When using `-Xclang -dump-tokens`, the lexer dump output is currently
difficult to read because the data are misaligned. The existing
implementation simply separates the token name, spelling, flags, and
location using `'\t'`, which results in inconsistent spacing.

For example, the current output looks like this on the example provided
in this patch **(BEFORE THIS PR)**:

<img width="2936" height="632" alt="image"
src="https://github.com/user-attachments/assets/ad893958-6d57-4a76-8838-7fc56e37e6a7"
/>

# Changes

This small PR improves the readability of the token dump by:

+ Adding padding after the token name and after the spelling (the
padding amount was chosen empirically to produce good average
alignment).
+ Swapping the order of location and flags (since flags can take up a
lot of space and disrupt alignment).
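
The two changes above can be sketched as follows. This is an illustrative Python model, not the clang lexer code; the column widths of 16 and the sample field values are assumptions, since the message only says the padding was chosen empirically.

```python
def format_tabbed(name: str, spelling: str, flags: str, location: str) -> str:
    """Old layout: name, spelling, flags, location separated by tabs."""
    return "\t".join([name, f"'{spelling}'", flags, location])


def format_padded(name: str, spelling: str, location: str, flags: str,
                  name_width: int = 16, spelling_width: int = 16) -> str:
    """New layout: padded name and spelling, location placed before flags.

    The widths are hypothetical; fields longer than the width are not
    truncated, so very long spellings would still push later columns right.
    """
    quoted = f"'{spelling}'"
    line = f"{name.ljust(name_width)}{quoted.ljust(spelling_width)}{location}  {flags}"
    return line.rstrip()
```

Because the variable-length flags now come last, the location column starts at the same offset for every token whose name and spelling fit within the padding.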

The result is a more readable output **(AFTER THIS PR)**:

<img width="1470" height="315" alt="image"
src="https://github.com/user-attachments/assets/c24f24e5-a431-42cc-b5b6-232bac5c635e"
/>
2026-04-03 08:33:36 -04:00
Lakreite
a44c15874d
[AMDGPU][CodeGen] Implement SimplifyDemandedBitsForTargetNode for readfirstlane. (#190009)
Propagate demanded bits through readfirstlane intrinsic in
AMDGPUISelLowering with SimplifyDemandedBitsForTargetNode
implementation.

This allows upstream zero/sign extensions to be eliminated when only a
subset of bits is used after the intrinsic.
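
A rough way to see why the extension becomes removable: if only the low bits of the result are used, extending the input beforehand cannot change them. The Python sketch below models readfirstlane as simply returning the first element of a list of lane values, which is a simplifying assumption, and all names are hypothetical.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value to a to_bits-wide pattern."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value >> (from_bits - 1):
        # Sign bit set: fill the upper bits with ones.
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value


def read_first_lane(lanes: list[int]) -> int:
    """Toy model: return the value held by the first lane."""
    return lanes[0]


def low_byte_with_extension(lanes: list[int]) -> int:
    """Sign-extend each 8-bit lane to 32 bits, read lane 0, keep the low byte."""
    extended = [sign_extend(v, 8, 32) for v in lanes]
    return read_first_lane(extended) & 0xFF


def low_byte_without_extension(lanes: list[int]) -> int:
    """Same computation with the sign extension dropped."""
    return read_first_lane(lanes) & 0xFF
```

Both functions agree for any 8-bit lane contents, since the extension only affects bits that the final mask discards.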

Partially addresses #128390.
2026-04-03 14:30:47 +02:00
theRonShark
00aede8f19
Revert "[Clang][OpenMP] Implement Loop splitting #pragma omp split directive " (#190335)
Reverts llvm/llvm-project#183261

15 new lit tests failing in openmp
2026-04-03 12:27:07 +00:00
Simon Pilgrim
15ed4f6c49
[DAG] isKnownToBeAPowerOfTwo - add missing DemandedElts handling to ISD::TRUNCATE and hidden m_Neg pattern (#190190)
Use MaskedVectorIsZero to match X & -X pattern when only DemandedElts
match the negation pattern
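
For reference, the scalar fact behind the `X & -X` pattern is that it isolates the lowest set bit, so the result is a power of two (or zero when `X` is zero). The Python sketch below checks this at an assumed 8-bit width; it does not model the per-element `DemandedElts` handling, and the helper names are hypothetical.

```python
def lowest_set_bit(x: int, bits: int) -> int:
    """Compute x & -x with the negation taken in two's complement at `bits`."""
    mask = (1 << bits) - 1
    return x & (-x & mask)


def is_power_of_two_or_zero(v: int) -> bool:
    """True if v has at most one bit set."""
    return (v & (v - 1)) == 0
```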

Fixes #181654 (properly)
2026-04-03 12:03:33 +00:00
Sergei Barannikov
f1d167123c
[lldb] Return 0 instead of false from a function returning size_t (NFC) (#190334)
2026-04-03 11:32:37 +00:00
Ilia Kuklin
e24936b7ad
[lldb] Fix DIL error diagnostics output (#187680)
* Correctly return the result when used from the console, so that
`DiagnosticsRendering` could use it to output the error.
* Add location pointer to `DILDiagnosticError` internal formatting to
show diagnostics when called from the API.
2026-04-03 16:29:33 +05:00
Arseniy Obolenskiy
03b9c7278e
[SPIR-V] Emit builtin variable OpVariable into entry block (#189958)
2026-04-03 13:18:48 +02:00
Mehdi Amini
c2ec012098
[mlir][linalg] Fix crash in tile_reduction when output map has constant exprs (#189166)
`generateInitialTensorForPartialReduction` and the `getInitSliceInfo*`
helpers unconditionally cast every result expression of the partial
result AffineMap to `AffineDimExpr`. When the original output indexing
map contains a constant (e.g. `affine_map<(d0,d1,d2)->(d0,0,d2)>`), the
constant expression propagates into the partial map and the cast
triggers an assertion.


Fixes #173025

Assisted-by: Claude Code
2026-04-03 11:09:26 +00:00
Matt Arsenault
273e8d85fe
DiagnosticInfo: Fix missing LLVM_LIFETIME_BOUND on Twine arguments (#190331)
Fix use after free errors in DiagnosticInfoResourceLimit uses.
2026-04-03 11:08:00 +00:00
Mehdi Amini
73bcfb6824
[mlir][Affine] Fix LICM incorrectly hoisting stores from zero-trip-count loops (#189165)
The affine-loop-invariant-code-motion pass was hoisting side-effectful
operations (e.g. affine.store) out of loops whose trip count is
statically known to be zero. This caused stores to execute
unconditionally even though the loop body should never run, producing
incorrect results.

The fix skips hoisting of non-memory-effect-free ops when
getConstantTripCount returns 0. Pure/side-effect-free ops are still
eligible for hoisting because they cannot change observable program
state.
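
The miscompile and the guard can be modelled with a toy loop. This Python sketch is only an illustration of the rule as stated in this message, not the pass itself; the function names are hypothetical, and a `None` trip count stands for "not statically known", a case this sketch leaves unrestricted.

```python
from typing import Optional


def run_loop(trip_count: int, memory: dict) -> dict:
    """Reference semantics: the store runs once per iteration."""
    for _ in range(trip_count):
        memory["x"] = 1
    return memory


def run_hoisted_unconditionally(trip_count: int, memory: dict) -> dict:
    """Buggy transform: the store was hoisted and runs even for zero trips."""
    memory["x"] = 1
    for _ in range(trip_count):
        pass
    return memory


def may_hoist(is_memory_effect_free: bool,
              constant_trip_count: Optional[int]) -> bool:
    """Guard described above: side-effecting ops stay put in zero-trip loops."""
    if is_memory_effect_free:
        return True
    return constant_trip_count != 0
```

For a zero-trip loop the reference leaves memory untouched while the unconditional hoist writes to it, which is the observable difference the fix prevents.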

Fixes #128273

Assisted-by: Claude Code
2026-04-03 13:07:26 +02:00
Ryotaro Kasuga
9e516f5c58
[MachinePipeliner] Remove isLoopCarriedDep and use DDG (#174394)
This patch completely removes `isLoopCarriedDep`, which was used
previously to identify loop-carried dependencies in the DAG. Now that we
have the DDG representation, this special handling is no longer
necessary. Simply replacing its usage with the DDG causes several tests
to fail, since cycle detection takes some of the validation-only edges
in the DDG into account. To address this, this patch introduces extra
edges in the DDG, which are used only for cycle detection and not for
other parts of the pass (e.g., scheduling). The extra edges are
determined to preserve the existing behavior of the pass as closely as
possible, which makes the predicates for adding them somewhat complex.

Split off from #135148, and the final patch in the series for #135148
2026-04-03 10:36:34 +00:00
Robert Imschweiler
a2d3783b45
[offload][libc] Adapt test to changes in #190239 (#190330)
2026-04-03 12:03:28 +02:00
Mehdi Amini
ff86be21de
[MLIR][MemRef] Fix AllocOp/AllocaOp flattening domination violation (#188980)
The generic MemRefRewritePattern handles AllocOp/AllocaOp by calling
getFlattenMemrefAndOffset with the op's own result as the source memref.
This inserts ExtractStridedMetadataOp and ReinterpretCastOp that consume
op.result before the alloc op itself in the block. After
replaceOpWithNewOp, op.result is RAUW'd to the new ReinterpretCastOp
result, leaving those earlier ops with forward references — a domination
violation caught by MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS.

Replace the AllocOp/AllocaOp cases in MemRefRewritePattern with a
dedicated AllocLikeFlattenPattern that never touches op.result until the
final replaceOpWithNewOp:
- sizes come from op.getMixedSizes() (operands, not the result)
- strides come from getStridesAndOffset on the MemRefType
- the flat allocation size is computed via
getLinearizedMemRefOffsetAndSize plus the static base offset so the
buffer covers [0, offset+extent)
- castAllocResult is simplified to take the pre-computed sizes and
strides rather than inserting an ExtractStridedMetadataOp on the
original op
- non-zero static base offsets are now correctly preserved in the
reinterpret_cast (the old code hardcoded offset=0, which was a verifier
error for layouts with offset != 0)
- dynamic offsets or strides bail out via notifyMatchFailure

Also remove the now-dead AllocOp/AllocaOp branches from replaceOp() and
the constexpr specialisation in getIndices().

Assisted-by: Claude Code
2026-04-03 11:21:00 +02:00
Harald van Dijk
7c1d91c435
[BOLT] Move extern "C" out of unnamed namespace (#190282)
GCC 15 changes how it interprets extern "C" in unnamed namespaces and
gives the variable internal linkage.
2026-04-03 09:51:55 +01:00
Mehdi Amini
d725513e7d
[MLIR][Affine] Fix null operands in simplifyConstrainedMinMaxOp (#189246)
`mlir::affine::simplifyConstrainedMinMaxOp` called
`canonicalizeMapAndOperands` with `newOperands` that could contain null
`Value()`s. These nulls came from
`unpackOptionalValues(constraints.getMaybeValues(), newOperands)` where
internal constraint variables added by `appendDimVar` (for `dimOp`,
`dimOpBound`, and `resultDimStart*`) have no associated SSA values.

Passing null Values to `canonicalizeMapAndOperands` risks undefined
behavior:
- `seenDims.find(null_value)` in the DenseMap causes all null operands
to collide at the same key, producing incorrect dim remapping.
- Any null operand that remains referenced in the result map would
propagate as a null Value into `AffineValueMap`, crashing callers that
try to use those operands to create ops.

Fix: Before calling `canonicalizeMapAndOperands`, filter null operands
from `newOperands` by replacing their dim/symbol positions in `newMap`
with constant 0 (safe because internal constraint dims should not appear
in the bound map expression) and compacting `newOperands` to contain
only non-null Values.

Fixes #127436

Assisted-by: Claude Code
2026-04-03 10:17:50 +02:00
Zhewen Yu
a7bf24919f
[mlir][IntRangeAnalysis] Fix assertion in inferAffineExpr for mod with range crossing modulus boundary (#188842)
The "small range with constant divisor" optimization in
`inferAffineExpr` for `AffineExprKind::Mod` assumed that if the dividend
range span (`lhsMax - lhsMin`) is less than the divisor, then the mod
results form a contiguous range. This is not always true, as the range
can straddle a modulus boundary.

For example, `[14, 17] mod 8`:
- Span is 3 < 8, so the old condition passed
- But `14%8=6` and `17%8=1` (wraps at 16)
- `umin=6, umax=1` → assertion `umin.ule(umax)` fails

The fix adds a same-quotient check (`lhsMin/rhs == lhsMax/rhs`) to
ensure both endpoints fall within the same modular period. When they
don't, we fall back to the conservative `[0, divisor-1]` range.
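
The corrected condition can be sketched as below. This is a simplified Python model over non-negative integers with a constant positive divisor, not the `inferAffineExpr` code; the function name and the tuple return shape are assumptions.

```python
def infer_mod_range(lhs_min: int, lhs_max: int, divisor: int) -> tuple[int, int]:
    """Return an inclusive (umin, umax) range for `x mod divisor`, x in [lhs_min, lhs_max]."""
    span_fits = (lhs_max - lhs_min) < divisor
    # Both endpoints must lie in the same modular period, otherwise the
    # results wrap around and are not a contiguous range.
    same_period = (lhs_min // divisor) == (lhs_max // divisor)
    if span_fits and same_period:
        return (lhs_min % divisor, lhs_max % divisor)
    # Conservative fallback: any remainder is possible.
    return (0, divisor - 1)
```

For the example above, `infer_mod_range(14, 17, 8)` takes the fallback and yields `(0, 7)`, whereas a range such as `[17, 19]` stays within one period and keeps the tight result.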

Assisted-by: Cursor (Claude)

Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
2026-04-03 10:15:52 +02:00
Donát Nagy
c80443cd37
[NFC][analyzer] Eliminate SwitchNodeBuilder (#188096)
This commit removes the class `SwitchNodeBuilder` because it just
obscured the logic of switch handling by hiding some parts of it in
another source file.
2026-04-03 09:46:06 +02:00
David Green
e46c5a831e
[AArch64] Regenerate arm64-stur.ll. NFC (#190317)
2026-04-03 08:27:29 +01:00
Michael Buch
f91124a55b
[lldb][Module] Only call LoadScriptingResourceInTarget via ModuleList (#190136)
This patch is motivated by
https://github.com/llvm/llvm-project/pull/189943, where we would like to
print the "these module scripts weren't loaded" warning for *all*
modules batched together. I.e., we want to print the warning *after* all
the script loading attempts, not from within each attempt.

To do so we want to hoist the `ReportWarning` calls in
`Module::LoadScriptingResourceInTarget` out into the callsites. But if
we do that, the callers have to remember to print the warnings. To avoid
this, we redirect all callsites to use
`ModuleList::LoadScriptingResourceInTarget`, which will be responsible
for printing the warnings.

To avoid future accidental uses of
`Module::LoadScriptingResourceInTarget` I moved the API into
`ModuleList` and made it `private`.
2026-04-03 07:03:11 +00:00
lonely eagle
8db1f6492a
[mlir][reducer] Remove the restriction that OptReductionPass must be a ModuleOp (#189038)
This PR aims to make the pass more generic by removing the ModuleOp
restriction. This PR reimplements the logic using a standalone
PassManager. Additionally, the isInteresting method has been updated to
accept Operation* for better flexibility. Finally, a dedicated test
directory has been added to improve the organization of OptReductionPass
tests.
2026-04-03 14:49:01 +08:00
michaelselehov
df48719df3
[AMDGPU] Add !noalias metadata to mem-accessing calls w/o pointer args (#188949)
addAliasScopeMetadata in AMDGPULowerKernelArguments skips instructions
with empty PtrArgs, including memory-accessing calls that have no
pointer arguments (e.g. builtins like threadIdx()). Because these calls
never receive !noalias metadata, ScopedNoAliasAA cannot prove they don't
alias noalias kernel arguments. MemorySSA then conservatively reports
them as clobbers, which prevents AMDGPUAnnotateUniformValues from
marking loads as noclobber, blocking scalarization (s_load) and forcing
expensive vector loads (global_load) instead.

Fix by adding all noalias kernel argument scopes to !noalias metadata
for memory-accessing instructions with no pointer arguments. Since such
instructions cannot access memory through any kernel pointer argument,
all noalias scopes are safe to apply.

This fixes a performance regression in rocFFT introduced by bd9668df0f00
("[AMDGPU] Propagate alias information in AMDGPULowerKernelArguments").

Assisted-by: Claude Opus
2026-04-03 08:41:05 +02:00
Ramkumar Ramachandra
e09d1e3ff1
[VPlan] Use not_equal_to to improve code (NFC) (#190262)
2026-04-03 07:32:34 +01:00
Paul Kirth
a52a504e69
[clang-doc] Prepare Info types for Arena allocation (#190046)
To allocate Info structures directly in an Arena, they cannot have
members with nontrivial destructors, or we will leak memory. Before we
migrate them, we can replace growable vector types with intrusive lists.

This introduces some slight overhead as these types now have new pointer
members for use in ilists in later patches.

| Metric | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Time | 920.5s | 1005.7s | 1010.5s | +9.8% | +0.5% |
| Memory | 86.0G | 42.1G | 42.9G | -50.2% | +1.8% |

| Benchmark | Baseline | Prev | This | Culm% | Seq% |
| :--- | :--- | :--- | :--- | :--- | :--- |
| BM_BitcodeReader_Scale/10 | 67.9us | 68.6us | 69.2us | +1.9% | +0.9% |
| BM_BitcodeReader_Scale/10000 | 70.5ms | 21.3ms | 21.9ms | -68.9% | +2.8% |
| BM_BitcodeReader_Scale/4096 | 23.2ms | 4.6ms | 4.6ms | -80.0% | +0.8% |
| BM_BitcodeReader_Scale/512 | 509.4us | 546.3us | 541.8us | +6.4% | -0.8% |
| BM_BitcodeReader_Scale/64 | 114.8us | 117.9us | 117.6us | +2.5% | -0.2% |
| BM_EmitInfoFunction | 1.6us | 1.5us | 1.6us | -1.9% | +3.9% |
| BM_Index_Insertion/10 | 2.3us | 3.9us | 4.0us | +75.3% | +3.0% |
| BM_Index_Insertion/10000 | 3.1ms | 5.3ms | 5.4ms | +72.7% | +2.4% |
| BM_Index_Insertion/4096 | 1.3ms | 2.1ms | 2.1ms | +67.1% | +1.8% |
| BM_Index_Insertion/512 | 153.6us | 253.0us | 259.0us | +68.6% | +2.4% |
| BM_Index_Insertion/64 | 18.1us | 30.1us | 30.3us | +67.8% | +0.5% |
| BM_JSONGenerator_Scale/10 | 36.8us | 37.0us | 38.2us | +3.6% | +3.2% |
| BM_JSONGenerator_Scale/10000 | 89.6ms | 91.7ms | 90.7ms | +1.2% | -1.1% |
| BM_JSONGenerator_Scale/4096 | 33.7ms | 35.1ms | 34.7ms | +2.9% | -1.1% |
| BM_JSONGenerator_Scale/512 | 1.9ms | 1.9ms | 2.0ms | +3.9% | +4.0% |
| BM_JSONGenerator_Scale/64 | 222.4us | 223.3us | 230.1us | +3.5% | +3.1% |
| BM_Mapper_Scale/10000 | 104.3ms | 105.6ms | 100.9ms | -3.2% | -4.4% |
| BM_Mapper_Scale/4096 | 44.3ms | 44.8ms | 42.8ms | -3.5% | -4.4% |
| BM_Mapper_Scale/512 | 7.6ms | 7.6ms | 7.4ms | -2.6% | -3.2% |
| BM_Mapper_Scale/64 | 3.1ms | 3.0ms | 3.0ms | -2.0% | -1.3% |
| BM_MergeInfos_Scale/10000 | 12.2ms | 1.4ms | 1.6ms | -86.7% | +12.5% |
| BM_MergeInfos_Scale/2 | 1.9us | 1.7us | 1.7us | -10.2% | -1.9% |
| BM_MergeInfos_Scale/4096 | 2.8ms | 487.3us | 503.4us | -81.9% | +3.3% |
| BM_MergeInfos_Scale/512 | 68.9us | 38.7us | 38.1us | -44.6% | -1.4% |
| BM_MergeInfos_Scale/64 | 10.3us | 6.4us | 6.4us | -37.6% | -0.4% |
| BM_MergeInfos_Scale/8 | 2.8us | 2.2us | 2.2us | -21.7% | -1.5% |
| BM_SerializeFunctionInfo | 25.5us | 25.9us | 26.0us | +1.9% | +0.4% |
2026-04-03 06:02:32 +00:00