573546 Commits

Author SHA1 Message Date
Joseph Huber
d18a784d41
[compiler-rt] Define GPU specific handling of profiling functions (#185763)
Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to start building the standard `libclang_rt.profile.a` for GPU
targets.
This PR expands this by adding an optimized GPU routine for counter
increment and removing the special-case handling of these functions in
the OpenMP runtime.

The vast majority of these functions are boilerplate, but we should be able
to do more interesting things with this in the future, like value or
memory profiling.
2026-03-19 10:51:48 -05:00
Joseph Huber
923cc2d43b
[AMDGPU] Fix alias handling in module splitting functionality (#187295)
Summary:
The module splitting used for `-flto-partitions=8` support (which is
passed by default) did not correctly handle aliases. We mainly need to
do two things: keep the aliases in the module they are used in and externalize
them. Internalize linkage needs to be handled conservatively.

This is needed because these aliases show up in PGO contexts.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
2026-03-19 10:51:39 -05:00
Arseniy Obolenskiy
d8a83a1123
[NFC][SPIR-V] Disable tests failed after spirv-val update (#187028)
Issues:
- https://github.com/llvm/llvm-project/issues/186344
- https://github.com/llvm/llvm-project/issues/186756
2026-03-19 16:38:41 +01:00
Simon Pilgrim
d049eef4b5
[DAG] Use value tracking to detect or_disjoint patterns and add a add_like pattern matcher (#187478)
Extend the generic or_disjoint pattern to call haveNoCommonBitsSet. This
allows us to remove the similar x86 or_is_add pattern, use or_disjoint
directly, and merge some add/or_is_add matching patterns to use an
add_like wrapper pattern instead.
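
The identity behind treating a disjoint `or` as an `add` can be checked on concrete values. The sketch below is only an illustration: the helper names are hypothetical, and the real haveNoCommonBitsSet reasons conservatively about known bits rather than concrete integers.

```python
def have_no_common_bits_set(a: int, b: int) -> bool:
    """Concrete-value analogue: no bit position is set in both operands."""
    return (a & b) == 0


def or_equals_add(a: int, b: int) -> bool:
    """True when OR and ADD produce the same result for these operands."""
    return (a | b) == (a + b)


# Since a + b == (a | b) + (a & b), the two operations agree exactly
# when the operands share no set bits.
for a in range(64):
    for b in range(64):
        assert have_no_common_bits_set(a, b) == or_equals_add(a, b)
```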
2026-03-19 15:37:43 +00:00
Jay Foad
4199bb1a81
[AMDGPU] Simplify loop in AMDGPULowerVGPREncoding::handleCoissue. NFC. (#187511) 2026-03-19 15:36:37 +00:00
ioana ghiban
c5c0b8348e
[mlir][memref] Rewrite scalar memref.copy through reinterpret_cast into load/store (#186118)
This change adds a rewrite that simplifies `memref.copy` operations whose
destination is a scalar view produced by `memref.reinterpret_cast`.

The pattern matches cases where a reinterpret cast creates a scalar view
(`sizes = [1, ..., 1]`) into a memref that has a single non-unit dimension. In
this situation the view refers to exactly one element in the base buffer, so
the accessed address depends only on the base pointer and the offset.

The stride information of the view does not affect the accessed element,
because the only valid index into the view is `[0, ..., 0]`.

Therefore the copy can be rewritten into a direct load from the source and a
store into the base memref using the offset from the reinterpret cast.

This makes the `memref.reinterpret_cast` redundant for the copy and simplifies
the IR.

Assisted-by: ChatGPT (refine implementation + tests). I reviewed all code and
tests before submission.

### Example
Before:
```mlir
func.func private @concat() {
  %src = memref.alloc() : memref<1x1xf32>
  %base = memref.alloc() : memref<1x108xf32>

  %view = memref.reinterpret_cast %base
    to offset: [0], sizes: [1, 1], strides: [108, 1]
    : memref<1x108xf32>
      to memref<1x1xf32, strided<[108, 1]>>

  memref.copy %src, %view
    : memref<1x1xf32>
      to memref<1x1xf32, strided<[108, 1]>>
}
```
After:
```mlir
func.func private @concat() {
  %src = memref.alloc() : memref<1x1xf32>
  %base = memref.alloc() : memref<1x108xf32>

  %c0 = arith.constant 0 : index
  %v = memref.load %src[%c0, %c0] : memref<1x1xf32>
  memref.store %v, %base[%c0, %c0] : memref<1x108xf32>
}
```

### Motivation
This rewrite simplifies IR and helps eliminate `memref.reinterpret_cast`
operations in preparation for later lowerings (e.g. EmitC lowering), where
pointer-based access patterns are easier to handle once scalar accesses are
explicit.

### Scope
This rewrite is intentionally narrow:
- It only applies when both source and destination reduce to scalar accesses.
- It does not attempt to rewrite general `memref.copy` operations.
- It does not introduce loops or handle multi-element copies.

The pass currently performs only this transformation, so it is expected to be
used intentionally rather than as part of a broad optimization pipeline.

### Why not use `memref.copy` directly?
`memref.copy` requires source and destination memrefs to have the same shape.
The destination of the copy here is a scalar view derived from a larger memref,
so copying directly into the base memref would violate this requirement.

Instead, the rewrite loads the scalar value from the source and stores it into
the base memref, at the index determined by the reinterpret cast offset.
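
The reasoning about strides can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, and `base_indices` assumes the base memref has an identity layout with exactly one non-unit dimension, as in the example above.

```python
def linear_address(offset, strides, indices):
    """Element position of a strided view access, relative to the base pointer."""
    return offset + sum(i * s for i, s in zip(indices, strides))


def base_indices(shape, offset):
    """Map the view offset to indices into the base memref.

    Assumes an identity-layout base with exactly one non-unit dimension.
    """
    non_unit = [d for d, n in enumerate(shape) if n != 1]
    assert len(non_unit) == 1, "sketch only covers a single non-unit dimension"
    indices = [0] * len(shape)
    indices[non_unit[0]] = offset
    return indices


# The only valid index into a scalar view is all zeros, so the strides
# never contribute to the accessed position.
assert linear_address(0, [108, 1], [0, 0]) == linear_address(0, [1, 1], [0, 0])
assert base_indices([1, 108], 0) == [0, 0]
```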
2026-03-19 15:34:59 +00:00
ambergorzynski
c63ce62f7c
[NFC][AMDGPU] New test for untested case in SILowerI1Copies (#186127)
[This
line](https://github.com/ambergorzynski/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp#L646)
is untested by the existing LLVM test suite (checked using code coverage
and by inserting an `abort`).

We propose a new test that exercises this case. The test is demonstrated
by adding an abort to show that it is the only test that fails (the
abort is removed before merging).
2026-03-19 16:33:15 +01:00
ioana ghiban
2754e35f73
[mlir][EmitC] Support pointer-based memrefs in load/store lowering (#186828)
## Problem  
  
In the MemRef → EmitC conversion, `memref.load` and `memref.store`
assume that the converted memref operand is an `emitc.array`, as defined
by the type conversion in `populateMemRefToEmitCTypeConversion`.
  
However, `memref.alloc` is lowered to a `malloc` call returning
`emitc.ptr`. When such values are used by `memref.load` or
`memref.store`, the conversion framework inserts a bridging
`builtin.unrealized_conversion_cast` from `emitc.ptr` to `emitc.array`.
  
These casts have no EmitC representation and therefore remain in the IR
after conversion, preventing valid C/C++ emission.

## Solution  
  
Extend the `memref.load` and `memref.store` conversions to handle
pointer-backed buffers.
  
If the memref operand is defined by an `UnrealizedConversionCastOp`
whose input is an `emitc.ptr`, the cast is stripped and the underlying
pointer operand is used directly. Since pointer subscripting in EmitC is
one-dimensional, the multi-dimensional memref indices are converted to a
row-major linear index (matching the default memref layout) using the
original `MemRefType` shape before emitting `emitc.subscript`.
  
The existing array-based lowering path remains unchanged.  
  
This patch intentionally does ***not*** modify the MemRef → EmitC type
conversion rule (`memref → emitc.array`). Instead, the mismatch
introduced by `memref.alloc` returning a pointer is handled locally in
the `LoadOp` and `StoreOp` conversions.

## Example 1: Single-dimensional store  
  
### Input  
```mlir
func.func @alloc_store(%arg0: i32, %i: index) {
  %alloc = memref.alloc() : memref<999xi32>
  memref.store %arg0, %alloc[%i] : memref<999xi32>
  return
}
```
### Current lowering  
```mlir
// AllocOp conversion unchanged -> excluded for brevity
%5 = builtin.unrealized_conversion_cast %4 : !emitc.ptr<i32> to !emitc.array<999xi32>
%6 = subscript %5[%arg1]
assign %arg0 to %6 : <i32>
```
  
The `unrealized_conversion_cast` remains in the IR.  
### Lowering after this patch  
```mlir
%5 = subscript %4[%arg1] : (!emitc.ptr<i32>, !emitc.size_t) -> !emitc.lvalue<i32>
assign %arg0 : i32 to %5 : <i32>
```

The cast is eliminated and pointer subscripting is used directly.  
  
## Example 2: Multi-dimensional store  
  
### Input  
```mlir
func.func @memref_alloc_store(%v : f32, %i : index, %j : index) {
  %alloc = memref.alloc() : memref<4x8xf32>
  memref.store %v, %alloc[%i, %j] : memref<4x8xf32>
  return
}
```
### Current lowering  
```mlir
// AllocOp conversion unchanged -> excluded for brevity
%5 = builtin.unrealized_conversion_cast %4 : !emitc.ptr<f32> to !emitc.array<4x8xf32>
%6 = subscript %5[%arg1, %arg2] : (!emitc.array<4x8xf32>, !emitc.size_t, !emitc.size_t) -> !emitc.lvalue<f32>
assign %arg0 : f32 to %6 : <f32>
```
### Lowering after this patch  
```mlir
%5 = "emitc.constant"() <{value = 8 : index}> : () -> !emitc.size_t
%6 = mul %arg1, %5 : (!emitc.size_t, !emitc.size_t) -> !emitc.size_t
%7 = add %6, %arg2 : (!emitc.size_t, !emitc.size_t) -> !emitc.size_t
%8 = subscript %4[%7] : (!emitc.ptr<f32>, !emitc.size_t) -> !emitc.lvalue<f32>
assign %arg0 : f32 to %8 : <f32>
```

The multi-dimensional indices are converted into a linear row-major
index before pointer subscripting.
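
As a rough illustration of the index computation (not the actual conversion code), row-major linearization can be sketched in Python. The function name is hypothetical.

```python
def linearize_row_major(shape, indices):
    """Fold multi-dimensional indices into one row-major linear index."""
    linear = 0
    for dim, idx in zip(shape, indices):
        linear = linear * dim + idx
    return linear


# For memref<4x8xf32> this reduces to i * 8 + j, matching the mul/add
# sequence in the lowering above.
i, j = 2, 3
assert linearize_row_major([4, 8], [i, j]) == i * 8 + j
```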


Assisted-by: ChatGPT (refine implementation + tests). I reviewed all
code and tests before submission.
2026-03-19 15:31:00 +00:00
Igor Wodiany
201d3547cc
[AMDGPU] Clean up LowerFP_TO_INT_SAT in AMDGPUTargetLowering (#187486)
This addresses the rest of post-commit comments from #174726.
2026-03-19 15:27:24 +00:00
Louis Dionne
e1aef9e227
[libc++] Fix missing availability check for visionOS in apple_availability.h (#187015)
Without this, we were assuming that __ulock was unavailable on visionOS
and falling back to the manual implementation, when in reality we can
always rely on the existence of ulock.

Fixes #186467
2026-03-19 11:14:12 -04:00
Nikita Popov
70bb9e2452
[CycleInfo] Index using block numbers instead of pointers (#187500)
Replace the DenseMap from block pointer to cycle with a vector indexed
by block number, which makes the lookup more efficient.
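
A minimal sketch of the idea, using hypothetical Python stand-ins rather than the real CycleInfo classes: when every block carries a dense number, the block-to-cycle mapping can be a flat list indexed by that number instead of a hash map keyed by the block object.

```python
class Block:
    """Hypothetical stand-in for a basic block with a dense number."""

    def __init__(self, number, name):
        self.number = number
        self.name = name


blocks = [Block(0, "entry"), Block(1, "header"), Block(2, "latch"), Block(3, "exit")]

# Flat table indexed by block number; None means "not in any cycle".
cycle_by_number = [None] * len(blocks)
for block in (blocks[1], blocks[2]):
    cycle_by_number[block.number] = "loop"


def cycle_for(block):
    """Lookup is a plain list index: no hashing of the block object."""
    return cycle_by_number[block.number]
```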
2026-03-19 16:12:21 +01:00
Ryotaro Kasuga
5ae5f9df42
[DA] Check nsw flags for addrecs in the Exact SIV test (#186387)
This patch adds a check to ensure that the addrecs have nsw flags at the
beginning of the Exact SIV test. If either of them lacks the flag, the
analysis bails out. This check is necessary because the subsequent
processing in the Exact SIV test assumes that they don't wrap.
2026-03-19 15:07:08 +00:00
Leonard Grey
bc2a8ef6f5
[lldb][NativePDB] Remove cantFail uses (1 out of ?) (#187158)
This is a follow-up to
https://github.com/swiftlang/llvm-project/pull/12317#discussion_r2850297229

Per that discussion, given that deserializers *can* fail given a corrupt
PDB, it's preferable to handle the error instead of crashing.

This specific change is limited to "easy" changes (read: I have high
confidence in their correctness). The ideal end state is funneling all
errors to a few central places in `SymbolFileNativePDB`.
2026-03-19 11:03:58 -04:00
Jianhui Li
989ea0e2d7
[MLIR][XeGPU] Lowering 2-Dimensional Reductions of N-D Tensors into Chained 1-D Reductions (#186034)
This PR relaxes the 2d reduction lowering in the peephole optimization
pass to allow the source tensor to have an n-d shape.
It also fixes a minor bug in accumulator lowering in the current
implementation.
2026-03-19 07:53:46 -07:00
Nikita Popov
8ca7a336fb [SCEV] Generate test checks (NFC) 2026-03-19 15:47:21 +01:00
Vladislav Dzhidzhoev
cf92512e09
[DebugInfo] Add Verifier check for local imports in CU's imports field (#187118)
Since https://reviews.llvm.org/D144004, DwarfDebug asserts if
function-local imported entities are present in the imports field of
DICompileUnit.
This patch adds a Verifier check to detect such invalid IR earlier.

Incorrect occurrences of imported entities in DICompileUnit's imports
field in llvm/test/Bitcode/DIImportedEntity_elements.ll,
llvm/test/Bitcode/DIModule-fortran-external-module.ll are fixed.

This change is extracted from https://reviews.llvm.org/D144008.
2026-03-19 15:44:03 +01:00
Nikita Popov
807377492e [MemorySSA] Fix EXPENSIVE_CHECKS build 2026-03-19 15:40:04 +01:00
Florian Hahn
cdaf29f84d
Revert "[LV] Simplify and unify resume value handling for epilogue vec." (#187504)
Reverts llvm/llvm-project#185969

This is suspected to cause a miscompile in 549.fotonik3d_r from SPEC 2017 FP
2026-03-19 14:38:37 +00:00
Aviral Goel
b55f6dbb35
[clang][ssaf] Improve layout of clang-ssaf-format --list by adding a separator between name and description 2026-03-19 07:36:29 -07:00
Haohai Wen
153c230446
[PDB] Fix and simplify module index lookup (#179869) 2026-03-19 22:35:08 +08:00
Balázs Benics
ef4f87425c
[analyzer] Fix [[clang::suppress]] for friend function templates with namespace-scope forward-declarations (#187043)
When a friend function template is defined inline inside a
[[clang::suppress]]-annotated class but was forward-declared at
namespace scope, the instantiation's lexical DeclContext was the
namespace (from the forward-declaration), not the class.
The lexical parent chain walk in BugSuppression::isSuppressed therefore
never reached the class and suppression did not apply.

Fix by extending preferTemplateDefinitionForTemplateSpecializations to
handle FunctionDecl instances: calling getTemplateInstantiationPattern()
maps the instantiation back to the primary template FunctionDecl,
whose lexical DC is the class where the friend was defined inline.

The existing parent-chain walk then finds the suppression attribute.

Assisted-By: claude
2026-03-19 14:32:25 +00:00
Razvan Lupusoru
da92bc06ff
[mlir][acc] Support call target handling for bind(name) (#187390)
The OpenACC `routine` directive may specify a `bind(name)` clause to
associate the routine with a different symbol for device code. This pass
`ACCBindRoutine` finds calls inside offload regions that target such
routines and rewrites the callee to the bound symbol.

---------

Co-authored-by: Delaram Talaashrafi <dtalaashrafi@nvidia.com>
2026-03-19 07:30:28 -07:00
Joseph Huber
44e306ecdb
[Clang] Correctly link and handle PGO options on the GPU (#185761)
Summary:
Currently, the GPU targets ignore the standard profiling arguments. This
PR changes the behavior to use the standard handling, which links in the
now-present `libclang_rt.profile.a` if the user built with the
compiler-rt support enabled. If it is not present this is a linker error,
and we can always suppress with `-Xarch_host` and `-Xarch_device`.
Hopefully this doesn't cause some people pain if they're used to doing
`-fprofile-generate` on a CPU unguarded, since it was a strange mix of a
no-op and not a no-op on the GPU until now.
2026-03-19 09:18:10 -05:00
Graham Hunter
b227fab5a6
[NFC][LV] Introduce enums for uncountable exit detail and style (#184808)
Recursively splitting out some work from #183318; this covers
the enums for early exit loop type (none, readonly, readwrite)
and the style used (just readonly and
masked-handle-ee-in-scalar-tail for now) and refactoring for
basic use of those enums.
2026-03-19 14:17:25 +00:00
Pengxiang Huang
bed9fa2de5
[libc][sys/sem] Add sys v sem headers and syscall wrapper implementation (#185914)
Fix #182161
Based on the last PR #182700 implementing sys/ipc.
2026-03-19 10:12:06 -04:00
estewart08
0e7262407c
[offload] - Remove standalone build in favor of 'runtimes' (#170693)
Summary:
Follow up on removal of OPENMP_STANDALONE_BUILD in openmp (#149878).
This build method is redundant and can be accomplished via runtimes.

Removes support for:
`cmake -S <llvm-project>/offload ...`

Switches over to:
`cmake -S <llvm-project>/runtimes -DLLVM_ENABLE_RUNTIMES=openmp;offload
...`

Libomptarget has a dependency on libomp.so and requires the omp cmake
target to exist at build time, which is why both runtimes are listed.

Updates cmake compiler logic in offload/CMakeLists.txt to mirror openmp
changes:
 [openmp] Allow testing OpenMP without a full clang build tree (#182470)

Users will still need a separate invocation to build the openmp
DeviceRTL via:
`-DLLVM_ENABLE_RUNTIMES=openmp`
`-DLLVM_DEFAULT_TARGET_TRIPLE=<amdgcn-amd-amdhsa|nvptx64-nvidia-cuda>`
2026-03-19 09:00:40 -05:00
John Brawn
e8556ff6b6
[NFC] Remove fractional part of costs in maxbandwidth-regpressure.ll (#187498)
This test is failing on the llvm-clang-x-aarch64 buildbot due to what
looks like a difference in rounding behaviour when printing estimated
cost per lane. Solve this by removing the fractional part, which is what
we've done in the past when this has happened (e.g. commit aeb88f677).
2026-03-19 13:50:56 +00:00
Jay Foad
b91c5a7701
[AMDGPU] Test saturated f32 to i8 conversion on vectors (#187487) 2026-03-19 13:48:58 +00:00
mcbarton
068176a503
[Analysis] Remove LLVM_ABI annotations from llvm/lib/Analysis/BranchProbabilityInfo.cpp which cause build errors (#187388)
In llvm/lib/Analysis/BranchProbabilityInfo.cpp several LLVM_ABI
annotations were added which cause build errors when trying to build
LLVM and Clang as a shared library on Windows (see
https://github.com/compiler-research/ci-workflows/actions/runs/22754706570/job/67436382142#step:6:1141
for some of the errors). With the changes in this PR these build errors
are fixed.

After this patch this is how far you get with the build
https://github.com/compiler-research/ci-workflows/actions/runs/23257495426/job/67635570161#step:6:4601.
These errors were introduced sometime in the last month, but I couldn't
work out how to fix them.
2026-03-19 14:44:10 +01:00
Michael Klemm
e3415da3cd
[Flang][OpenMP] Permit THREADPRIVATE variables in EQUIVALENCE statements (#186696)
The OpenMP API does not allow a THREADPRIVATE variable to appear in
an EQUIVALENCE statement. The community has requested that Flang be
extended to permit these non-conforming patterns. This PR changes Flang
so that the equivalenced variables inherit the DSA of the base object of
the EQUIVALENCE statement. The original error message is
turned into a warning.

This PR contains code from downstream PR
https://github.com/arm/arm-toolchain/pull/755 that @tblah pointed to
during the review.

Fixes https://github.com/llvm/llvm-project/issues/180493

Assisted-by: Claude Code, Opus 4.6
2026-03-19 14:37:41 +01:00
Vimarsh Sathia
a32d2695c3
[bazel] Gate GPU parsers behind llvm_targets (#187213)
Ideally fixes #63135

---------

Signed-off-by: Vimarsh Sathia <vsathia2@illinois.edu>
2026-03-19 06:33:05 -07:00
Alexis Engelke
a3e3fed088
[CodeGen] Declare MachineCycleInfo in headers (#187494)
Transform MachineCycleInfo into a class that can be declared and remove
the include from many source files.

Similar to 810ba55de9159932d498e9387d031f362b93fbea.
2026-03-19 13:32:59 +00:00
Jay Foad
2e2bcf7855 [AMDGPU] Remove unused forward declaration 2026-03-19 13:15:18 +00:00
Pengcheng Wang
dddf01cc14
[RISCV] Relax out of range Zibi conditional branches (#186965)
If `.Label` is not within +-4KiB range, we convert

```
beqi/bnei reg, imm, .Label
```

to

```
bnei/beqi reg, imm, 8
j .Label
```

This is similar to what is done for the RISCV conditional branches
and `Xqcibi` conditional branches.
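
The relaxation rule can be sketched as a small Python function. This is an illustration, not the assembler code: the function name is hypothetical, and the exact bounds assume a standard 13-bit signed, 2-byte-aligned branch offset, which is how the "+-4KiB" above is read here.

```python
# Each immediate branch is replaced by its inverse when relaxed.
INVERTED = {"beqi": "bnei", "bnei": "beqi"}


def relax_branch(op, reg, imm, offset):
    """Return the instruction sequence for a branch to .Label at `offset` bytes."""
    # Assumed range of a 13-bit signed, even branch offset.
    if -4096 <= offset <= 4094:
        return [f"{op} {reg}, {imm}, .Label"]
    # Out of range: the inverted branch skips over the 4-byte jump
    # (branch + jump = 8 bytes), and the jump reaches the far target.
    return [f"{INVERTED[op]} {reg}, {imm}, 8", "j .Label"]
```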

---------

Co-authored-by: Sudharsan Veeravalli <svs@qti.qualcomm.com>
2026-03-19 21:07:53 +08:00
jeanPerier
76f7252571
[FastISel] generate FAKE_USE for llvm.fake.use (#187116)
FastISel was dropping llvm.fake.use because they are not meant to be
generated at O0 with clang.

This patch adds support in FastISel to generate FAKE_USE for llvm.fake.use.
The handling is simpler than in SelectionDagBuilder because no attempt is made to
get rid of useless FAKE_USE (e.g. for constant SSA values) to keep FastISel simple.

The motivation is that flang will generate llvm.fake.use for function arguments under
`-g` (and O0) because Fortran arguments are not copied to the stack (they are
reference-like arguments in most cases) and one should be able to access these
variables from the debugger at any point in the function, even after their last use in the
function.
2026-03-19 14:06:26 +01:00
Lucas Colley
d641186cb6
[clang-cl] test that -Xlinker works, update supported options docs (#187395)
closes #119179
2026-03-19 14:03:20 +01:00
Simon Pilgrim
18ed1a9414
[X86] Add bitreverse/bswap i128/i256/i512 test coverage for #187353 (#187492) 2026-03-19 12:53:51 +00:00
Florian Hahn
78a8f00977
Revert "[VPlan] Create header phis once regions have been created (NFC)."
This reverts commit 91b928f919364b29e241821fc639b9ef56dab1a5.

This complicates some analyses that need to happen on the scalar VPlan,
before regions have been created, e.g.
https://github.com/llvm/llvm-project/pull/185323/.
2026-03-19 12:53:12 +00:00
Jaydeep Chauhan
289c588231
[X86] Optimize load-trunc-store for v4i16/v2i32/v2i16 vectors (#186676)
This patch changes the code generated for the following IR:
```
define void @cast_i16x4_to_u8x4(ptr %a0, ptr %a1) {
  %1 = load <4 x i16>, ptr %a1
  %2 = trunc <4 x i16> %1 to <4 x i8>
  store <4 x i8> %2, ptr %a0
  ret void
}
```
From this assembly:
```
cast_i16x4_to_u8x4:                     # @cast_i16x4_to_u8x4
        vmovq   (%rsi), %xmm0                   # xmm0 = mem[0],zero
        vpmovwb %xmm0, %xmm0
        vmovd   %xmm0, (%rdi)
        retq
```
to:
```
cast_i16x4_to_u8x4:                     # @cast_i16x4_to_u8x4
        vpmovzxwd {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero
        vpmovdb %xmm0, (%rdi)
        retq
```		
Also implemented similar patterns for the vector types below.

**Patterns supported:**
- **v4i16 -> v4i8 (via v4i32)**
- **v2i32 -> v2i8 (via v2i64)**
- **v2i32 -> v2i16 (via v2i64)**
- **v2i16 -> v2i8 (via v2i64)**

Fixes #83403
2026-03-19 12:47:37 +00:00
Abu
1078a1dabd
Lowering ~x | (x - 1) to ~blsi(x) (#186722)
Alive2 proof: 
https://alive2.llvm.org/ce/z/bK93Cn

I've implemented a fold in `InstCombineAndOrXor.cpp` to canonicalize `~x
| (x - 1)` to `~(x & -x)` which enables the CodeGen to emit the `blsi`
instruction.
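
In addition to the Alive2 proof, the identity is easy to check exhaustively at a small bit width. The Python sketch below is only an illustration: the 8-bit width and the helper names are choices made for the example, not part of the patch.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def original(x: int) -> int:
    """~x | (x - 1), truncated to WIDTH bits."""
    return (~x | (x - 1)) & MASK


def canonical(x: int) -> int:
    """~(x & -x), i.e. NOT of the isolated lowest set bit (blsi)."""
    return ~(x & -x) & MASK


# Both forms clear only the lowest set bit position and set every other bit.
for x in range(1 << WIDTH):
    assert original(x) == canonical(x)
```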

I've also added a test in `CodeGen/X86`.

Fixes #184055

---------

Co-authored-by: Tim Gymnich <tim@gymni.ch>
2026-03-19 20:45:56 +08:00
Nikita Popov
49a5192e5d
[CycleInfo] Don't store top-level cycle per block (#187488)
CycleInfo currently has a second map, that stores the top-level cycle
for a block. I don't think storing this per-block makes a lot of sense,
because the top-level cycle is always the same for all blocks in a
cycle.

So instead store it as a member of the cycle.
2026-03-19 12:34:00 +00:00
jeanPerier
7d02ca610b
[mlir][LLVM] add llvm.fake.use to LLVM dialect (#187026)
Add llvm.fake.use to the LLVM dialect intrinsics.
See https://llvm.org/docs/LangRef.html#llvm-fake-use-intrinsic.
2026-03-19 13:33:03 +01:00
Shivam Gupta
796b218edd
[LegalizeTypes] Expand UDIV/UREM by constant via chunk summation (#146238)
This patch improves the lowering of 128-bit unsigned division and
remainder by constants (UDIV/UREM) by avoiding a fallback to a libcall
(__udivti3/__umodti3) for specific divisors.

When a divisor D satisfies the condition (1 << ChunkWidth) % D == 1, the
128-bit value is split into fixed-width chunks (e.g., 30-bit) and summed
before applying a smaller UDIV/UREM. This transformation is based on the
"remainder by summing digits" trick described in Hacker’s Delight.

This fixes #137514 for some constants.
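
The arithmetic behind the remainder case can be sketched in Python. This is an illustration of the digit-summing identity only, with hypothetical function names. It does not model the UDIV part or the actual legalizer code.

```python
def chunk_sum(x: int, width: int) -> int:
    """Sum the fixed-width chunks ("digits" in base 2**width) of x."""
    mask = (1 << width) - 1
    total = 0
    while x:
        total += x & mask
        x >>= width
    return total


def urem_by_chunk_sum(x: int, d: int, width: int) -> int:
    """Remainder of x by d, valid only when 2**width is congruent to 1 mod d."""
    assert (1 << width) % d == 1, "divisor does not satisfy the chunk condition"
    # Every power of 2**width is 1 mod d, so x and its chunk sum are
    # congruent mod d; the final remainder runs on a much smaller value.
    return chunk_sum(x, width) % d


x = (1 << 127) + 12345678901234567890
assert urem_by_chunk_sum(x, 7, 30) == x % 7
```

With 30-bit chunks a 128-bit value splits into five chunks, so the sum fits comfortably in 64 bits before the smaller remainder is taken.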
2026-03-19 17:58:54 +05:30
Alexey Bataev
582fa78753
[SLP]Do not match buildvector node, if current node is part of its combined nodes
If the current buildvector node is part of the combined nodes of the
matching candidate node, this matching candidate must be considered
non-matching to prevent a wrong def-use chain.

Reviewers: 

Pull Request: https://github.com/llvm/llvm-project/pull/187491
2026-03-19 08:15:32 -04:00
John Brawn
191c84b822
[VPlan] Permit derived IV in isHeaderMask (#187360)
When matching scalar steps of the canonical IV, also match a derived IV
of the canonical IV if the derivation is essentially a no-op. Fixes a
failure in the mve-reg-pressure-spills.ll test when expensive checks are
enabled.
2026-03-19 12:05:07 +00:00
Koakuma
6aeeae676a
[SPARC][Tests] Add lit.local.cfg to SPARC LoopVectorize tests (#187489) 2026-03-19 18:59:15 +07:00
Simon Pilgrim
b029b98797
[X86] Add i128 bit manipulation pattern test coverage (#187480) 2026-03-19 11:53:48 +00:00
Koakuma
23af867e6d
[SPARC] Add TTI implementation for getting register numbers and widths (#180660)
Correctly inform transform passes about our registers; this prevents the
issue with the `find-last` test where the loop vectorizer pass
mistakenly thinks that the backend has vector capabilities and generates
vector types, which causes the backend to crash.

See also: https://github.com/sparclinux/issues/issues/69
2026-03-19 18:37:46 +07:00
Nico Weber
c3e7624ac4
[clang] Add implicit std::align_val_t to std namespace DeclContext for module merging (#187347)
When a virtual destructor is encountered before any module providing
std::align_val_t is loaded, DeclareGlobalNewDelete() implicitly creates
a std::align_val_t EnumDecl. However, this EnumDecl was not added to the
std namespace's DeclContext -- it was only stored in the
Sema::StdAlignValT field.

Later, when a module containing an explicit std::align_val_t definition
is loaded, ASTReaderDecl::findExisting() attempts to find the implicit
decl via DeclContext::noload_lookup() on the std namespace. Since the
implicit EnumDecl was never added to that DeclContext, the lookup fails,
and the two align_val_t declarations are not merged into a single
redeclaration chain. This results in two distinct types both named
std::align_val_t.

The implicitly declared operator delete overloads (also created by
DeclareGlobalNewDelete) use the implicit align_val_t type for their
aligned-deallocation parameter. When module code (e.g. std::allocator::
deallocate) calls __builtin_operator_delete with the module's
align_val_t, overload resolution fails because the two align_val_t types
are not the same, producing:

  error: no matching function for call to 'operator delete'
note: no known conversion from 'std::align_val_t' to 'std::align_val_t'

The fix adds the implicit align_val_t EnumDecl to the std namespace
DeclContext via getOrCreateStdNamespace()->addDecl(AlignValT), so the
module merger can find it via noload_lookup and merge the two
declarations.

This bug was exposed by a libc++ change (2b01e7cf2b70) that removed the
#include <__new/global_new_delete.h> line from allocate.h, which meant
modules no longer had explicit operator delete declarations to paper
over the type mismatch.

Assisted-by: Claude Code
2026-03-19 07:34:12 -04:00
Juan Manuel Martinez Caamaño
f104b7355c
[NFC][SPIRV] Run spirv-val on tests related to SPV_ALTERA_arbitrary_precision_integers (#187464)
https://github.com/KhronosGroup/SPIRV-Tools/pull/6232 landed support for
this extension in `spirv-val`.

This PR updates some relevant tests to run `spirv-val` on their output.
2026-03-19 12:25:45 +01:00