548363 Commits

Author SHA1 Message Date
Michael Buch
c68b4d64dd [lldb][ClangASTImporter][NFC] Create helper for CanImport
Upstreams a `CanImport` helper for `clang::Decl`s.
2025-08-13 10:22:23 +01:00
Michael Buch
89681839e3 [lldb][ClangASTImporter][NFC] Factor completion logic out of ClangASTImporterDelegate
Upstreams two helpers that make this more readable.
2025-08-13 10:22:23 +01:00
Ahmed Bougacha
8c8f3286a7
[compiler-rt] Don't run arm64e builtins tests on darwin. (#153312)
The compiler-rt build gradually learned to target arm64e. With that, we
build builtins for arm64e, but running their tests usually isn't
possible, because most versions of macOS so far restrict arm64e (on
account of its unstable ABI).

Starting with macOS 26, arm64e executables can be run, because the
aligned linker automatically targets ptrauth ABI version 1. Without
that, (at ABI version 0) these can't be executed.

We can't rely on or require new linkers (and elsewhere we explicitly
fall back to ld classic anyway), so in the meantime one way to execute
these would be to explicitly ask for ABI version 1, which we generally
try to avoid, and don't support in our llvm (which unconditionally
targets ABI version 0).

This is also an uncommon situation; sanitizer runtime tests aren't run
on arm64e today, because we haven't listed arm64e as a supported arch
yet.
Everything other than builtins also tests for execution in cmake first;
we should consider that, but it has its own problems.

So we can simply disable arm64e from tests, by filtering it out as a
valid darwin host arch, which accurately reflects reality.

When we try to add arm64e sanitizer runtime build and test support,
we'll want to change that, but that's a bigger problem than builtins.
2025-08-13 10:21:34 +01:00
Adam Siemieniuk
7d1b9cad87
[mlir][amx] Vector to AMX conversion pass (#151121)
Adds a pass for Vector to AMX operation conversion.

Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in shapes which are
AMX-compatible for the rewriting to occur.
2025-08-13 11:08:52 +02:00
Nikita Popov
240c454c4d
[CodeGen] Remove default ctors for InputArg and OutputArg (#153205)
These make it easy to forget to initialize some members, like the newly
added OrigTy. Force these to always go through the ctor instead.
2025-08-13 10:51:43 +02:00
David Spickett
b563b274b8
[lldb] Convert registers values into target endian for expressions (#148836)
Relates to https://github.com/llvm/llvm-project/issues/135707, where it was
reported that reading the PC using "register read" gave different results
from the expression "$pc".

This was happening because registers are treated in lldb as pure
"values" that don't really have an endian. We have to store them
somewhere on the host of course, so the endian becomes host endian.

When you want to use a register as a value in an expression you're
pretending that it's a variable in memory. In target memory. Therefore
we must convert the register value to that endian before use.
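
The conversion described above can be illustrated with a minimal sketch. This is not lldb's implementation; the function name `register_bytes_for_target` and the example PC value are hypothetical, chosen only to show a host-endian copy of a register being re-encoded in the target's byte order:

```python
def register_bytes_for_target(host_bytes: bytes, host_order: str, target_order: str) -> bytes:
    """Re-encode a register value, stored in host byte order, in target byte order."""
    # Recover the pure numeric value from the host-side storage.
    value = int.from_bytes(host_bytes, host_order)
    # Lay the same value out as it would appear in target memory.
    return value.to_bytes(len(host_bytes), target_order)


# Hypothetical 64-bit PC value held by a little endian host.
pc = 0x0000000000401000
host_copy = pc.to_bytes(8, "little")

# The same register as seen by an expression on a big endian target.
target_copy = register_bytes_for_target(host_copy, "little", "big")
```

When host and target byte orders match, the function returns the bytes unchanged, which is why the bug would only show up in cross endian debugging.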

The test I have added is based on the one used for XML register flags,
where I fake an AArch64 little endian and an s390x big endian target. I
set up the data in such a way that the pc value should print the same for
both, either with register read or an expression.

I considered just adding a live process test that checks the two are the
same, but with no one doing cross endian testing, I doubt it would have
ever caught this bug.

Simulating this means most of the time, little endian hosts will test
little to little and little to big. In the minority of cases with a big
endian host, they'll check the reverse. Covering all the combinations.
2025-08-13 09:48:29 +01:00
David Spickett
dc41571cd8
[llvm][docs] Update CMake commands for cross compiling Arm builtins (#151544)
This does a few things:
* LLVM_CONFIG_PATH is deprecated, use LLVM_CMAKE_DIR instead.
* Don't use $ before command examples. I would normally, but the key
cmake commands didn't use it so I removed it from all commands.
* Makes the commands shown full commands, so you don't have to piece
them together.
* Uses shell variables to cut down on repetition and make this easier to
port to other targets.
* Adds a few options to disable more compiler-rt things.
* Use the built in cmake options for sysroot and toolchains.
* Include test options in the first cmake command, so you don't have to
re-do the whole thing after you read the testing section.
* Removes the section about using BaremetalARM.cmake.

The closest I got to getting that cache to work was:
```
SYSROOT=/home/david.spickett/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi/arm-none-eabi/libc
LLVM_TOOLCHAIN=/home/david.spickett/LLVM-20.1.8-Linux-X64/

cmake \
  -G Ninja \
  -DCMAKE_C_COMPILER=${LLVM_TOOLCHAIN}/bin/clang \
  -DBAREMETAL_ARMV6M_SYSROOT=${SYSROOT} \
  -DBAREMETAL_ARMV7M_SYSROOT=${SYSROOT} \
  -DBAREMETAL_ARMV7EM_SYSROOT=${SYSROOT} \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
  -C ../llvm-project/clang/cmake/caches/BaremetalARM.cmake \
  -DCOMPILER_RT_BUILD_BUILTINS=ON \
  -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
  -DCOMPILER_RT_BUILD_MEMPROF=OFF \
  -DCOMPILER_RT_BUILD_PROFILE=OFF \
  -DCOMPILER_RT_BUILD_CTX_PROFILE=OFF \
  -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
  -DCOMPILER_RT_BUILD_XRAY=OFF \
  -DCOMPILER_RT_BUILD_ORC=OFF \
  -DCOMPILER_RT_BUILD_CRT=OFF \
  ../llvm-project/runtimes
```
All this does is build the x86 builtins. I tried forcing the issue with:
```
  -DBUILTIN_SUPPORTED_ARCH="armv7m;armv6m;armv7em" \
```
But again, just x86.

It's probably something deep in compiler-rt failing a compiler check for
the Arm targets. Even if that's the case, fixing that means adding more
options to the cmake command.

I can't find evidence of a full command using this cache file since the
commit that introduced it and that command no longer works.

I think if you ever got this to work again the command would be as long
and complex as the ones already shown in the document.

I would also argue that some of the other caches, for example Fuchsia's,
are much better examples of multi-target runtimes builds. If what's in
this document isn't enough, folks should be learning from those files
and about the runtimes build overall before attempting anything complex
(though it does not take much to be "complex").
2025-08-13 09:47:43 +01:00
Diana Picus
420a5de1a4
[AMDGPU] Ignore inactive VGPRs in .vgpr_count (#149052)
When using the `amdgcn.init.whole.wave` intrinsic, we add dummy VGPR
arguments with the purpose of preserving their inactive lanes. The
pattern may look something like this:

```
entry:
  call amdgcn.init.whole.wave
  branch to shader or tail

shader:
  $vInactive = IMPLICIT_DEF ; Tells regalloc it's safe to use the active lanes
  actual code...

tail:
  call amdgcn.cs.chain [...], implicit $vInactive
```

We should not report these VGPRs in the `.vgpr_count` metadata. This
patch achieves that goal by ignoring meta instructions and calls. This should
be safe since if those registers are actually used in any other context,
they will be counted there. The same reasoning applies in the general
case, so we don't explicitly check for the existence of `init.whole.wave`.
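
As a rough illustration of the counting rule, here is a toy model, not the AMDGPU backend code. The `Instr` class, its fields, and the assumption that the count equals the highest used VGPR index plus one are all hypothetical simplifications:

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    opcode: str
    vgprs: list = field(default_factory=list)  # VGPR indices referenced
    is_meta: bool = False                       # e.g. IMPLICIT_DEF
    is_call: bool = False                       # e.g. a chain call


def vgpr_count(instrs):
    """Highest VGPR index used by real instructions, plus one."""
    highest = -1
    for instr in instrs:
        # Meta instructions and calls do not contribute to the count.
        if instr.is_meta or instr.is_call:
            continue
        highest = max([highest, *instr.vgprs])
    return highest + 1


shader = [
    Instr("IMPLICIT_DEF", [8], is_meta=True),   # dummy inactive VGPR
    Instr("v_add", [0, 1]),                     # actual code
    Instr("chain_call", [8], is_call=True),     # implicit use of the dummy VGPR
]
```

In this model the dummy register only affects the count if some ordinary instruction also uses it, which mirrors the safety argument in the paragraph above.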

This is a reworked version of #133242, which was reverted in #144039
and split into smaller bits.
2025-08-13 10:47:00 +02:00
Ryotaro Kasuga
bf6796fa8f
[DA] Extract duplicated logic from exactSIVtest and exactRDIVtest (NFC) (#152712)
This patch refactors `exactSIVtest` and `exactRDIVtest` by consolidating
duplicated logic into a single function. Same as #152688, the main goal
is to improve code maintainability, since extra validation logic (as
written in TODO comments) may be necessary.
2025-08-13 17:45:28 +09:00
Timm Baeder
56131e3959
[clang][bytecode] Diagnose incomplete types more consistently (#153368)
To match the diagnostics of the current interpreter.
2025-08-13 10:40:21 +02:00
Nikolas Klauser
78636be4d6
[libc++] Move more tests into test/extensions (#152975)
This should be the last set of tests moved to `test/extensions` for now.
2025-08-13 10:14:24 +02:00
Nikolas Klauser
3ca414b63a
[libc++] Move some standard tests from test/libcxx (#152982)
This also removes some tests which were redundant, wrong, or never run.
Specifically,

- `libcxx/utilities/meta/stress_tests/*` were never run and are of
questionable usefulness
- `libcxx/utilities/template.bitset/includes.pass.cpp` is completely
redundant and partially incorrect

Also notably,
`libcxx/language.support/support.c.headers/support.c.headers.other/math.lerp.verify.cpp`
has been refactored to only test the standard mandate.
2025-08-13 10:13:46 +02:00
Simon Pilgrim
267f592ca0
[Headers][X86] Allow _mm_cmov_si128/_mm256_cmov_si256 intrinsics to be used in constexpr (#153236) 2025-08-13 08:53:26 +01:00
Benjamin Maxwell
271688b87a
[AArch64][SME] Port all SME routines to RuntimeLibcalls (#152505)
This updates everywhere we emit/check an SME routine to use
RuntimeLibcalls to get the function name and calling convention.

Note: RuntimeLibcallEmitter had some issues with emitting non-unique
variable names for sets of libcalls, so I tweaked the output to avoid
the need for variables.
2025-08-13 08:48:59 +01:00
Mel Chen
b9138bde35
[LV][EVL] More lit tests for interleaved access. nfc (#152959)
Add test cases for reverse interleaved access and interleaved access
with gap.
2025-08-13 15:43:39 +08:00
Jasmine Tang
d32793ca6e
Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it.

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-13 07:41:44 +00:00
Florian Hahn
48bfaa4c06
[VPlan] Replace VPBB for vector.ph during skeleton creation (NFC)
Shift replacement of regular VPBB for vector.ph with the VPIRBB wrapping
the created IR block directly to skeleton creation, to be consistent
with how the scalar preheader is handled.
2025-08-13 08:30:18 +01:00
Aiden Grossman
dfe18b1a0e
[libcxx] Bump clang version to v22 (#153264)
Clang tip of tree is now v22, so bump the versions based on that now
that we have an updated container image.

---------

Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>
2025-08-13 09:26:42 +02:00
Abhishek Kaushik
2415e3b3bf
[NFC][MC][GOFF] Use llvm_unreachable for unreachable case (#152930) 2025-08-13 12:56:12 +05:30
Aiden Grossman
7f4d201db4
[libcxx] Bump container image to 77cb098 (#153095)
Switch to the next runner set to evaluate switching the container image
to 77cb098.
2025-08-13 09:24:02 +02:00
Matt Arsenault
db126d8004
CodeGen: Make MachineFunction's subtarget member a reference (#153352) 2025-08-13 16:22:32 +09:00
yanming
02ab6f358c [flang][fir][NFC] unify flang's code style with the rest. 2025-08-13 15:11:06 +08:00
Sergei Barannikov
8f3254aa4a
[TableGen][DecoderEmitter] Returns insn_t / std::vector<Islands> by value (NFC) (#153354)
The containers passed by reference are always empty on entry to the
functions that fill them. Return them by value instead and let the
compiler do the return value optimization.
2025-08-13 07:09:13 +00:00
Valentin Clement (バレンタイン クレメン)
2ae4e95dda
[flang][cuda] Add bind name for __ddiv_XX interfaces (#153271) 2025-08-12 23:30:43 -07:00
Valentin Clement (バレンタイン クレメン)
60170f92a3
[flang][cuda] Add missing interface for __powf (#153294)
`__powf` is defined in the CUDA Fortran programming guide but it's
missing from our cudadevice module. Add the interface and bind name to
`__nv_powf`

https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html#fortran-device-modules

https://docs.nvidia.com/cuda/libdevice-users-guide/__nv_powf.html#__nv_powf
2025-08-12 23:08:41 -07:00
Ryotaro Kasuga
bce0f9d2bf
[DA] Extract duplicated logic from gcdMIVtest (NFCI) (#152688)
This patch refactors `gcdMIVtest` by consolidating duplicated logic into
a single function. The main goal of this change is to improve code
maintainability rather than readability, especially since we may need to
revise this logic for correctness (as noted in the added TODO comments).

I hope this patch is NFC, but I've also added several new assertions,
which may cause some previously passing cases to fail.
2025-08-13 15:07:50 +09:00
Valentin Clement (バレンタイン クレメン)
09505b11e5
[flang][cuda] Add missing interface for __cosf (#153306)
`__cosf` is mentioned to be supported here:
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html#fortran-device-modules

Add the missing interface with a bind c name linking it to `__nv_cosf`
2025-08-12 22:48:45 -07:00
Fangrui Song
04eb5e0cd4 test: Add REQUIRES: riscv 2025-08-12 22:39:57 -07:00
Fangrui Song
94655dc8ae [ELF] -r: Synthesize R_RISCV_ALIGN at input section start (#151639)
Clear `synthesizedAligns` to prevent stray relocations to an unrelated
text section. Enhance the test to check llvm-readelf -r output.

---

Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:

```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```

When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.

To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.

For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.
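
The alignment problem and the effect of the synthesized relocation can be sketched with a toy offset calculation. This is a simplified model under stated assumptions, not lld's algorithm: the function name `b_start_offset` is hypothetical, and it assumes enough NOP padding exists for the linker to restore alignment whenever an ALIGN relocation is present:

```python
def b_start_offset(a_size_after_relax: int, b_align: int, has_align_reloc: bool) -> int:
    """Offset at which b.o's content begins after a.o's content has been relaxed."""
    if not has_align_reloc:
        # Nothing tells the linker about b.o's alignment, so b.o simply
        # follows the shrunken a.o content.
        return a_size_after_relax
    # With an ALIGN relocation the linker keeps just enough padding
    # to round the offset up to the required alignment.
    return (a_size_after_relax + b_align - 1) // b_align * b_align


# a.o's content shrinks from 8 to 6 bytes during relaxation; b.o needs 4-byte alignment.
unaligned = b_start_offset(6, 4, has_align_reloc=False)
aligned = b_start_offset(6, 4, has_align_reloc=True)
```

Without the relocation, b.o's data lands at an offset that is not a multiple of 4; with it, the offset is rounded back up.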

See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
2025-08-12 22:38:17 -07:00
Valentin Clement (バレンタイン クレメン)
136c5586bd
[flang][cuda] Add bind name for __clz interface (#153268) 2025-08-12 22:28:20 -07:00
Fangrui Song
98164d4706 Revert "[ELF] -r: Synthesize R_RISCV_ALIGN at input section start" (#151639)
This reverts commit 6f53f1c8d2bdd13e30da7d1b85ed6a3ae4c4a856.

synthesizedAligns is not cleared, leading to stray relocations for
unrelated sections. Revert for now.
2025-08-12 22:18:15 -07:00
Fangrui Song
856290d1c1 Revert "Add REQUIRES: riscv to test added in 151639 to skip the test when riscv is not built. (#152858)"
This reverts commit d1827f040f6e056e62cf4158bdf90d0acdf3d287.
2025-08-12 22:18:14 -07:00
Matheus Izvekov
73feab502e
[clang] fix getTrivialTemplateArgumentLoc template template argument (#153344)
This fixes a regression reported here
https://github.com/llvm/llvm-project/pull/147835#issuecomment-3181811371,
where getTrivialTemplateArgumentLoc can't see through template name
sugar when producing a trivial TemplateArgumentLoc for template template
arguments.

Since this regression was never released, there are no release notes.
2025-08-13 02:09:08 -03:00
Valentin Clement (バレンタイン クレメン)
587b6ce6b9
[flang][cuda] Add bind name for __mul24 and __umul24 (#153307) 2025-08-12 22:02:11 -07:00
Jin Huang
91de0a2c43
[libc] Refactor libc code to improve readability. (#153308)
This PR improves the readability of the files under the
`llvm-project/libc/src/wchar` directory.

---------

Co-authored-by: Jin Huang <jingold@google.com>
2025-08-12 21:41:21 -07:00
Thurston Dang
cf002847a4
Revert "[msan] Improve packed multiply-add instrumentation" (#153343)
Reverts llvm/llvm-project#152941

Buildbot breakage:
https://lab.llvm.org/buildbot/#/builders/66/builds/17843
2025-08-12 21:32:07 -07:00
Longsheng Mou
2edee0bc79
[mlir][gpu] Support outlining nested gpu.launch (#152696)
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes #149318.
2025-08-13 11:42:52 +08:00
Alexey Samsonov
04081caa09
[libc] Remove LIBC_ERRNO_MODE_SYSTEM mode. (#153077)
Use LIBC_ERRNO_MODE_SYSTEM_INLINE instead as the default for the "public
packaging" (i.e. release mode) of an overlay build. The Bazel build has
already switched to use it by default in
5ccc734fa0355f971f8f515457a0bece33ab6642. This should be a safe change,
as LIBC_ERRNO_MODE_SYSTEM_INLINE works as a drop-in (but simpler)
LIBC_ERRNO_MODE_SYSTEM replacement. Remove the associated code paths and
config settings.

Fixes issue #143454.
2025-08-12 19:52:40 -07:00
Shoreshen
db96363c0a
[AMDGPU] Avoid put implicit_def into bundle that break reg's liveness (#142563)
Cause:
1. An `implicit_def` inside a bundle does not count as a definition of the
register in the machine verifier.
2. Including `implicit_def` in the bundle therefore leaves the related
register undefined, resulting in `Bad machine code: Using an undefined
physical register` in the machine verifier.

Fixes https://github.com/llvm/llvm-project/issues/139102

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2025-08-13 10:41:44 +08:00
Matt Arsenault
d40d04f9d6
AArch64: Remove int128 compiler-rt calls from arm64ec renames (#153124)
It might have been a bug that these were previously not included,
but they don't appear to have ever been used:
https://godbolt.org/z/zE6zs8xxa

If these really exist, they probably should be included. Removes 4
unused entries from the set of libcall impls.
2025-08-13 11:41:32 +09:00
Luke Lau
9217b6ab2e
[VPlan] Enforce that there is only ever one header mask. NFC (#152489)
We almost only ever have one header mask, except with the data tail
folding style, i.e. with VPInstruction::ActiveLaneMask.

All we need to do is make sure to erase the old icmp-based header mask
when replacing it.
2025-08-13 02:39:04 +00:00
Maksim Levental
2b842e5600
[mlir][python] fix PyThreadState_GetFrame again (#153333)
add more APIs missing from 3.8 (fix rocm builder)
2025-08-12 21:29:23 -05:00
Thurston Dang
ba603b5e4d
[msan] Improve packed multiply-add instrumentation (#152941)
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value.

This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.
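
The shadow propagation described here can be sketched at element granularity. This is a simplified model, not MSan's instrumentation: the function name `madd_shadow` is hypothetical, a nonzero shadow mask means the element has uninitialized bits, and the result is reduced to one poisoned/clean flag per output element:

```python
def madd_shadow(a, shadow_a, b, shadow_b):
    """Per-output poison flags for a packed multiply-add of adjacent pairs."""
    assert len(a) == len(b) == len(shadow_a) == len(shadow_b)
    assert len(a) % 2 == 0

    product_poisoned = []
    for x, sx, y, sy in zip(a, shadow_a, b, shadow_b):
        # A fully initialized zero forces the product to an initialized zero.
        x_is_init_zero = sx == 0 and x == 0
        y_is_init_zero = sy == 0 and y == 0
        if x_is_init_zero or y_is_init_zero:
            product_poisoned.append(False)
        else:
            product_poisoned.append(sx != 0 or sy != 0)

    # The horizontal add of each adjacent pair is modeled with OR.
    return [
        product_poisoned[i] | product_poisoned[i + 1]
        for i in range(0, len(product_poisoned), 2)
    ]


# b[0] is uninitialized, but a[0] is an initialized zero, so the output stays clean.
clean = madd_shadow([0, 3], [0, 0], [5, 7], [0xFFFF, 0])
# With a nonzero a[0], the uninitialized b[0] poisons the output.
poisoned = madd_shadow([1, 3], [0, 0], [5, 7], [0xFFFF, 0])
```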

Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512) and AVX VNNI intrinsics.
2025-08-12 19:13:48 -07:00
Connector Switch
f4dd442395
[flang] Optimize tanpi precision (#153215)
Part of #150452.
2025-08-13 10:07:17 +08:00
Connector Switch
12e0d524bc
[flang] Optimize sinpi precision (#153211)
Part of #150452.
2025-08-13 10:06:29 +08:00
Connector Switch
d9074db137
[flang] Optimize cospi precision (#153208)
Part of #150452.
2025-08-13 10:06:09 +08:00
Connector Switch
4537f0ee61
[flang] Optimize atanpi precision (#153207)
Part of #150452.
2025-08-13 10:05:48 +08:00
Connector Switch
c664ce49e3
[flang] Optimize asinpi precision (#153203)
Part of #150452.
2025-08-13 10:05:25 +08:00
Felipe de Azevedo Piovezan
a203546496 Revert "[lldb] Call FixUpPointer in WritePointerToMemory"
This reverts commit 085a53cb89c4021da2e32d1757a1ee44668e8596.

This patch is hitting a corner case tested by
`TestScriptedProcessEmptyMemoryRegion.py`.
2025-08-12 18:51:00 -07:00
Jonas Devlieghere
84c5b9525e
[lldb] Use numeric_limits for all overflow checks in ObjectFileWasm (#153332)
Use std::numeric_limits<uint32_t>::max() for all overflow checks in
ObjectFileWasm and fix a few locations where I incorrectly used `>=`
instead of `>`.
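
The difference between `>=` and `>` matters only at the boundary value. A minimal sketch, using a Python stand-in for `std::numeric_limits<uint32_t>::max()`; the helper name `checked_u32` is hypothetical and not taken from ObjectFileWasm:

```python
UINT32_MAX = 2**32 - 1  # stand-in for std::numeric_limits<uint32_t>::max()


def checked_u32(value: int) -> int:
    """Return value if it fits in 32 unsigned bits, otherwise raise."""
    # Use `>` rather than `>=`: UINT32_MAX itself is a representable value.
    if value > UINT32_MAX:
        raise OverflowError(f"{value} does not fit in uint32_t")
    return value
```

With `>=`, the largest valid value would be rejected as an overflow even though it fits.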
2025-08-13 01:49:03 +00:00