The compiler-rt build gradually learned to target arm64e. With that, we
build builtins for arm64e, but running their tests usually isn't
possible, because most versions of macOS so far restrict arm64e (on
account of its unstable ABI).
Starting with macOS 26, arm64e executables can be run, because the
aligned linker automatically targets ptrauth ABI version 1. Without
that, (at ABI version 0) these can't be executed.
We can't rely or require new linkers (and we elsewhere explicitly
fallback to ld classic anyway), so in the meantime one way to execute
these would be to explicitly ask for ABI version 1, which we generally
try to avoid, and don't support in our llvm (which unconditionally
targets ABI version 0).
This is also an uncommon situation; sanitizer runtime tests aren't run
on arm64e today, because we haven't listed arm64e as a supported arch
yet.
Everything other than builtins also tests for execution in cmake first;
we should consider that, but it has its own problems.
So we can simply disable arm64e from tests, by filtering it out as a
valid darwin host arch, which accurately reflects reality.
When we try to add arm64e sanitizer runtime build and test support,
we'll want to change that, but that's a bigger problem than builtins.
Adds a pass for Vector to AMX operation conversion.
Initially, a direct rewrite for vector contraction in packed VNNI layout
is supported. Operations are expected to already be in shapes which are
AMX-compatible for the rewriting to occur.
Relates to https://github.com/llvm/llvm-project/issues/135707
Where it was reported that reading the PC using "register read" had
different results to an expression "$pc".
This was happening because registers are treated in lldb as pure
"values" that don't really have an endian. We have to store them
somewhere on the host of course, so the endian becomes host endian.
When you want to use a register as a value in an expression you're
pretending that it's a variable in memory. In target memory. Therefore
we must convert the register value to that endian before use.
The test I have added is based on the one used for XML register flags.
Where I fake an AArch64 little endian and an s390x big endian target. I
set up the data in such a way the pc value should print the same for
both, either with register read or an expression.
I considered just adding a live process test that checks the two are the
same but with on one doing cross endian testing, I doubt it would have
ever caught this bug.
Simulating this means most of the time, little endian hosts will test
little to little and little to big. In the minority of cases with a big
endian host, they'll check the reverse. Covering all the combinations.
This does a few things:
* LLVM_CONFIG_PATH is deprecated, use LLVM_CMAKE_DIR instead.
* Don't use $ before command examples. I would normally, but the key
cmake commands didn't use it so I removed it from all commands.
* Makes the commands shown full commands, so you don't have to piece
them together.
* Uses shell variables to cut down on repetition and make this easier to
port to other targets.
* Adds a few options to disable more compiler-rt things.
* Use the built in cmake options for sysroot and toolchains.
* Include test options in the first cmake command, so you don't have to
re-do the whole thing after you read the testing section.
* Removes the section about using BaremetalARM.cmake.
The closest I got to getting that cache to work was:
```
SYSROOT=/home/david.spickett/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi/arm-none-eabi/libc
LLVM_TOOLCHAIN=/home/david.spickett/LLVM-20.1.8-Linux-X64/
cmake \
-G Ninja \
-DCMAKE_C_COMPILER=${LLVM_TOOLCHAIN}/bin/clang \
-DBAREMETAL_ARMV6M_SYSROOT=${SYSROOT} \
-DBAREMETAL_ARMV7M_SYSROOT=${SYSROOT} \
-DBAREMETAL_ARMV7EM_SYSROOT=${SYSROOT} \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_RUNTIMES="compiler-rt" \
-C ../llvm-project/clang/cmake/caches/BaremetalARM.cmake \
-DCOMPILER_RT_BUILD_BUILTINS=ON \
-DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
-DCOMPILER_RT_BUILD_MEMPROF=OFF \
-DCOMPILER_RT_BUILD_PROFILE=OFF \
-DCOMPILER_RT_BUILD_CTX_PROFILE=OFF \
-DCOMPILER_RT_BUILD_SANITIZERS=OFF \
-DCOMPILER_RT_BUILD_XRAY=OFF \
-DCOMPILER_RT_BUILD_ORC=OFF \
-DCOMPILER_RT_BUILD_CRT=OFF \
../llvm-project/runtimes
```
All this does is build the x86 builtins. I tried forcing the issue with:
```
-DBUILTIN_SUPPORTED_ARCH="armv7m;armv6m;armv7em" \
```
But again, just x86.
It's probably something deep in compiler-rt failing a compiler check for
the Arm targets. Even if that's the case, fixing that means adding more
options to the cmake command.
I can't find evidence of a full command using this cache file since the
commit that introduced it and that command no longer works.
I think if you ever got this to work again the command would be as long
and complex as the ones already shown in the document.
I would also argue that some of the other caches, for example Fuschia's,
are much better example of multi-target runtimes builds. If what's in
this document isn't enough, folks should be learning from those files
and about the runtimes build overall before attempting anything complex
(though it does not take much to be "complex").
When using the `amdgcn.init.whole.wave` intrinsic, we add dummy VGPR
arguments with the purpose of preserving their inactive lanes. The
pattern may look something like this:
```
entry:
call amdgcn.init.whole.wave
branch to shader or tail
shader:
$vInactive = IMPLICIT_DEF ; Tells regalloc it's safe to use the active lanes
actual code...
tail:
call amdgcn.cs.chain [...], implicit $vInactive
```
We should not report these VGPRs in the `.vgpr_count` metadata. This
patch achieves that goal by ignoring meta instructions and calls. This should
be safe since if those registers are actually used in any other context,
they will be counted there. The same reasoning applies in the general
case, so we don't explicitly check for the existence of `init.whole.wave`.
This is a reworked version of #133242, which was reverted in #144039
and split into smaller bits.
This patch refactors `exactSIVtest` and `exactRDIVtest` by consolidating
duplicated logic into a single function. Same as #152688, the main goal
is to improve code maintainability, since extra validation logic (as
written in TODO comments) may be necessary.
This also removes some tests which were redundant, wrong, or never run.
Specifically,
- `libcxx/utilities/meta/stress_tests/*` were never run and are of
questionable usefulness
- `libcxx/utilities/template.bitset/includes.pass.cpp` is completely
redundant and partially incorrect
Also notably,
`libcxx/language.support/support.c.headers/support.c.headers.other/math.lerp.verify.cpp`
has been refactored to only test the standard mandate.
This updates everywhere we emit/check an SME routines to use
RuntimeLibcalls to get the function name and calling convention.
Note: RuntimeLibcallEmitter had some issues with emitting non-unique
variable names for sets of libcalls, so I tweaked the output to avoid
the need for variables.
Shift replacement of regular VPBB for vector.ph with the VPIRBB wrapping
the created IR block directly to skeleton creation, to be consistent
with how the scalar preheader is handled.
Clang tip of tree is now v22, so bump the versions based on that now
that we have an updated container image.
---------
Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>
The containers passed by reference are always empty on entry to the
functions that fill them. Return them by value instead and let the
compiler do the return value optimization.
This patch refactors `gcdMIVtest` by consolidating duplicated logic into
a single function. The main goal of this change is to improve code
maintainability rather than readability, especially since we may need to
revise this logic for correctness (as noted in the added TODO comments).
I hope this patch is NFC, but I've also added several new assertions,
which may cause some previously passing cases to fail.
Clear `synthesizedAligns` to prevent stray relocations to an unrelated
text section. Enhance the test to check llvm-readelf -r output.
---
Without linker relaxation enabled for a particular relocatable file or
section (e.g., using .option norelax), the assembler will not generate
R_RISCV_ALIGN relocations for alignment directives. This becomes
problematic in a two-stage linking process:
```
ld -r a.o b.o -o ab.o
// b.o is norelax. Its alignment information is lost in ab.o.
ld ab.o -o ab
```
When ab.o is linked into an executable, the preceding relaxed section
(a.o's content) might shrink. Since there's no R_RISCV_ALIGN relocation
in b.o for the linker to act upon, the `.word 0x3a393837` data in b.o
may end up unaligned in the final executable.
To address the issue, this patch inserts NOP bytes and synthesizes an
R_RISCV_ALIGN relocation at the beginning of a text section when the
alignment >= 4.
For simplicity, when RVC is disabled, we synthesize an ALIGN relocation
(addend: 2) for a 4-byte aligned section, allowing the linker to trim
the excess 2 bytes.
See also https://sourceware.org/bugzilla/show_bug.cgi?id=33236
This reverts commit 6f53f1c8d2bdd13e30da7d1b85ed6a3ae4c4a856.
synthesiedAligns is not cleared, leading to stray relocations for
unrelated sections. Revert for now.
This fixes a regression reported here
https://github.com/llvm/llvm-project/pull/147835#issuecomment-3181811371,
where getTrivialTemplateArgumentLoc can't see through template name
sugar when producing a trivial TemplateArgumentLoc for template template
arguments.
Since this regression was never released, there are no release notes.
The PR is going to improve the readability for the files under
`llvm-project/libc/src/wchar` directory.
---------
Co-authored-by: Jin Huang <jingold@google.com>
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes#149318.
Use LIBC_ERRNO_MODE_SYSTEM_INLINE instead as the default for the "public
packaging" (i.e. release mode) of an overlay build. The Bazel build has
already switched to use it by default in
5ccc734fa0355f971f8f515457a0bece33ab6642. This should be a safe change,
as LIBC_ERRNO_MODE_SYSTEM_INLINE works a drop-in (but simpler)
LIBC_ERRNO_MODE_SYSTEM replacement. Remove the associated code paths and
config settings.
Fixes issue #143454.
Cause:
1. `implicit_def` inside bundle does not count for define of reg in
machineinst verifier
2. Including `implicit_def` will cause relative reg not define, result
in `Bad machine code: Using an undefined physical register` in the
machineinst verifier
Fixes https://github.com/llvm/llvm-project/issues/139102
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
It might have been a bug that these were previously not included,
but they don't appear to have ever been used:
https://godbolt.org/z/zE6zs8xxa
If these really exist, they probably should be included. Removes 4
unused entries from the set of libcall impls.
We almost only ever have one header mask, except with the data tail
folding style, i.e. with VPInstruction::ActiveLaneMask.
All we need to do is to make sure to erase the old header icmp based
header mask when replacing it.
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value.
This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.
Use std::numeric_limits<uint32_t>::max() for all overflow checks in
ObjectFileWasm and fix a few locations where I incorrectly used `>=`
instead of `>`.