In streaming mode, the @llvm.aarch64.sme.cnts and @llvm.aarch64.sve.cnt
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to a multiple of @llvm.vscale(). This patch lowers the SME intrinsic the
same way when in streaming mode.
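As a minimal sketch (assuming arm_sme.h and the ACLE streaming attribute; the function name is illustrative):
```c
#include <arm_sme.h>

// svcntsd() maps to @llvm.aarch64.sme.cntsd. In a streaming function the
// streaming vector length equals the runtime vector length, so the call
// can be folded to a multiple of vscale, just like the SVE svcntd().
uint64_t streaming_doublewords(void) __arm_streaming {
  return svcntsd();
}
```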
This patch works towards consolidating all Clang debug-info tests into the
`clang/test/DebugInfo` directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).
Here we move only the `clang/test/CodeGen` tests.
I came up with the list of files as follows:
1. searched for anything with `*debug-info*` in the filename
2. searched for occurrences of `debug-info-kind` in the tests
I created a couple of subdirectories in `clang/test/DebugInfo` where I
thought it made sense (mostly when the tests were target-specific).
There are a couple of tests in `clang/test/CodeGen` that still set
`-debug-info-kind`. They probably don't need to, but I'm not changing
that as part of this PR.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to mark only a prefix of an alloca alive/dead.
We never used that capability, so this also removes the need to handle
that possibility everywhere (many key places, including stack coloring,
did not actually respect it anyway).
Issue originally raised in
https://github.com/llvm/llvm-project/issues/71362#issuecomment-3028515618.
Certain NEON intrinsics that operate on poly types (e.g. poly8x8_t)
failed to compile with the -fno-lax-vector-conversions flag. This patch
updates NeonEmitter.cpp to insert an explicit __builtin_bit_cast from
poly types to the required signed integer vector types when generating
lane-related intrinsics. A test 'neon-bitcast-poly.ll' is included.
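A minimal reproducer of the failure (illustrative; any lane intrinsic on a poly type shows the same issue):
```c
// Compile with: clang --target=aarch64-linux-gnu -fno-lax-vector-conversions
#include <arm_neon.h>

poly8x8_t set_first_lane(poly8x8_t v, poly8_t x) {
  // Previously failed to compile: the generated header implicitly converted
  // poly8x8_t to the signed integer vector type the builtin expects.
  return vset_lane_p8(x, v, 0);
}
```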
Previously, the unsigned NEON intrinsic variants 'vqshrun_high_n' and
'vqrshrun_high_n' used signed integer types for their first argument and
return values. According to developer.arm.com, these should be unsigned.
Adjust the intrinsics and their test cases accordingly.
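For example (a sketch; the signature follows the ACLE documentation on developer.arm.com):
```c
#include <arm_neon.h>

// The narrow half being completed and the result are unsigned; only the
// wide input being shifted down remains signed.
uint8x16_t narrow_high(uint8x8_t lo, int16x8_t wide) {
  return vqshrun_high_n_s16(lo, wide, 4);  // was typed int8x8_t/int8x16_t
}
```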
Let Clang emit the `dead_on_return` attribute on pointer arguments that
are passed indirectly, namely large aggregates that the source passes by
value but the ABI lowers to a pointer to a temporary. The parameter is
destroyed within the callee, so writes to such arguments are not
observable by the caller after the callee returns.
This should enable further MemCpyOpt/DSE optimizations.
Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
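A sketch of the kind of argument affected (the struct size is illustrative; under AAPCS64 such aggregates are passed via a pointer to a caller-allocated copy):
```c
// The hidden pointer parameter for `b` can now carry dead_on_return:
// the callee owns the copy, so stores through it die at the return.
struct Big { long data[16]; };

long first(struct Big b) { return b.data[0]; }
```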
Feature flags protect instructions, not data types. This means only
builtins for instructions protected by +bf16 must be guarded by it.
Builtins that treat the data as opaque 16-bit values (e.g. loads, stores,
and shuffles) should be freely available with the underlying SVE feature.
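For instance (a sketch, assuming only the base SVE feature is enabled):
```c
#include <arm_sve.h>

// Loads and stores treat bf16 as opaque 16-bit data, so with this change
// these builtins require only SVE, not +bf16.
svbfloat16_t copy(svbool_t pg, const bfloat16_t *src, bfloat16_t *dst) {
  svbfloat16_t v = svld1_bf16(pg, src);
  svst1_bf16(pg, dst, v);
  return v;
}
```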
The quadword vector instructions were introduced by SVE2p1/SME2p1, but
the corresponding builtins were not available in streaming mode.
RAX1 is available in streaming mode when SME2p1 is available.
Builtins that are enabled via +sve2p1 in non-streaming mode and +sme{2}
in streaming mode should also be enabled via +sve+sme{2} in
non-streaming mode and +sme+sve2p1 in streaming mode.
llvm.aarch64.set.fpmr only writes to inaccessible memory. Tag it with
the IntrWriteMem and IntrInaccessibleMemOnly properties so the optimiser
can treat it as a pure write.
The original patch did not add this property, causing the intrinsic to
be conservatively treated as readwrite. This commit fixes that.
Adds sve-sha3 to reference FEAT_SVE_SHA3 without specifically enabling
SVE2. The SVE2 requirement for AES, SHA3, and BitPerm is replaced with
SVE for non-streaming functions.
As reported in #135064, the generic pointer coercion code in
CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast
the pointers). This patch bails out of the coercion if an address space
qualifier is found on the pointer.
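A hypothetical reproducer (the address space number and struct shape are arbitrary illustrations, not taken from the issue):
```c
// A small aggregate containing a pointer in a non-default address space;
// the generic coercion would try to bitcast across address spaces.
struct S {
  int __attribute__((address_space(1))) *p;
};

struct S pass_through(struct S s) { return s; }
```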
This updates the element types used in the new __Int8x8_t types added in
#126945, mostly to allow C++ name mangling in the Itanium mangler's
mangleAArch64VectorBase to work correctly. Char is replaced by
SignedCharTy or UnsignedCharTy as required, and Float16Ty is replaced by
HalfTy to match the vector types. The same goes for the Long types.
The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct that
is <= 128 bits in size and contains only one or two pointers, we convert to a
ptr or [2 x ptr], as opposed to the old coercion that used i64 or
[2 x i64]. This helps alias analysis produce more accurate results.
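For example (sketch):
```c
// A 16-byte struct of two pointers is now coerced to [2 x ptr] rather than
// [2 x i64], so no ptrtoint/inttoptr pair is introduced at the call boundary.
struct Pair { int *first; int *second; };

int sum(struct Pair p) { return *p.first + *p.second; }
```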
With the current behavior the following example yields a linker error:
"multiple definition of `foo.default'"
```c
// Translation Unit 1
__attribute__((target_clones("dotprod, sve"))) int foo(void) { return 1; }

// Translation Unit 2
int foo(void) { return 0; }
__attribute__((target_version("dotprod"))) int foo(void);
__attribute__((target_version("sve"))) int foo(void);
int bar(void) { return foo(); }
```
That is because foo.default is generated twice. As a user I don't find
this particularly intuitive. If I wanted the default to be generated in
TU1 I'd rather write target_clones("dotprod, sve", "default")
explicitly.
When changing the code I noticed that the RISC-V target defers the
resolver emission when encountering a target_version definition. This
seems accidental since it only makes sense for AArch64, where we only
emit a resolver once we've processed the entire TU, and only if the
default version is present. I've changed this so that RISC-V immediately
emits the resolver. I adjusted the codegen tests since the functions
now appear in a different order.
Implements https://github.com/ARM-software/acle/pull/377
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
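A sketch of the scenario (assuming, as the description implies, that target features can be pushed via `#pragma clang attribute`):
```c
// With this patch, f should receive vscale_range (e.g. vscale_range(1,16)
// for SVE) derived from the pushed "target-features", enabling
// vector-length-aware optimizations.
#pragma clang attribute push(__attribute__((target("sve"))), apply_to = function)
void f(float *a, float *b, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += b[i];
}
#pragma clang attribute pop
```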
This patch adds fp8 variants to existing intrinsics whose operation
doesn't depend on the arguments being of a specific type.
It also changes the in-memory representation of the mfloat8 type from
`i8` to `<1 x i8>`.
Similarly to #135016, refactor getPTrue to return splat(1) for
all-active patterns. The main motivation for this is to improve
codegen for fixed-length vector loads/stores that are converted to SVE
masked memory ops when the vectors are wider than Neon. Emitting the
mask as a splat helps DAGCombiner simplify all-active masked
loads/stores into unmasked ones, for which it already has suitable
combines and ISel has suitable patterns.
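For example (a sketch; compile with something like -march=armv8-a+sve -msve-vector-bits=256):
```c
// A 256-bit fixed-length vector is wider than Neon, so these accesses are
// lowered as SVE masked loads/stores; with the all-active mask emitted as
// splat(1), DAGCombiner can fold them into plain loads/stores.
typedef float v8f32 __attribute__((vector_size(32)));

v8f32 add(const v8f32 *a, const v8f32 *b) { return *a + *b; }
```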
The original __gcsss intrinsic was implemented based on:
https://github.com/ARM-software/acle/pull/260
with the signature: const void *__gcsss(const void *)
Per the updated specification in:
https://github.com/ARM-software/acle/pull/364
both const qualifiers have been removed. This commit updates the
signature accordingly to: void *__gcsss(void *)
This aligns the implementation with the latest ACLE definition.
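Usage with the updated signature (a sketch; requires a GCS-enabled target, e.g. -march=armv9.4-a+gcs):
```c
#include <arm_acle.h>

// Switch to a new Guarded Control Stack and return the outgoing one,
// with no const qualifiers per the updated ACLE.
void *switch_gcs(void *new_stack) {
  return __gcsss(new_stack);
}
```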
Implement all mf8 FMOP4A instructions in clang and llvm following the
ACLE in https://github.com/ARM-software/acle/pull/381/files.
It also updates the previous mop4 intrinsics from IntrNoMem to
IntrInaccessibleMemOnly.
RecordLayout::UnadjustedAlignment was documented as "Maximum of the
alignments of the record members in characters", but
RecordLayout::getUnadjustedAlignment(), which just returns
UnadjustedAlignment, was documented as getting "the record alignment in
characters, before alignment adjustement." These are not the same thing:
the former excludes alignment of base classes, the latter takes it into
account. ItaniumRecordLayoutBuilder::LayoutBase was setting it according
to the former, but the AAPCS calling convention handling, currently the
only user, relies on it being set according to the latter.
Fixes #135551.
This patch relands https://github.com/llvm/llvm-project/pull/130990.
If the check value is passed by reference, add `memory(read)`.
Original PR description:
This patch adds `memory(argmem: read, inaccessiblemem: readwrite)` to
**recoverable** ubsan handlers in order to unblock some
memory/loop optimizations. It provides an average of 3% performance
improvement on llvm-test-suite (except for 49 test failures due to ubsan
diagnostics).
SVE operations such as predicated loads are canonicalized to LLVM
masked loads, and doing the same for ptrue(all) to splat(1) creates
further optimization opportunities for generic LLVM IR passes.
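For instance (illustrative):
```c
#include <arm_sve.h>

// With ptrue(all) emitted as splat(1), the masked load produced for this
// all-active predicated load can be simplified by generic IR passes.
svfloat64_t load_all(const double *p) {
  return svld1_f64(svptrue_b64(), p);
}
```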
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the ACLE in
https://github.com/ARM-software/acle/pull/381/files.
This PR depends on https://github.com/llvm/llvm-project/pull/127797
This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
Currently arm_neon.h emits C-style casts to perform vector type casts. These
rely on implicit conversion between vector types, which is deprecated
behaviour and will soon be removed. To ensure NEON code keeps working
afterwards, this patch changes all of these vector type casts into bitcasts.
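An illustration of the emitted pattern (a simplified sketch, not the literal header text):
```c
#include <arm_neon.h>

// Old emission relied on a lax vector conversion:  (int8x8_t)v
// New emission uses an explicit bit cast instead:
static inline int8x8_t as_signed(uint8x8_t v) {
  return __builtin_bit_cast(int8x8_t, v);
}
```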
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Removes attr-target-version.c, which doesn't have a clear purpose.
Introduces AArch64/fmv-detection.c to check detection bitmasks.
Adds coverage in AArch64/fmv-resolver-emission.c
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c
svfloat64_t foo(const double *x) {
svbool_t pg = svptrue_b64();
return svld1_f64(pg, x+svcntd());
}
```
When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
ptrue p0.d
mov x8, #2
ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
ret
```
Instead, we could be generating:
```gas
foo:
ldr z0, [x0, #1, mul vl]
ret
```
Likewise for other types, stores, and other VLS lengths.
This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.