Make sure that empty structs are treated as if it has a size of one bit
in function parameters and return types so that it occupies a full
argument and/or return register slot.
This fixes crashes and miscompilations when passing and/or returning
empty structs.
Reviewed by: @s-barannikov
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics
void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
svfloat16_t zn, svfloat16_t zm)
__arm_streaming __arm_inout("za");
void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
svbool_t pm, svfloat16_t zn, svfloat16_t zm)
__arm_streaming __arm_inout("za");
as well as the corresponding `bf16` variants.
Adds a builtin and intrinsic for the f32.store_f16 instruction.
The instruction stores an f32 value as an f16 memory. Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.store_f16 as opcode 0xFD0121, but this is
incorrect and will be changed to 0xFC31 soon.
Add a couple of tests to make it clear:
- when FMV should be enabled and disabled by the driver.
- which extensions are enabled/disabled based on the dependencies
specified in TargetParser.
Reverts llvm/llvm-project#88266 due to test failures
error: 'expected-error' diagnostics seen but not expected:
(frontend): '-fsyntax-only' action ignored; '-emit-llvm' action
specified previously
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics
void_svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void_svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
void_svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void_svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
as well as the corresponding `bf16` variants.
Depends on #87545
Emit PAuth ABI compatibility tag values as llvm module flags:
- `aarch64-elf-pauthabi-platform`
- `aarch64-elf-pauthabi-version`
For platform 0x10000002 (llvm_linux), the version value bits correspond
to the following LangOptions defined in #85232:
- bit 0: `PointerAuthIntrinsics`;
- bit 1: `PointerAuthCalls`;
- bit 2: `PointerAuthReturns`;
- bit 3: `PointerAuthAuthTraps`;
- bit 4: `PointerAuthVTPtrAddressDiscrimination`;
- bit 5: `PointerAuthVTPtrTypeDiscrimination`;
- bit 6: `PointerAuthInitFini`.
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
This patch makes determining alignment and width of BitInt to be target
ABI specific and makes it consistent with [Procedure Call Standard for
the Arm® 64-bit Architecture
(AArch64)](https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst)
for AArch64 targets.
Currently, when the -relink-builtin-bitcodes-postop option is used we
link builtin bitcodes twice: once before optimization, and again after
optimization.
With this change, we omit the pre-opt linking when the option is set,
and we rename the option to the following:
-Xclang -mlink-builtin-bitcodes-postopt
(-Xclang -mno-link-builtin-bitcodes-postopt)
The goal of this change is to reduce compile time. We do lose the
theoretical benefits of pre-opt linking, but in practice these are small
than the overhead of linking twice. However we may be able to address
this in a future patch by adjusting the position of the builtin-bitcode
linking pass.
Compilations not setting the option are unaffected
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
If you want an overarching view of how this will all connect see:
https://github.com/llvm/llvm-project/pull/90088
Changes:
- `clang/docs/LanguageExtensions.rst` - Document the new elementwise tan
builtin.
- `clang/include/clang/Basic/Builtins.td` - Implement the tan builtin.
- `clang/lib/CodeGen/CGBuiltin.cpp` - invoke the tan intrinsic on uses
of the builtin
- `clang/lib/Headers/hlsl/hlsl_intrinsics.h` - Associate the tan builtin
with the equivalent hlsl apis
- `clang/lib/Sema/SemaChecking.cpp` - Add generic sema checks as well as
HLSL specifc sema checks to the tan builtin
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
For global functions and static methods the MSVC ABI returns
structs/classes with a deleted copy assignment operator indirectly.
From local testing this ABI holds true for all currently supported
architectures including ARM64EC.
These test cases are testing features not available when either
targeting the s390x-ibm-zos target or use tools/features not available
on the z/OS operating system. In a couple cases the lit test had a
number of subtests with one or two that aren't supported on z/OS. Rather
than mark the entire test as unsupported I split out the unsupported
tests into a separate test case.
Adds a builtin and intrinsic for the f32.load_f16 instruction.
The instruction loads an f16 value from memory and puts it in an f32.
Specified at:
29a9b9462c/proposals/half-precision/Overview.md
Note: the current spec has f32.load_f16 as opcode 0xFD0120, but this is
incorrect and will be changed to 0xFC30 soon.
When set, the compiler will use separate unique sections for global
symbols in named special sections (e.g. symbols that are annotated with
__attribute__((section(...)))). Doing so enables linker GC to collect
unused symbols without having to use a different section per-symbol.
When using a hard-float ABI for a target without FP registers, it's not
possible to correctly generate code for functions with arguments which
must be passed in floating-point registers. This is diagnosed in CodeGen
instead of Sema, to more closely match GCC's behaviour around inline
functions, which is relied on by the Linux kernel.
Previously, this only checked function signatures as they were
code-generated, but this missed some cases:
* Calls to functions not defined in this translation unit.
* Calls through function pointers.
* Calls to variadic functions, where the variadic arguments have a
floating-point type.
This adds checks to function calls, as well as definitions, so that
these cases are correctly diagnosed.
This addresses an issue where the explicit alignment of 2 (for C++ ABI
reasons) was being propagated to the back end and causing under-aligned
functions (in special sections).
This is an alternate approach suggested by @efriedma-quic in PR #90415.
Fixes#90358
This is a fix for the issue #87758 where fast-math flags are not
propagated all builtins.
It seems like pragmas with fast math flags was only propagated to calls
of unary floating point builtins. This patch propagate them also for
binary and ternary floating point builtins.
For arch64 features, such as Branch Target Identification or MTE (Memory
Tagging Extension), compatible with targets that lack their support we
may encounter scenarios where a binary compiled with MTE for example is
executed on both MTE and non-MTE hardware and we still need to detect at
runtime whether the MTE feature is available to choose the appropriate
function version.
So, we cannot optimize the function multi versioning resolver by
removing checks for these features enabled for the target during
compilation.
https://wg21.link/P2809R3
This is applied as a DR to C++11 (C++98 did not guarantee forward
progress and is left untouched)
As an extension (and to preserve existing behavior in C), we consider
all controlling expression that can be constant folded
in the front end, not just standard constant expressions.
Make sure that the result from the popcnt/ctlz/cttz intrinsics is
unsigned casted to int, rather than casted as a signed value, when
expanding the __builtin_popcountg/__builtin_ctzg/__builtin_clzg
builtins.
An example would be
unsigned _BitInt(1) x = ...;
int y = __builtin_popcountg(x);
which previously was incorrectly expanded to
%1 = call i1 @llvm.ctpop.i1(i1 %0)
%cast = sext i1 %1 to i32
Since the input type is generic for those "g" versions of the builtins
the intrinsic call may return a value for which the sign bit is set
(that could typically for BitInt of size 1 and 2). So we need to emit a
zext rather than a sext to avoid negative results.
The PR implements a subset of features of function
__builtin_cpu_support() for AIX OS based on the information which AIX
kernel runtime variable `_system_configuration` and function call `getsystemcfg()` of
/usr/include/sys/systemcfg.h in AIX OS can provide.
Following subset of features are supported in the PR
"arch_3_00", "arch_3_1","booke","cellbe","darn","dfp","dscr" ,"ebb","efpsingle","efpdouble","fpu","htm","isel",
"mma","mmu","pa6t","power4","power5","power5+","power6x","ppc32","ppc601","ppc64","ppcle","smt",
"spe","tar","true_le","ucache","vsx"
Canonicalize getelementptr instructions for scalable vector types into
ptradd representation with an explicit llvm.vscale call. This
representation has better support in BasicAA, which can reason about
llvm.vscale, but not plain scalable GEPs.
The implementation only enables when the `-enable-tlsdesc` option is
passed and the TLS model is `dynamic`.
LoongArch's GCC has the same option(-mtls-dialet=) as RISC-V.
Reviewers: heiher, MaskRay, SixWeining
Reviewed By: SixWeining, MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/90159
When building the Linux kernel for i386, the -mregparm=3 option is
enabled. Crashes were observed in the sanitizer handler functions, and
the problem was found to be mismatched calling convention.
As was fixed in commit c167c0a4dcdb ("[BuildLibCalls] infer inreg param
attrs from NumRegisterParameters"), call arguments need to be marked as
"in register" when -mregparm is set. Use the same helper developed there
to update the function arguments.
Since CreateRuntimeFunction() is actually part of CodeGenModule, storage
of the -mregparm value is also moved to the constructor, as doing this
in Release() is too late.
Fixes: https://github.com/llvm/llvm-project/issues/89670
Currently, a lot of `__builtin_reduce_*` function do not support
scalable vectors, i.e., ARM SVE and RISCV V. This PR adds support for
them. The main code change is to use a different path to extract the
type from the vectors, the rest is the same and LLVM supports the reduce
functions for `vscale` vectors.
This PR adds scalable vector support for:
- `__builtin_reduce_add`
- `__builtin_reduce_mul`
- `__builtin_reduce_xor`
- `__builtin_reduce_or`
- `__builtin_reduce_and`
- `__builtin_reduce_min`
- `__builtin_reduce_max`
Note: For all except `min/max`, the element type must still be an
integer value. Adding floating point support for `add` and `mul` is
still an open TODO.
Since all optimizations that use range metadata now also handle range attribute, this patch replaces writes of
range metadata for call instructions to range attributes.
These tests just don't check the output written to the current directory. The
current directory may be write protected e.g. in a sandboxed environment.
The Testcases that use -emit-llvm and -verify only care about stdout/stderr
and are in this patch changed to use -emit-llvm-only to avoid writing to an
output file. The verify-inlineasmbr.mir testcase that also only care about
stdout/stderr is in this patch changed to throw away the output file and just
write to /dev/null.
A struct that declares an inner struct, but no fields, won't have a
field count. So getting the offset of the inner struct fails. This
happens in both C and C++:
struct foo {
struct bar {
int Quantizermatrix[];
};
};
Here 'struct foo' has no fields.
Closes: https://github.com/llvm/llvm-project/issues/88931