Alas reflection pushed p2719 out of C++26, so this PR changes the
diagnostics to reflect that for now type aware allocation is
functionally a clang extension.
`std::invoke` is currently quite heavy compared to a function call,
since it involves quite heavy SFINAE. This can be done significantly
more efficient by the compiler, since most calls to `std::invoke` are
simple function calls and 6 out of the seven overloads for `std::invoke`
exist only to support member pointers. Even these boil down to a few
relatively simple checks.
Some real-world testing with this patch revealed some significant
results. For example, instantiating `std::format("Banane")` (and its
callees) went down from ~125ms on my system to ~104ms.
The LoongArch psABI recently added __bf16 type support. Now we can
enable this new type in clang.
Currently, bf16 operations are automatically supported by promoting to
float. This patch adds bf16 support by ensuring that load extension /
truncate store operations are properly expanded.
And this commit implements support for bf16 truncate/extend on hard FP
targets. The extend operation is implemented by a shift just as in the
standard legalization. This requires custom lowering of the truncate
libcall on hard float ABIs (the normal libcall code path is used on soft
ABIs).
With pointer authentication it becomes non-trivial to correctly load the
vtable pointer of a polymorphic object.
__builtin_get_vtable_pointer is a function that performs the load and
performs the appropriate authentication operations if necessary.
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.
This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:
1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.
2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.
Implementation:
`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.
Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.
Testing:
The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.
Further work:
`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
Enable _Float16 for LoongArch target. Additionally, this change fixes
incorrect ABI lowering of _Float16 in the case of structs containing
fp16 that are eligible for passing via GPR+FPR or FPR+FPR. Finally, it
also fixes int16 -> __fp16 conversion code gen, which uses generic LLVM
IR rather than llvm.convert.to.fp16 intrinsics.
The C++26 standard relocatable type traits has slightly different
semantics, so we introduced a new
``__builtin_is_cpp_trivially_relocatable``
when implementing trivial relocation in #127636.
However, having multiple relocatable traits would be confusing
in the long run, so we deprecate the old trait.
As discussed in #127636
`__builtin_is_cpp_trivially_relocatable` should be used instead.
The C++26 standard relocatable type traits has slightly different
semantics, so we introduced a new
``__builtin_is_cpp_trivially_relocatable`` when implementing trivial
relocation in #127636.
However, having multiple relocatable traits would be confusing in the
long run, so we deprecate the old trait.
As discussed in #127636
`__builtin_is_cpp_trivially_relocatable` should be used instead.
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
This adds
- The parsing of `trivially_relocatable_if_eligible`,
`replaceable_if_eligible` keywords
- `__builtin_trivially_relocate`, implemented in terms of memmove. In
the future this should
- Add the appropriate start/end lifetime markers that llvm does not have
(`start_lifetime_as`)
- Add support for ptrauth when that's upstreamed
- the `__builtin_is_cpp_trivially_relocatable` and
`__builtin_is_replaceable` traits
Fixes#127609
- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around
each operation.
- Compiler-rt conversion functions are now built for s390x including the missing
extendhfdf2 which was added.
Fixes#50374
With https://github.com/llvm/llvm-project/pull/112852, we claimed that
llvm.minnum and llvm.maxnum should treat +0.0>-0.0, while libc doesn't
require fmin(3)/fmax(3) for it.
To make llvm.minnum/llvm.maxnum easy to use, we define the builtin
functions for them, include
__builtin_elementwise_minnum
__builtin_elementwise_maxnum
All of them support _Float16, __bf16, float, double, long double.
C2y adds the `_Countof` operator which returns the number of elements in
an array. As with `sizeof`, `_Countof` either accepts a parenthesized
type name or an expression. Its operand must be (of) an array type. When
passed a constant-size array operand, the operator is a constant
expression which is valid for use as an integer constant expression.
This is being exposed as an extension in earlier C language modes, but
not in C++. C++ already has `std::extent` and `std::size` to cover these
needs, so the operator doesn't seem to get the user enough benefit to
warrant carrying this as an extension.
Fixes#102836
Introduce a trait to determine the number of bindings that would be
produced by
```cpp
auto [...p] = expr;
```
This is necessary to implement P2300
(https://eel.is/c++draft/exec#snd.concepts-5), but can also be used to
implement a general get<N> function that supports aggregates
`__builtin_structured_binding_size` is a unary type trait that evaluates
to the number of bindings in a decomposition
If the argument cannot be decomposed, a sfinae-friendly error is
produced.
A type is considered a valid tuple if `std::tuple_size_v<T>` is a valid
expression, even if there is no valid `std::tuple_element`
specialization or suitable `get` function for that type.
Fixes#46049
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.
This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
This builtin is supported by GCC and is a way to improve diagnostic
behavior for va_start in C23 mode. C23 no longer requires a second
argument to the va_start macro in support of variadic functions with no
leading parameters. However, we still want to diagnose passing more than
two arguments, or diagnose when passing something other than the last
parameter in the variadic function.
This also updates the freestanding <stdarg.h> header to use the new
builtin, same as how GCC works.
Fixes#124031
Clang has __builtin_elementwise_exp and __builtin_elementwise_exp2
intrinsics, but no __builtin_elementwise_exp10.
There doesn't seem to be a good reason not to expose the exp10 flavour
of this intrinsic too.
This commit introduces this intrinsic following the same pattern as the
exp and exp2 versions.
Fixes: SWDEV-519541
This implements the R2 semantics of P0963.
The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
Add option and statement attribute for controlling emitting of
target-specific
metadata to atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641,
Originally a pragma was proposed, then it was changed to clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.
During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
This patch adds a new __builtin_assume_dereferenceable to encode
dereferenceability of a pointer using llvm.assume with an operand
bundle.
For now the builtin only accepts constant sizes, I am planning to drop
this restriction in a follow-up change.
This can be used to better optimize cases where a pointer is known to be
dereferenceable, e.g. unconditionally loading from p2 when vectorizing
the loop.
int *get_ptr();
void foo(int* src, int x) {
int *p2 = get_ptr();
__builtin_assume_aligned(p2, 4);
__builtin_assume_dereferenceable(p2, 4000);
for (unsigned I = 0; I != 1000; ++I) {
int x = src[I];
if (x == 0)
x = p2[I];
src[I] = x;
}
}
PR: https://github.com/llvm/llvm-project/pull/121789
`__is_referenceable` is almost unused in the wild, and the few cases I
was able to find had checks around them. Since the places in the
standard library where `__is_referenceable` is used have bespoke
builtins, it doesn't make a ton of sense to keep this builtin around.
`__is_referenceable` has been documented as deprecated in Clang 20.
This commit restricts the use of scalar types in vector math builtins,
particularly the `__builtin_elementwise_*` builtins.
Previously, small scalar integer types would be promoted to `int`, as
per the usual conversions. This would silently do the wrong thing for
certain operations, such as `add_sat`, `popcount`, `bitreverse`, and
others. Similarly, since unsigned integer types were promoted to `int`,
something like `add_sat(unsigned char, unsigned char)` would perform a
*signed* operation.
With this patch, promotable scalar integer types are not promoted to
int, and are kept intact. If any of the types differ in the binary and
ternary builtins, an error is issued. Similarly an error is issued if
builtins are supplied integer types of different signs. Mixing enums of
different types in binary/ternary builtins now consistently raises an
error in all language modes.
This brings the behaviour surrounding scalar types more in line with
that of vector types. No change is made to vector types, which are both
not promoted and whose element types must match.
Fixes#84047.
RFC:
https://discourse.llvm.org/t/rfc-change-behaviour-of-elementwise-builtins-on-scalar-integer-types/83725
`__is_referenceable` is almost unused in the wild, and the few cases I
was able to find had checks around them. Since The places in the
standard library where `__is_referenceable` is used have bespoke
builtins, it doesn't make a ton of sense to keep this builtin around.
See #123078
This commit migrates the contents of 'annotations.html' in the old
HTML-based documentation of the Clang static analyzer to the new
RST-based documentation.
During this conversion I reordered the sections of this documentation
file by placing the section "Custom Assertion Handlers" as a subsection
of "Annotations to Enhance Generic Checks". (The primary motivation was
that Sphinx complained about inconsistent section levels; with this
change I preserved that sections describing individual annotations are
all on the same level.)
Apart from this change and the format conversion, I didn't review,
validate or edit the contents of this documentation file because I think
it would be better to place any additional changes in separate commits.
PR for issue #78657
Updated clang/docs/LanguageExtensions.rst to detail the return value of
__builtin_COLUMN for this implementation.
--
Fyi, this is my first contribution, so please bear with me.
There already appears to be a unit test for __builtin_COLUMN in
clang/test/SemaCXX/source_location.cpp.
This PR addresses #116880
Updated
[LanguageExtensions.rst](https://github.com/llvm/llvm-project/blob/main/clang/docs/LanguageExtensions.rst)
to include support for C++11 enumerations with a fixed underlying type
in C. Included a note that this is only a language extension prior to
C23.
Updated
[Features.def](https://github.com/llvm-mirror/clang/blob/master/include/clang/Basic/Features.def)
to support for `__has_extension(c_fixed_enum)` by added it as a feature
(for C23) and an extension (for <C23).
Updated
[enum.c](https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/enum.c)
to ensure support of C++11 enumerations with a fixed underlying type in
both <C23 and C23, as well as the functionality of
`__has_extension(c_fixed_enum)`.
---
In enum.c, I encountered a warning when testing enumerations with a
fixed underlying type in pre-C23 modes. Specifically, the test produces
the warning: `enumeration types with a fixed underlying type are a C23
extension`. I am unsure if this warning is expected behavior, as
enumerations with a fixed underlying type should behave identically in
pre-C23 and C23 modes. I expected that adding `c_fixed_enum` as an
extension would fix this warning. Feedback on whether this is correct or
requires adjustment would be appreciated.
I was also unsure of the best location for the additions in
`Features.def`, I would appreciate advice on this as well
Note that this is my first PR to LLVM, so please liberally critique it!
This paper made empty structures and unions implementation-defined. We
have always supported this as a GNU extension, so now we're documenting
our behavior and removing the extension warning in C2y mode.
This paper made qualified function types implementation-defined. We have
always supported this as an extension, so now we're documenting our
behavior.
Note, we still warn about this by default even in C2y mode because a
qualified function type is a sign of programmer confusion.
The __builtin_counted_by_ref builtin is used on a flexible array
pointer and returns a pointer to the "counted_by" attribute's COUNT
argument, which is a field in the same non-anonymous struct as the
flexible array member. This is useful for automatically setting the
count field without needing the programmer's intervention. Otherwise
it's possible to get this anti-pattern:
ptr = alloc(<ty>, ..., COUNT);
ptr->FAM[9] = 42; /* <<< Sanitizer will complain */
ptr->count = COUNT;
To prevent this anti-pattern, the user can create an allocator that
automatically performs the assignment:
#define alloc(TY, FAM, COUNT) ({ \
TY __p = alloc(get_size(TY, COUNT)); \
if (__builtin_counted_by_ref(__p->FAM)) \
*__builtin_counted_by_ref(__p->FAM) = COUNT; \
__p; \
})
The builtin's behavior is heavily dependent upon the "counted_by"
attribute existing. It's main utility is during allocation to avoid
the above anti-pattern. If the flexible array member doesn't have that
attribute, the builtin becomes a no-op. Therefore, if the flexible
array member has a "count" field not referenced by "counted_by", it
must be set explicitly after the allocation as this builtin will
return a "nullptr" and the assignment will most likely be elided.
---------
Co-authored-by: Bill Wendling <isanbard@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>