In my previous PR (#123656) to update the names of AVX10.2 intrinsics
and mnemonics, I have erroneously deleted `_ph` from few intrinsics.
This PR corrects this.
In Continuous instrumentation profiling mode, profile or coverage data
collected via compiler instrumentation is continuously synced to the
profile file. This feature has existed for a while, and is documented
here:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
This PR creates a user facing option to enable the feature.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
This is the 2nd part to
https://github.com/llvm/llvm-project/pull/124752. Here we make sure to
set the `DICompositeType` `EnumKind` if the enum was declared with
`__attribute__((enum_extensibility(...)))`. In DWARF this will be
rendered as `DW_AT_APPLE_enum_kind` and will be used by LLDB when
creating `clang::EnumDecl`s from debug-info.
Depends on https://github.com/llvm/llvm-project/pull/126044
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
movl $199571451, %eax # hash of foo's type = 0xBE537FB
foo:
...
```
This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:
| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |
For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
movl $199571451, %ebx # hash of foo's type = 0xBE537FB
```
This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.
Note that there is another existing PR proposal that includes the 3-bit
arity within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.
When building on Windows, dealing with the BlocksRuntime is slightly
more complicated. As we are not guaranteed a formward declaration for
the blocks runtime ABI symbols, we may generate the declarations for
them. In order to properly link against the well-known types, we always
annotated them as `__declspec(dllimport)`. This would require the
dynamic linking of the blocks runtime under all conditions. However,
this is the only the only possible way to us the library. We may be
building a fully sealed (static) executable. In such a case, the well
known symbols should not be marked as `dllimport` as they are assumed to
be statically available with the static linking to the BlocksRuntime.
Introduce a new driver/cc1 option `-static-libclosure` which mirrors the
myriad of similar options (`-static-libgcc`, `-static-libstdc++`,
-static-libsan`, etc).
This patch adds intrinsics for the tcgen05 alloc/dealloc
family of PTX instructions. This patch also adds an
addrspace 6 for tensor memory which is used by
these intrinsics.
lit tests are added and verified with a ptxas-12.8 executable.
Documentation for these additions is also added in NVPTXUsage.rst.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This seems consistent with the documentation, which claims it:
```
/// Looks through the Decl's underlying type to extract a FunctionType
/// when possible. Will return null if the type underlying the Decl does not
/// have a FunctionType.
const FunctionType *getFunctionType(bool BlocksToo = true) const;
```
Note: This patch rewords this doc comment to clarify it includes various
function pointer types.
Without this, attaching attributes (which use `HasFunctionProto`) to
member function pointers errors with:
```
error: '<attr>' only applies to non-K&R-style functions
```
...which does not really make sense, since member functions are not K&C
functions.
With this change the Arm SME TypeAttrs work correctly on member function
pointers.
Note, however, that not all attributes work correctly when applied to
function pointers or member function pointers. For example,
`alloc_align` crashes when applied to a function pointer (on truck):
https://godbolt.org/z/YvMhnhKfx (as it only expects a `FunctionDecl` not
a `ParmVarDecl`). The same crash applies to member function pointers
(for the same reason).
For C++ (but not C), empty structs should be passed to functions as if
they are a 1 byte object with 1 byte alignment.
This is defined in Arm's CPPABI32:
https://github.com/ARM-software/abi-aa/blob/main/cppabi32/cppabi32.rst
For the purposes of parameter passing in AAPCS32, a parameter whose
type is an empty class shall be treated as if its type were an
aggregate with a single member of type unsigned byte.
The AArch64 equivalent of this has an exception for structs containing
an array of size zero, I've kept that logic for ARM. I've not found a
reason for this exception, but I've checked that GCC does have the same
behaviour for ARM as it does for AArch64.
The AArch64 version has an Apple ABI with different rules, which ignores
empty structs in both C and C++. This is documented at
https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms.
The ARM equivalent of that appears to be AAPCS16_VFP, used for WatchOS,
but I can't find any documentation for that ABI, so I'm not sure what
rules it should follow. For now I've left it following the AArch64 Apple
rules.
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
Refactoring of how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).
The only functionality change is that we use the already emitted Expr
code to build our calculations off of rather than re-emitting the Expr.
That allows the 'StructFieldAccess' visitor to sift through all casts
and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs
and calculate offsets based off of relative distances from that
MemberExpr.
The testcase passes execution tests.
Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0.):
struct p;
struct s {
/* ... */
int count;
struct p *array[] __attribute__((counted_by(count)));
};
1) 'ptr->array':
count = ptr->count;
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
if (flexible_array_member_size < 0)
return 0;
return flexible_array_member_size;
2) '&ptr->array[idx]':
count = ptr->count;
index = idx;
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
index_size = index * flexible_array_member_base_size;
if (flexible_array_member_size < 0 || index < 0)
return 0;
return flexible_array_member_size - index_size;
3) '&ptr->field':
count = ptr->count;
sizeof_struct = sizeof (struct s);
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
field_offset = offsetof (struct s, field);
offset_diff = sizeof_struct - field_offset;
if (flexible_array_member_size < 0)
return 0;
return offset_diff + flexible_array_member_size;
4) '&ptr->field_array[idx]':
count = ptr->count;
index = idx;
sizeof_struct = sizeof (struct s);
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
field_base_size = sizeof (*ptr->field_array);
field_offset = offsetof (struct s, field)
field_offset += index * field_base_size;
offset_diff = sizeof_struct - field_offset;
if (flexible_array_member_size < 0 || index < 0)
return 0;
return offset_diff + flexible_array_member_size;
---------
Signed-off-by: Bill Wendling <morbo@google.com>
This reverts commit 928cad49beec0120686478f502899222e836b545 i.e.,
relands dccd27112722109d2e2f03e8da9ce8690f06e11b, with a fix to avoid
use-after-scope by changing the lambda to capture by value.
This adds the plumbing between -fsanitize-skip-hot-cutoff (introduced in
https://github.com/llvm/llvm-project/pull/121619) and
LowerAllowCheckPass<cutoffs> (introduced in
https://github.com/llvm/llvm-project/pull/124211).
The net effect is that -fsanitize-skip-hot-cutoff now combines the
functionality of -ubsan-guard-checks and
-lower-allow-check-percentile-cutoff (though this patch does not remove
those yet), and generalizes the latter to allow per-sanitizer cutoffs.
Note: this patch replaces Intrinsic::allow_ubsan_check's
SanitizerHandler parameter with SanitizerOrdinal; this is necessary
because the hot cutoffs are specified in terms of SanitizerOrdinal
(e.g., null, alignment), not SanitizerHandler (e.g., TypeMismatch).
Likewise, CodeGenFunction::EmitCheck is changed to emit
allow_ubsan_check() for each individual check.
---------
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
We had a test claiming that this empty struct type consumes a register
slot when passing it to a function with GCC, but that does not appear to
be the case, at least with GCC versions going back to 4.8.
This also caused a miscompilation when passing one of these structs to a
variadic function, but it turned out that our implementation of `va_arg`
matches GCC's ABI, so the one change fixes both bugs.
This commit restricts the use of scalar types in vector math builtins,
particularly the `__builtin_elementwise_*` builtins.
Previously, small scalar integer types would be promoted to `int`, as
per the usual conversions. This would silently do the wrong thing for
certain operations, such as `add_sat`, `popcount`, `bitreverse`, and
others. Similarly, since unsigned integer types were promoted to `int`,
something like `add_sat(unsigned char, unsigned char)` would perform a
*signed* operation.
With this patch, promotable scalar integer types are not promoted to
int, and are kept intact. If any of the types differ in the binary and
ternary builtins, an error is issued. Similarly an error is issued if
builtins are supplied integer types of different signs. Mixing enums of
different types in binary/ternary builtins now consistently raises an
error in all language modes.
This brings the behaviour surrounding scalar types more in line with
that of vector types. No change is made to vector types, which are both
not promoted and whose element types must match.
Fixes#84047.
RFC:
https://discourse.llvm.org/t/rfc-change-behaviour-of-elementwise-builtins-on-scalar-integer-types/83725
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.
This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
This change makes it consistent with other uses of ubsantrap.
This also updates the tests. Notably, BoundsChecking/runtimes.ll had guard=3 which passed only because the method of calculating the parameter (`IRB.GetInsertBlock()->getParent()->size()`) happened to give the same answer.
Fixes two buildbot errors caused by 4424c44c (#110102):
The first error, seen on some sanitizer bots:
https://lab.llvm.org/buildbot/#/builders/51/builds/9901
The initial commit used the deprecated getDeclaration intrinsic instead
of the non-deprecated getOrInsert- equivalent. This patch trivially
updates the code in question to use the new intrinsic.
The second error, seen on the clang-armv8-quick bot:
https://lab.llvm.org/buildbot/#/builders/154/builds/10983
One of the tests depends on a particular triple to get the exact output
expected by the test, but did not specify this triple; this patch adds
the triple in question.
Following the previous patch which adds the "extend lifetimes" flag
without (almost) any functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted as a new form of cleanup,
set for variable addresses, which just emits a fake use intrinsic when the
variable falls out of scope. The code for achieving this is simple, with most
of the logic centered on determining whether to emit a fake use for a given
address, and on ensuring that fake uses are ignored in a few cases.
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
GCC supports three flags related to overflow behavior:
* `-fwrapv`: Makes signed integer overflow well-defined.
* `-fwrapv-pointer`: Makes pointer overflow well-defined.
* `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both
signed integer overflow and pointer overflow well-defined.
Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but
not `-fwrapv-pointer`.
This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics
of `-fwrapv` to match GCC.
This allows signed integer overflow and pointer overflow to be
controlled independently, while `-fno-strict-overflow` still exists to
control both at the same time (and that option is consistent across GCC
and Clang).
This switches them to use tho common TableGen layer, extending it to
support the missing features needed by the NVPTX backend.
The biggest thing was to build a TableGen system that computes the
cumulative SM and PTX feature sets the same way the macros did. That's
done with some string concatenation tricks in TableGen, but they worked
out pretty neatly and are very comparable in complexity to the macro
version.
Then the actual defines were mapped over using a very hacky Python
script. It was never productionized or intended to work in the future,
but for posterity:
https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66
Last but not least, there was a very odd "bug" in one of the converted
builtins' prototype in the TableGen model: it didn't handle uses of `Z`
and `U` both as *qualifiers* of a single type, treating `Z` as its own
`int32_t` type. So my hacky Python script converted `ZUi` into two
types, an `int32_t` and an `unsigned int`. This produced a very wrong
prototype. But the tests caught this nicely and I fixed it manually
rather than trying to improve the Python script as it occurred in
exactly one place I could find.
This should provide direct benefits of allowing future refactorings to
more directly leverage TableGen to express builtins more structurally
rather than textually. It will also make my efforts to move builtins to
string tables significantly more effective for the NVPTX backend where
the X-macro approach resulted in *significantly* less efficient string
tables than other targets due to the long repeated feature strings.
Reimplement Neon FP8 vector types using attribute `neon_vector_type`
instead of having them as builtin types.
This allows to implement FP8 Neon intrinsics without the need to add
special cases for these types when using `__builtin_shufflevector`
or bitcast (using C-style cast operator) between vectors, both
extensively used in the generated code in `arm_neon.h`.
These cannot be detected by reading the ID_AA64ISAR1_EL1 register since
their corresponding bitfields are hidden. Additionally the instructions
that these features enable are unusable from EL0.
ACLE: https://github.com/ARM-software/acle/pull/382
Re-write the sema and codegen for the atomic_test_and_set and
atomic_clear builtin functions to go via AtomicExpr, like the other
atomic builtins do. This simplifies the code, because AtomicExpr already
handles things like generating code for to dynamically select the memory
ordering, which was duplicated for these builtins. This also fixes a few
crash bugs, one when passing an integer to the pointer argument, and one
when using an array.
This also adds diagnostics for the memory orderings which are not valid
for atomic_clear according to
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which
were missing before.
Fixes https://github.com/llvm/llvm-project/issues/111293.
This is a re-land of #120449, modified to allow any non-const pointer
type for the first argument.
The FEAT_SPEv1p2 feature (known to LLVM as FeatureSPE_EEF and +spe-eef)
was incorrectly marked as a required feature of Armv8.7-A (and later),
which is incorrect because it is optional, and some CPUs do not
implement it. This moves it to the default features list, so that it is
still enabled by -march=armv8.7-a, but can be configured individually
for each processor.
For Cortex-A520 and Cortex-A520AE, I've checked that these do not have any of
the FEAT_SPE* features, so updated the tests accordingly. All other
Arm-designed v8.7A+ and v9.2A+ CPUs should continue to have it enabled. For
Ampere1B and Fujitsu Monaka, these CPUs do not have the feature, so I've
removed it from their tests. For Apple M4, I haven't found any reference for
whether that CPU should have this feature, so I've added it to the CPU
definition to avoid this being a functional change.