As discussed in https://github.com/llvm/llvm-project/issues/139128, this
PR moves =sanitize handling from `ASTContext::isTypeIgnoredBySanitizer`
to `NoSanitizeList::containsType`.
Before this PR: "=sanitize" had priority regardless of order.
After this PR: if multiple entries match the source, the latest entry
takes precedence.
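For example, given an ignorelist like the following (the section name and type patterns are illustrative):
```
[cfi]
type:*
type:MyNamespace::*=sanitize
```
types matching `MyNamespace::*` keep their checks because the `=sanitize` entry is the latest match; with the two `type:` lines swapped, the plain `type:*` entry would win instead (previously, "=sanitize" won regardless of order).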
This adds support under LoongArch for the target("...") attribute.
The supported formats are:
- "arch=<arch>" strings, which specify the architecture features for a
function as per the -march option.
- "tune=<cpu>" strings, which specify the tune CPU for a function as per
-mtune.
- "<feature>" / "no-<feature>", which enables/disables the specific feature.
See: https://github.com/llvm/llvm-project/issues/139128 and
https://github.com/llvm/llvm-project/pull/140529 for the background.
The new tests introduced in this PR (ubsan-src-ignorelist-category.test, run
with `-fsanitize-ignorelist=%t/src.ignorelist
-fsanitize-ignorelist=%t/src.ignorelist.contradict9`) do not fail with the
previous implementation (without this PR). This is because the existing logic
already distinguishes between sections in different ignorelists, even if
their names are identical; the order of these sections is preserved using a
`vector`.
Background: https://github.com/llvm/llvm-project/issues/139128
This is a draft implementation of "src:*=sanitize". It should be applied
to all sanitizers.
Any source assigned to the sanitize category keeps its sanitizer
instrumentation instead of having it ignored by "src:". For example,
```
src:*
src:*/test1.cc=sanitize
```
`test1.cc` will still be instrumented by UBSan.
Conflicting entries are resolved by order: the latest matching entry takes
precedence.
```
src:*
src:*/mylib/*=sanitize
src:*/mylib/test.cc
```
`test.cc` does not get the UBSan checks (in this case,
`src:*/mylib/test.cc` overrides `src:*/mylib/*=sanitize` for `test.cc`).
```
src:*
src:*/mylib/test.cc
src:*/mylib/*=sanitize
```
`test.cc` is instrumented by UBSan (in this case,
`src:*/mylib/*=sanitize` overrides `src:*/mylib/test.cc`).
Documentation updates will follow in a separate PR.
RISC-V does not use address spaces and leaves them available for user
code to make use of. Intrinsics, however, required pointer types to use
the default address space, which complicated lowering for non-default
address spaces. When the intrinsics are overloaded on the pointer type,
this is handled without extra effort.
This commit does not yet update Clang builtin functions to also permit
pointers to non-default address spaces.
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current support for barrier intrinsics is confusing and incomplete,
with multiple intrinsics mapping to the same instruction and intrinsic
names not clearly conveying their semantics. Further, we lack support
for some variants. This change unifies the IR representation into a
single consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
Our current support for barrier intrinsics is confusing and incomplete,
with multiple intrinsics mapping to the same instruction and intrinsic
names not clearly conveying their semantics. Further, we lack support
for some variants. This change unifies the IR representation into a
single consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
Of the 128 bits of a buffer descriptor, only 48 are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.
Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.
Reviewed By: krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139419
…__builtin_scalbn
Clang generates library calls for __builtin_* functions, which can be a
problem for GPUs that cannot handle them. This patch generates a call to
the device implementation for __builtin_logb and the ldexp intrinsic for
__builtin_scalbn.
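For illustration, a minimal sketch of the affected builtins (plain C; the change only affects how they are lowered for offloading targets):
```c
double f(double x) {
  double e = __builtin_logb(x);      /* previously a libm call; now the device implementation */
  double y = __builtin_scalbn(x, 3); /* previously a libm call; now lowered via the ldexp intrinsic */
  return e + y;
}
```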
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
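For illustration, a minimal sketch using a plain target attribute (the pragma case from the description behaves analogously; the arch string is illustrative):
```c
/* "+sve" ends up in the function's "target-features"; with this patch the
   function should also receive a matching vscale_range attribute. */
__attribute__((target("arch=armv8.2-a+sve")))
void saxpy(float *dst, const float *src, float a, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] += a * src[i];
}
```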
This patch adds fp8 variants to existing intrinsics whose operation
doesn't depend on the arguments being a specific type.
It also changes the in-memory representation of the mfloat8 type from
`i8` to `<1 x i8>`.
Thread-local globals live, by default, in the default globals address
space, which may not be 0, so we need to overload @llvm.thread.pointer
to support other address spaces, and use the default globals address
space in Clang.
For i1 vectors, we used an i8 fixed vector as the storage type.
If the known minimum number of elements of the scalable vector type is
less than 8, we were doing the cast through memory, using a load or
store from a fixed vector alloca. In that case, DataLayout indicates
that the load/store reads/writes vscale bytes, even if vscale is known
and vscale*X is less than or equal to 8. This means the load or store is
outside the bounds of the fixed-size alloca as far as DataLayout is
concerned, leading to undefined behavior.
This patch avoids this by widening the i1 scalable vector type with zero
elements until it is divisible by 8. This allows it to be bitcast
to/from an i8 scalable vector. We then insert or extract the i8 fixed
vector into/from this type.
Hopefully this enables #130973 to be accepted.
This change adds intrinsics and clang builtins for the cvt instruction
variants of type (FP4) `.e2m1x2`, introduced in PTX 8.6 for `sm_100a`,
`sm_101a`, and `sm_120a`.
Tests are added in `NVPTX/convert-sm100a.ll` and
`clang/test/CodeGen/builtins-nvptx.c` and verified through ptxas 12.8.0.
PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt
I also fixed __builtin_wasm_ref_null_extern() to generate a diagnostic
when it gets an argument. It seems like `SemaRef.checkArgCount()` has a
bug that makes it unable to check for 0 args.
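For example, a sketch of the now-diagnosed misuse (assumes a WebAssembly target with reference types; `__externref_t` is Clang's externref type):
```c
void example(void) {
  __externref_t ok  = __builtin_wasm_ref_null_extern();  /* correct: takes no arguments */
  __externref_t bad = __builtin_wasm_ref_null_extern(0); /* now rejected: too many arguments */
  (void)ok;
  (void)bad;
}
```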
The 'counted_by' attribute is now available for pointers in structs.
It generates code for sanity checks as well as
__builtin_dynamic_object_size()
calculations. For example:
```
struct annotated_ptr {
  int count;
  char *buf __attribute__((counted_by(count)));
};
```
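With that annotation, `__builtin_dynamic_object_size()` can derive the size of `buf` from the `count` field. A minimal sketch (the helper name is illustrative):
```c
#include <stddef.h>

struct annotated_ptr {
  int count;
  char *buf __attribute__((counted_by(count)));
};

size_t buf_size(struct annotated_ptr *p) {
  /* With 'counted_by', this evaluates to p->count * sizeof(*p->buf)
     rather than (size_t)-1 ("unknown"). */
  return __builtin_dynamic_object_size(p->buf, 0);
}
```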
If the pointer's type is 'void *', use the 'sized_by' attribute, which
works similarly to 'counted_by', but can handle the 'void' base type:
```
struct annotated_ptr {
  int count;
  void *buf __attribute__((sized_by(count)));
};
```
If the 'count' field occurs after the pointer, use the
'-fexperimental-late-parse-attributes' flag during compilation.
Note that 'counted_by' cannot be applied to a pointer to an incomplete
type, because the size isn't known.
```
struct foo;

struct annotated_ptr {
  int count;
  struct foo *buf __attribute__((counted_by(count))); /* invalid */
};
```
Signed-off-by: Bill Wendling <morbo@google.com>
Unused types are retained in the debug info when
-fno-eliminate-unused-debug-types is specified.
However, unused nested enums were not being emitted even with this
option.
This patch fixes the missing emission of unused nested enums with
-fno-eliminate-unused-debug-types.
Similarly to #135016, refactor getPTrue to return splat (1) for
all-active patterns. The main motivation for this is to improve
code gen for fixed-length vector loads/stores that are converted to SVE
masked memory ops when the vectors are wider than Neon. Emitting the
mask as a splat helps DAGCombiner simplify all-active masked
loads/stores into unmasked ones, for which it already has suitable
combines and ISel has suitable patterns.
Adds support for MSVC's undocumented `/funcoverride` flag, which marks
functions as being replaceable by the Windows kernel loader. This is
used to allow functions to be upgraded depending on the capabilities of
the current processor (e.g., the kernel can be built with the naive
implementation of a function, but that function can be replaced at boot
with one that uses SIMD instructions if the processor supports them).
For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data`
section (anywhere in the data section, it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to
`<name>_$fo_default$`.
This is used by the MSVC linker to generate the appropriate metadata in
the Dynamic Value Relocation Table.
Marked functions must never be inlined (otherwise those inline sites
can't be replaced).
Note that I've chosen to implement this in AsmPrinter as there was no
way to create a `GlobalVariable` for `<name>_$fo$` that would result in
a symbol being emitted (as nothing consumes it and it has no
initializer). I tried to have `llvm.used` and `llvm.compiler.used` point
to it, but this didn't help.
Within LLVM I referred to this feature as "loader replaceable" as
"function override" already has a different meaning to C++ developers...
I also took the opportunity to extract the feature symbol generation
code used by both AArch64 and X86 into a common function in AsmPrinter.
Allows the __ptrauth qualifier to be applied to pointer-sized integer
types, updates Sema so that trivial-copyability and similar checks
correctly handle address-discriminated integers, and updates codegen to
perform authentication around arithmetic on such types.
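A sketch of the new capability (the key and discriminator values are illustrative; requires a pointer-authentication-enabled target such as arm64e):
```c
#include <stdint.h>

struct protected_entry {
  /* Previously only pointer types could carry __ptrauth; a pointer-sized
     integer can now be signed as well, with address discrimination. */
  uintptr_t __ptrauth(2, 1, 0x1234) handle;
};

void bump(struct protected_entry *e) {
  /* Arithmetic on the signed integer: codegen authenticates the value,
     performs the addition, then re-signs the result. */
  e->handle += 8;
}
```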
Adds support for emitting Windows x64 Unwind V2 information, including
support for `/d2epilogunwind` in clang-cl.
Unwind v2 adds information about the epilogs in functions such that the
unwinder can unwind even in the middle of an epilog, without having to
disassemble the function to see what has or has not been cleaned up.
Unwind v2 requires that all epilogs are in "canonical" form:
* If there was a stack allocation (fixed or dynamic) in the prolog, then
the first instruction in the epilog must be a stack deallocation.
* Next, for each `PUSH` in the prolog there must be a corresponding
`POP` instruction in exact reverse order.
* Finally, the epilog must end with the terminator.
This change adds a pass to validate epilogs in modules that have Unwind
v2 enabled and, if they pass, emits new pseudo instructions to MC that
1) note that the function is using unwind v2 and 2) mark the start of
the epilog (this is either the first `POP` if there is one, otherwise
the terminator instruction). If a function does not meet these
requirements, it is downgraded to Unwind v1 (i.e., these new pseudo
instructions are not emitted).
Note that the unwind v2 table only records the size of the epilog in the
"header" unwind code, but epilogs may use different terminator
instructions and thus are not all the same size. As a workaround, MC
assumes that all terminator instructions are 1 byte long. This still
works correctly with the Windows unwinder: it only uses the size for a
range check to decide whether a thread is in an epilog, and since the
instruction pointer is never in the middle of an instruction and the
terminator is always at the end of an epilog, the range check still
functions correctly. This does mean, however, that the "at end"
optimization (where an epilog unwind code can be elided if the last
epilog is at the end of the function) can only be used if the terminator
is 1 byte long.
One other complication with the implementation is that the unwind table
for a function is emitted during streaming, however we can't calculate
the distance between an epilog and the end of the function at that time
as layout hasn't been completed yet (thus some instructions may be
relaxed). To work around this, epilog unwind codes are emitted via a
fixup. This also means that we can't pre-emptively downgrade a function
to Unwind v1 if one of these offsets is too large, so instead we raise
an error (but I've passed through the location information, so the user
will know which of their functions is problematic).
Add a new instrumentation section type `[sample-coldcov]` to support
`-fprofile-list` for sample-PGO-based cold function coverage.
Note that the current cold function coverage is based on the sampling
PGO pipeline, which is incompatible with the existing [llvm] option (see
[PGOOptions](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/PGOOptions.h#L27-L43)),
so we can't reuse the IR-PGO (-fprofile-instrument=llvm) flag.
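For illustration, a sketch of what an `-fprofile-list` file using the new section might look like (the entry syntax follows the existing special-case-list format; the specific entries and their categories are illustrative):
```
[sample-coldcov]
# Skip cold-function-coverage instrumentation for matching sources.
source:third_party/*=skip
# Always exclude this function from the instrumented group.
function:hot_loop=forbid
```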
In DWARF, RISC-V vector types are represented as DW_TAG_array_type with
a DW_AT_type attribute (what elements this array consists of) and a
DW_TAG_subrange_type child. DW_TAG_subrange_type has a DW_AT_upper_bound
attribute which contains the upper bound value for this array.
Currently, the same DWARF length information is generated for segmented
(tuple) types and their corresponding non-tuple types.
For example, vint32m4x2_t and vint32m4_t get DW_TAG_array_type with the
same DW_AT_type and DW_TAG_subrange_type, implying that these types have
the same length, which is not correct
(vint32m4x2_t is twice as long as vint32m4_t).
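For reference, a minimal sketch of the two type kinds involved (requires a vector-enabled RISC-V target, e.g. -march=rv64gcv):
```c
#include <riscv_vector.h>

void observe(vint32m4_t v, vint32m4x2_t tuple) {
  /* Both parameters were previously described with identical
     DW_TAG_subrange_type bounds in DWARF, even though the tuple type
     holds twice as many elements. */
}
```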
@fmayer introduced '-mllvm -array-bounds-pseudofn'
(https://github.com/llvm/llvm-project/pull/128977/) to make it easier to
see why crashes occurred, and to estimate with a profiler the cycles
spent on these array-bounds checks. This functionality could be usefully
generalized to other checks in future work.
This patch adds the plumbing for -fsanitize-annotate-debug-info and
connects it to the existing array-bounds-pseudo-fn functionality, i.e.,
-fsanitize-annotate-debug-info=array-bounds can be used as a replacement
for '-mllvm -array-bounds-pseudofn', though we do not yet delete the
latter.
Note: we replaced '-mllvm -array-bounds-pseudofn' in
clang/test/CodeGen/bounds-checking-debuginfo.c, because adding test
cases would modify the line numbers in the test assertions, and
therefore obscure that the test output is the same between '-mllvm
-array-bounds-pseudofn' and -fsanitize-annotate-debug-info=array-bounds.
Allow `-f[no]-sanitize-address-use-after-scope` to take effect under
kernel-address sanitizer (`-fsanitize=kernel-address`). `use-after-scope` is
now enabled by default under kernel-address sanitizer.
Previously, users may have enabled `use-after-scope` checks for kernel-address
sanitizer via `-mllvm -asan-use-after-scope=true`. While this may have worked
for optimization levels > O0, the required lifetime intrinsics to allow for
`use-after-scope` detection were not emitted under O0. This commit ensures
the required lifetime intrinsics are emitted under O0 with kernel-address
sanitizer.
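For reference, a minimal sketch of the kind of bug `use-after-scope` detects, which should now also be caught at O0 under `-fsanitize=kernel-address`:
```c
int *escaped;

void set_and_use(void) {
  {
    int local = 42;
    escaped = &local; /* the pointer escapes the scope of 'local' */
  }
  *escaped = 7; /* use-after-scope: 'local' is no longer in scope */
}
```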