Summary:
Clang has support for boolean vectors, these builtins expose the LLVM
instruction of the same name. This differs from a manual load and select
by potentially suppressing traps from deactivated lanes.
Fixes: https://github.com/llvm/llvm-project/issues/107753
Summary:
It's extremely common to conditionally blend two vectors. Previously
this was done with mask registers, which is what the normal ternary code
generation does when used on a vector. However, since Clang 15 we have
supported boolean vector types in the compiler. These are useful in
general for checking the mask registers, but are currently limited
because they do not map to an LLVM-IR select instruction.
This patch simply relaxes these checks, which are technically forbidden
by
the OpenCL standard. However, general vector support should be able to
handle these. We already support this for Arm SVE types, so this should
be make more consistent with the clang vector type.
These builtins are modeled on the clzg/ctzg builtins, which accept an
optional second argument. This second argument is returned if the first
argument is 0. These builtins unconditionally exhibit zero-is-undef
behaviour, regardless of target preference for the other ctz/clz
builtins. The builtins have constexpr support.
Fixes#154113
This patch handle struct of fixed vector and struct of array of fixed
vector correctly for VLS calling convention in EmitFunctionProlog,
EmitFunctionEpilog and EmitCall.
stack on: https://github.com/llvm/llvm-project/pull/147173
Adds support for accessing individual resources from fixed-size global resource arrays.
Design proposal:
https://github.com/llvm/wg-hlsl/blob/main/proposals/0028-resource-arrays.md
Enables indexing into globally scoped, fixed-size resource arrays to retrieve individual resources. The initialization logic is primarily handled during codegen. When a global resource array is indexed, the
codegen translates the `ArraySubscriptExpr` AST node into a constructor call for the corresponding resource record type and binding.
To support this behavior, Sema needs to ensure that:
- The constructor for the specific resource type is instantiated.
- An implicit binding attribute is added to resource arrays that lack explicit bindings (#152452).
Closes#145424
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.
The 'cfi_salt' attribute specifies a string literal that is used as a
"salt" for Control-Flow Integrity (CFI) checks to distinguish between
functions with the same type signature. This attribute can be applied
to function declarations, function definitions, and function pointer
typedefs.
This attribute prevents function pointers from being replaced with
pointers to functions that have a compatible type, which can be a CFI
bypass vector.
The attribute affects type compatibility during compilation and CFI
hash generation during code generation.
Attribute syntax: [[clang::cfi_salt("<salt_string>")]]
GNU-style syntax: __attribute__((cfi_salt("<salt_string>")))
- The attribute takes a single string of non-NULL ASCII characters.
- It only applies to function types; using it on a non-function type
will generate an error.
- All function declarations and the function definition must include
the attribute and use identical salt values.
Example usage:
// Header file:
#define __cfi_salt(S) __attribute__((cfi_salt(S)))
// Convenient typedefs to avoid nested declarator syntax.
typedef int (*fp_unsalted_t)(void);
typedef int (*fp_salted_t)(void) __cfi_salt("pepper");
struct widget_ops {
fp_unsalted_t init; // Regular CFI.
fp_salted_t exec; // Salted CFI.
fp_unsalted_t teardown; // Regular CFI.
};
// bar.c file:
static int bar_init(void) { ... }
static int bar_salted_exec(void) __cfi_salt("pepper") { ... }
static int bar_teardown(void) { ... }
static struct widget_generator _generator = {
.init = bar_init,
.exec = bar_salted_exec,
.teardown = bar_teardown,
};
struct widget_generator *widget_gen = _generator;
// 2nd .c file:
int generate_a_widget(void) {
int ret;
// Called with non-salted CFI.
ret = widget_gen.init();
if (ret)
return ret;
// Called with salted CFI.
ret = widget_gen.exec();
if (ret)
return ret;
// Called with non-salted CFI.
return widget_gen.teardown();
}
Link: https://github.com/ClangBuiltLinux/linux/issues/1736
Link: https://github.com/KSPP/linux/issues/365
---------
Signed-off-by: Bill Wendling <morbo@google.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Introduces the use of pointer authentication to protect the invocation,
copy and dispose, reference, and descriptor pointers in Objective-C
block objects.
Resolves#141176
The cleanup structs expect that pointers and (u)int64_t have the same
alignment requirements, which isn't true on sparc32, which causes
SIGBUSes.
See also: https://github.com/llvm/llvm-project/issues/66620
Replaces the XOP/AVX512 per-element rotation/funnel shift builtins with the generic __builtin_elementwise_fshl/fshr
We still have uniform immediate variants to handle next.
Part of #153152
The following intrinsics were replaced by a combination of
`__builtin_shufflevector` and `__builtin_convertvector`:
- `__builtin_ia32_vcvtph2ps`
- `__builtin_ia32_vcvtph2ps256`
Fixes#152749
This patch handles the strided update in the `#pragma omp target update
from(data[a🅱️c])` directive where 'c' represents the strided access
leading to non-contiguous update in the `data` array when the offloaded
execution returns the control back to host from device using the `from`
clause.
Issue: Clang CodeGen where info is generated for the particular
`MapType` (to, from, etc), it was failing to detect the strided access.
Because of this, the `MapType` bits were incorrect when passed to
runtime. This led to incorrect execution (contiguous) in the
libomptarget runtime code.
Added a minimal testcase that verifies the working of the patch.
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
The following intrinsics were replaced by `__builtin_elementwise_fma`:
- `__builtin_ia32_vfmaddps(256)`
- `__builtin_ia32_vfmaddpd(256)`
- `__builtin_ia32_vfmaddph(256)`
- `__builtin_ia32_vfmaddbf16(128 | 256 | 512)`
All the aforementioned `__builtin_ia32_vfmadd*` intrinsics are
equivalent to a `__builtin_elementwise_fma`, so keeping them is an
unnecessary indirection.
Fixes [#152461](https://github.com/llvm/llvm-project/issues/152461)
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Adds the `isHLSLResourceRecordArray()` method to the `Type` class. This method returns `true` if the `Type` represents an array of HLSL resource records. Defining this method on `Type` makes it accessible from both sema and codegen.
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).
I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.
It will also crash in the backend if passed an int128 or float128 type.
+ Changed type_mismatch minimal handler to accept and print pointer.
This will allow to distinguish null pointer use, misallignment and
incorrect object size.
The change increases binary size by ~1% and has almost no performance
impact.
Fixes#149943
This corrects the codegen for the final class optimization to
correct handle the case where there is no path to perform the
cast, and also corrects the codegen to handle ptrauth protected
vtable pointers.
As part of this fix we separate out the path computation as
that makes it easier to reason about the failure code paths
and more importantly means we can know what the type of the
this object is during the cast.
The allows us to use the GetVTablePointer interface which
correctly performs the authentication operations required when
pointer authentication is enabled. This still leaves incorrect
authentication behavior in the multiple inheritance case but
currently the optimization is disabled entirely if pointer
authentication is enabled.
Fixes#137518
Forward SourceLocation to `EmitCall` so that clang triggers an error
when a function inside `[[gnu::cleanup(func)]]` is annotated with
`[[gnu::error("some message")]]`.
resolves#146520
The vk::binding attribute allows users to explicitly set the set and
binding for a resource in SPIR-V without chaning the "register"
attribute, which will be used when targeting DXIL.
Fixes https://github.com/llvm/llvm-project/issues/136894
The codegen for the final class dynamic_cast optimization fails to
consider pointer authentication. This change resolves this be simply
disabling the optimization when pointer authentication enabled.
On COFF platform, d1b0cbff806b50d399826e79b9a53e4726c21302 generates a
debug info linked with VTable regardless definition is present or not.
If that VTable ends up implicitly dllimported from another DLL, ld.bfd
produces a runtime pseudo relocation for it (LLD doesn't, since
d17db6066d2524856fab493dd894f8396e896bc7). If the debug section is
stripped, the runtime pseudo relocation points to memory space outside
of the module, causing an access violation.
At this moment, we simply disable VTable debug info on COFF platform to
avoid this problem.
Writing something like this:
__builtin_dynamic_object_size((0, p->array), 0)
is equivalent to writing this:
__builtin_dynamic_object_size(p->array, 0)
though the former will give a warning about the first value being
unused.
Consider this declaration:
`int foo();`
This function is described in LLVM with `clang::FunctionNoProtoType`
class. ([See
description](https://clang.llvm.org/doxygen/classclang_1_1FunctionNoProtoType.html))
Judging by [this
comment](a1bf0d1394/clang/lib/CodeGen/CGCall.cpp (L159C11-L159C12))
all such functions are treated like functions with variadic number of
parameters.
When we want to [emit debug
info](0a8ddd3965/clang/lib/CodeGen/CGDebugInfo.cpp (L4808))
we have to know function that we calling.
In method
[getCalledFunction()](0a8ddd3965/llvm/include/llvm/IR/InstrTypes.h (L1348))
we compare two types of function:
1. Function that we deduce from calling operand, and
2. Function that we store locally
If they differ we get `nullptr` and can't emit appropriate debug info.
The only thing they differ is: lhs function is variadic, but rhs
function isn't
Reason of this difference is that under RISC-V there is no overridden
function that tells us about treating functions with no parameters.
[Default
function](0a8ddd3965/clang/lib/CodeGen/TargetInfo.cpp (L87))
always return `false`.
This patch overrides this function for RISC-V
CWD is queried in clang driver and passed to clang cc1 via flags when
needed. Respect the cc1 flags and do not repeated checking current
working directory in CodeGen.
This switches to `makeIntrusiveRefCnt<FileSystem>` where creating a new
object, and to passing/returning by `IntrusiveRefCntPtr<FileSystem>`
instead of `FileSystem*` or `FileSystem&`, when dealing with existing
objects.
Part of cleanup #151026.
The -gline-tables-only option emits minimal debug info for functions,
files and line numbers while omitting variables, parameters and most
type information. VTable debug symbols are emitted to facilitate a
debugger's ability to perform automatic type promotion on variables and
parameters. With variables and parameters being omitted, the VTable
symbols are unnecessary.
This is handled by the instcombine added in #147930; there is no need
for any clang-specific folding. NFC as all clang tests for
`__arm_in_streaming_mode()` used -O1, which applies the LLVM
instcombines.
Preprocessing the preprocessor output again interacts poorly with some
flag combinations when we perform a separate preprocessing stage. In our
case, `-no-integrated-cpp -dD` triggered this issue; but I guess that
other flags could also trigger problems (`-save-temps` instead of
`-no-integrated-cpp`).
Full context (which is quite weird I'll admit):
* To cache OpenCL kernel compilation results, we use the
`-no-integrated-cpp` for the driver to generate a separate preprocessing
command (`clang -E`) before the rest of the compilation.
* Some OpenCL C language features are implemented as macro definitions
(in `opencl-c-base.h`). The semantic analysis queries the preprocessor
to check if these are defined or not, for example, when we checks if a
builtin is available when using `-fdeclare-opencl-builtins`.
* To preserve these `#define` directives, on the preprocessor's output,
we use `-dD`. However, other `#define` directives are also maintained
besides OpenCL ones; which triggers the issue shown in this PR.
A better fix for our particular case could have been to move the
language features implemented as macros into some sort of a flag to be
used together with `-fdeclare-opencl-builtins`.
But I also thought that not preprocessing preprocessor outputs seemed
like something desirable. I hope to work on this on a follow up.
This patch adds a human readable trap category and message to UBSan
traps. The category and message are encoded in a fake frame in the debug
info where the function is a fake inline function where the name encodes
the trap category and message. This is the same mechanism used by
Clang’s `__builtin_verbose_trap()`.
This change allows consumers of binaries built with trapping UBSan to
more easily identify the reason for trapping. In particular LLDB already
has a frame recognizer that recognizes the fake function names emitted
in debug info by this patch. A patch testing this behavior in LLDB will
be added in a separately.
The human readable trap messages are based on the messages currently
emitted by the userspace runtime for UBSan in compiler-rt. Note the
wording is not identical because the userspace UBSan runtime has access
to dynamic information that is not available during Clang’s codegen.
Test cases for each UBSan trap kind are included.
This complements the [`-fsanitize-annotate-debug-info`
feature](https://github.com/llvm/llvm-project/pull/141997). While
`-fsanitize-annotate-debug-info` attempts to annotate all UBSan-added
instructions, this feature (`-fsanitize-debug-trap-reasons`) only
annotates the final trap instruction using SanitizerHandler information.
This work is part of a GSoc 2025 project.
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.
Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
Fixes#148063 by preventing the ByVal attribute from being placed on out
and inout function parameters which causes them to be eliminated by the
Dead Store Elimination (DSE) pass.