This caused link errors when building with sancov. See comment on the PR.
> Whereas it is UB in terms of the standard to delete an array of objects
> via pointer whose static type doesn't match its dynamic type, MSVC
> supports an extension allowing to do it.
> Aside from array deletion not working correctly in the mentioned case,
> currently not having this extension implemented causes clang to generate
> code that is not compatible with the code generated by MSVC, because
> clang always puts scalar deleting destructor to the vftable. This PR
> aims to resolve these problems.
>
> Fixes https://github.com/llvm/llvm-project/issues/19772
This reverts commit d6942d54f677000cf713d2b0eba57b641452beb4.
This patch makes the assertion (that is currently in a comment) that
validates that names mangled by clang can be demangled by LLVM actually
compile/work. There were some minor issues that needed to be fixed (like
starts_with not being available on std::string and needing to call
getDecl() on GD), and a logic issue that should be fixed in this patch.
This enables just uncommenting the assertion to enable it within the
compiler (minus needing to add the header file).
I initially implemented codegen to be a 'no-op' for these declarations,
which I thought was properly implemented. However, when they are a
top-level decl, we have a separate switch. This patch makes sure they
are properly emitted at top-level as a no-op, and adds a test for both
top-level and not top-level.
The resource wrapper should have internal linkage because it contains a
handle to the global resource, and it not the actual global.
Makeing this changed exposed that we were zeroinitializing the resouce,
which is a problem. The handle cannot be zeroinitialized. This is
changed to use poison instead.
Fixes https://github.com/llvm/llvm-project/issues/122767.
---------
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Whereas it is UB in terms of the standard to delete an array of objects
via pointer whose static type doesn't match its dynamic type, MSVC
supports an extension allowing to do it.
Aside from array deletion not working correctly in the mentioned case,
currently not having this extension implemented causes clang to generate
code that is not compatible with the code generated by MSVC, because
clang always puts scalar deleting destructor to the vftable. This PR
aims to resolve these problems.
Fixes https://github.com/llvm/llvm-project/issues/19772
Add option and statement attribute for controlling emitting of
target-specific
metadata to atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641,
Originally a pragma was proposed, then it was changed to clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.
During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
All variable declarations in the global scope that are not resources,
static or empty are implicitly added to implicit constant buffer
`$Globals`. They are created in `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.
Fixes#123801
This is a second attempt to implement this feature. The first attempt
had to be reverted because of memory leaks. The problem was adding a
`SmallVector` member on `HLSLBufferDecl` node to represent a list of
default buffer declarations. When this vector needed to grow, it
allocated memory that was never released, because all memory used by AST
nodes must be allocated by `ASTContext` allocator and is released all at
once. Destructors on AST nodes are never called.
It this change the list of default buffer declarations is collected in a
`SmallVector` instance on `SemaHLSL`. The `HLSLBufDecl` representing
`$Globals` is created at the end of the translation unit when the number
of declarations is known, and the list is copied into an array allocated
by the `ASTContext` allocator.
All variable declarations in the global scope that are not resources,
static or empty are implicitly added to implicit constant buffer
`$Globals`. They are created in `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.
Fixes#123801
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.
This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
movl $199571451, %eax # hash of foo's type = 0xBE537FB
foo:
...
```
This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:
| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |
For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
movl $199571451, %ebx # hash of foo's type = 0xBE537FB
```
This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.
Note that there is another existing PR proposal that includes the 3-bit
arity within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.
This both reapplies #118734, the initial attempt at this, and updates it
significantly.
First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.
It also restructures the builtins to support sharding across several
independent tables. This accomplishes two improvements from the
original PR:
1) It improves the APIs used significantly.
2) When builtins are defined from different sources (like SVE vs MVE in
AArch64), this allows each of them to build their own string table
independently rather than having to merge the string tables and info
structures.
3) It allows each shard to factor out a common prefix, often cutting the
size of the strings needed for the builtins by a factor two.
The second point is important both to allow different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc files), it also simply reduces the sizes
of these tables which is valuable given how large they are in some
cases. The third builds on that size reduction.
Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
Per the ELF spec, section groups may only contain local symbols if those
symbols are only
referenced from within the section group. [1] In the case of template
parameter objects,
they can be referenced from outside the group when the type of the
object was declared
in an anonymous namespace. In that case, we can't place the object in a
COMDAT. This matches
GCC's linkage behavior on the test input.
[1]:
https://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_groups
This is an implementation of P1061 Structure Bindings Introduce a Pack
without the ability to use packs outside of templates. There is a couple
of ways the AST could have been sliced so let me know what you think.
The only part of this change that I am unsure of is the
serialization/deserialization stuff. I followed the implementation of
other Exprs, but I do not really know how it is tested. Thank you for
your time considering this.
---------
Co-authored-by: Yanzuo Liu <zwuis@outlook.com>
* We want the default version to have this attribute too otherwise it
becomes indistinguishable from non-versioned functions.
* We don't need the '+' unlike target-features which can negate. This
will allow using the parsing API of target_version/clones for the
metadata too.
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:
"Among any two versions, the higher priority version is determined by
identifying the highest priority feature that is specified in exactly
one of the versions, and selecting that version."
We need to be able to propagate information about FMV attribute strings
from C/C++ source to LLVM IR. This is necessary so that we can
distinguish which target-features are coming from the cmdline, which are
coming from the target attribute, and which are coming from feature
dependency expansion. We need this for static resolution of calls in
LLVM. Here's a motivating example:
Suppose you have target_version("i8mm+dotprod") and
target_version("fcma"). The first version clearly has higher priority.
Now suppose you specify -march=armv8-a+i8mm on the command line. Then
the versions would have target-features "+i8mm,+dotprod" and
"+i8mm,+fcma" respectively. If you are using those to deduce version
priority, then you would incorrectly deduce that the second version was
higher priority than the first.
Currently we need at least one more version other than the default to
trigger FMV. However we would like a header file declaration
__attribute__((target_version("default"))) void f(void);
to guarantee that there will be f.default
Re-apply #113148 after revert in #119331
If function pointer signing is enabled, sign personality function
pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key,
0x7EAD = `ptrauth_string_discriminator("personality")` constant
discriminator and address diversity enabled.
If function pointer signing is enabled, sign personality function
pointer stored in `.DW.ref.__gxx_personality_v0` section with IA key,
0x7EAD = `ptrauth_string_discriminator("personality")` constant
discriminator and address diversity enabled.
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.
I am moving the FMV priority logic in TargetInfo, so that it can be
implemented by the TargetParser which then makes it possible to query it
from llvm. Here is an example why this is handy:
https://github.com/llvm/llvm-project/pull/87939
When dealing with cpu_specific GlobalDecl,
GetOrCreateMultiVersionResolver should immediately return the already
created llvm function if it exists.
Fixes https://github.com/llvm/llvm-project/issues/115299.
Add `-fptrauth-elf-got` clang cc1 flag and set `ptrauth_elf_got`
preprocessor feature and `PointerAuthELFGOT` LangOption correspondingly.
No additional checks like ensuring OS binary format is ELF are
performed: it should be done on clang driver level when a pauth-enabled
environment implying signed GOT enabled is requested.
If the cc1 flag is passed, "ptrauth-elf-got" IR module flag is set.
Reproducer:
```
//--- a.cppm
export module a;
int func();
static int a = func();
//--- a.cpp
import a;
```
The `func()` should only execute once. However, before this patch we
will somehow import `static int a` from a.cppm incorrectly and
initialize that again.
This is super bad and can introduce serious runtime behaviors.
And also surprisingly, it looks like the root cause of the problem is
simply some oversight choosing APIs.
Fixes: #113187
Avoid to create init function since clang does not support global
variable with flexible array init.
It will cause assertion failure later.
Adds `@_init_resource_bindings()` function to module initialization that
includes `handle.fromBinding` intrinsic calls for simple resource
declarations. Arrays of resources or resources inside user defined types
are not supported yet.
While this unblocks our progress on [Compile a runnable shader from
clang](https://github.com/llvm/wg-hlsl/issues/7) milestone, this is
probably not the way we would like to handle resource binding
initialization going forward. Ideally, it should be done via the
resource class constructors in order to support dynamic resource binding
or unbounded arrays if resources.
Depends on PRs #110327 and #111203.
Part 1 of #105076
When the arch in the triple in "spirv", the default target codegen is
currently used. We should be using the spir-v target codegen. This will
be used to have SPIR-V specific lowering of the HLSL types.
Done by calling clang::runWithSufficientStackSpace().
Added CodeGenModule::runWithSufficientStackSpace() method similar to the
one in Sema to provide a single warning when this triggers
Fixes: #111699
Gentoo is planning to introduce a `*t64` suffix for triples that will be
used by 32-bit platforms that use 64-bit `time_t`. Add support for
parsing and accepting these triples, and while at it make clang
automatically enable the necessary glibc feature macros when this suffix
is used.
An open question is whether we can backport this to LLVM 19.x. After
all, adding new triplets to Triple sounds like an ABI change — though I
suppose we can minimize the risk of breaking something if we move new
enum values to the very end.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Currently, `__constant__` variables do not get unconditionally marked as
`constant` in IR, which seems a bit odd given their definition. This is
generally inconsequential for NVPTX/AMDGPU, since said variables get
emitted in the constant address space for those BEs. However, it is
potentially significant for e.g. HIP-on-SPIR-V cases, as SPIR-V does not
allow casts to/from the constant AS (`UniformConstant`), which forces
`__constant__` variables to be emitted in the global AS, thus making IR
constness meaningful.
This patch enables the following command line flags for RISC-V targets:
+ `-fcf-protection=branch` turns on forward-edge control-flow integrity conditioning
+ `-mcf-branch-label-scheme=unlabeled|func-sig` selects the label scheme used in the forward-edge CFI conditioning
This reverts commit 4a63f4d301c0e044073e1b1f8f110015ec1778a1.
It was reverted because of a buildbot breakage, but the fix-forward has
landed (https://github.com/llvm/llvm-project/pull/109023).
HLSL inlines all its functions by default. This uses the alwaysinline
attribute to make the alwaysinliner pass inline any function not
explicitly marked noinline by the user or autogeneration. The
alwayslinline marking takes place in `SetLLVMFunctionAttributesForDefinitions`
where all other inlining interactions are determined.
The outermost entry function is marked noinline because there's no
reason to inline it. Any user calls to an entry function will instead call
the internal mangled version of the entry function.
Adds tests for function and constructor inlining and augments some
existing tests to verify correct inlining of implicitly created
functions as well.
Incidentally restore RUN line that I believe was mistakenly removed as
part of #88918Fixes#89282