(This is a re-do of #138972, which had a minor warning in `Clang.cpp`.)
This PR adds some of the support needed for Windows hot-patching.
Windows implements a form of hot-patching. This allows patches to be
applied to Windows apps, drivers, and the kernel, without rebooting or
restarting any of these components. Hot-patching is a complex technology
and requires coordination between the OS, compilers, linkers, and
additional tools.
This PR adds support to Clang and LLVM for part of the hot-patching
process. It enables LLVM to generate the required code changes and to
generate CodeView symbols which identify hot-patched functions. The PR
provides new command-line arguments to Clang which allow developers to
identify the list of functions that need to be hot-patched. This PR also
allows LLVM to directly receive the list of functions to be modified, so
that language front-ends which have not yet been modified (such as Rust)
can still make use of hot-patching.
This PR:
* Adds a `MarkedForWindowsHotPatching` LLVM function attribute. This
attribute indicates that a function should be _hot-patched_. This
generates a new CodeView symbol, `S_HOTPATCHFUNC`, which identifies any
function that has been hot-patched. This attribute also causes accesses
to global variables to be indirected through a `_ref_*` global variable.
This allows hot-patched functions to access the correct version of a
global variable; the hot-patched code needs to access the variable in
the _original_ image, not the patch image.
* Adds a `AllowDirectAccessInHotPatchFunction` LLVM attribute. This
attribute may be placed on global variable declarations. It indicates
that the variable may be safely accessed without the `_ref_*`
indirection.
* Adds two Clang command-line parameters: `-fms-hotpatch-functions-file`
and `-fms-hotpatch-functions-list`. The `-file` flag may point to a text
file, which contains a list of functions to be hot-patched (one function
name per line). The `-list` flag simply directly identifies functions to
be patched, using a comma-separated list. These two command-line
parameters may also be combined; the final set of functions to be
hot-patched is the union of the two sets.
* Adds similar LLVM command-line parameters:
`--ms-hotpatch-functions-file` and `--ms-hotpatch-functions-list`.
* Adds integration tests for both LLVM and Clang.
* Adds support for dumping the new `S_HOTPATCHFUNC` CodeView symbol.
Although the flags are redundant between Clang and LLVM, this allows
additional languages (such as Rust) to take advantage of hot-patching
support before they have been modified to generate the required
attributes.
Credit to @dpaoliello, who wrote the original form of this patch.
This PR adds some of the support needed for Windows hot-patching.
Windows implements a form of hot-patching. This allows patches to be
applied to Windows apps, drivers, and the kernel, without rebooting or
restarting any of these components. Hot-patching is a complex technology
and requires coordination between the OS, compilers, linkers, and
additional tools.
This PR adds support to Clang and LLVM for part of the hot-patching
process. It enables LLVM to generate the required code changes and to
generate CodeView symbols which identify hot-patched functions. The PR
provides new command-line arguments to Clang which allow developers to
identify the list of functions that need to be hot-patched. This PR also
allows LLVM to directly receive the list of functions to be modified, so
that language front-ends which have not yet been modified (such as Rust)
can still make use of hot-patching.
This PR:
* Adds a `MarkedForWindowsHotPatching` LLVM function attribute. This
attribute indicates that a function should be _hot-patched_. This
generates a new CodeView symbol, `S_HOTPATCHFUNC`, which identifies any
function that has been hot-patched. This attribute also causes accesses
to global variables to be indirected through a `_ref_*` global variable.
This allows hot-patched functions to access the correct version of a
global variable; the hot-patched code needs to access the variable in
the _original_ image, not the patch image.
* Adds a `AllowDirectAccessInHotPatchFunction` LLVM attribute. This
attribute may be placed on global variable declarations. It indicates
that the variable may be safely accessed without the `_ref_*`
indirection.
* Adds two Clang command-line parameters: `-fms-hotpatch-functions-file`
and `-fms-hotpatch-functions-list`. The `-file` flag may point to a text
file, which contains a list of functions to be hot-patched (one function
name per line). The `-list` flag simply directly identifies functions to
be patched, using a comma-separated list. These two command-line
parameters may also be combined; the final set of functions to be
hot-patched is the union of the two sets.
* Adds similar LLVM command-line parameters:
`--ms-hotpatch-functions-file` and `--ms-hotpatch-functions-list`.
* Adds integration tests for both LLVM and Clang.
* Adds support for dumping the new `S_HOTPATCHFUNC` CodeView symbol.
Although the flags are redundant between Clang and LLVM, this allows
additional languages (such as Rust) to take advantage of hot-patching
support before they have been modified to generate the required
attributes.
Credit to @dpaoliello, who wrote the original form of this patch.
This extends https://github.com/llvm/llvm-project/pull/138577 to more UBSan checks, by changing SanitizerDebugLocation (formerly SanitizerScope) to add annotations if enabled for the specified ordinals.
Annotations will use the ordinal name if there is exactly one ordinal specified in the SanitizerDebugLocation; otherwise, it will use the handler name.
Updates the tests from https://github.com/llvm/llvm-project/pull/141814.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
When returning a value, stores to the `retval` allocas and branches to `return`
block are put in the same atom group. They are both rank 1, which could in
theory introduce an extra step in some optimized code. This low risk currently
feels an acceptable for keeping the code a bit simpler (as opposed to adding
scaffolding to make the store rank 2).
In the case of a single return (no control flow) the return instruction inherits
the atom group of the branch to the return block when the blocks get folded
togather.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unqiue_ptr.
This reduces clang build time by 0.8%.
Move the initialization of ptrauth-* function attributes near the
initialization of branch protection attributes. The semantics of these
groups of attributes partially overlaps, so handle both groups in
getDefaultFunctionAttributes() and setTargetAttributes() functions to
prevent getting them out of sync. This fixes C++ TLS wrappers.
For i1 vectors, we used an i8 fixed vector as the storage type.
If the known minimum number of elements of the scalable vector type is
less than 8, we were doing the cast through memory. This used a load or
store from a fixed vector alloca. If is less than 8, DataLayout
indicates that the load/store reads/writes vscale bytes even if vscale
is known and vscale*X is less than or equal to 8. This means the load or
store is outside the bounds of the fixed size alloca as far as
DataLayout is concerned leading to undefined behavior.
This patch avoids this by widening the i1 scalable vector type with zero
elements until it is divisible by 8. This allows it be bitcasted to/from
an i8 scalable vector. We then insert or extract the i8 fixed vector
into this type.
Hopefully this enables #130973 to be accepted.
Adds support for MSVC's undocumented `/funcoverride` flag, which marks
functions as being replaceable by the Windows kernel loader. This is
used to allow functions to be upgraded depending on the capabilities of
the current processor (e.g., the kernel can be built with the naive
implementation of a function, but that function can be replaced at boot
with one that uses SIMD instructions if the processor supports them).
For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data`
section (anywhere in the data section, it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to
`<name>_$fo_default$`.
This is used by the MSVC linker to generate the appropriate metadata in
the Dynamic Value Relocation Table.
Marked function must never be inlined (otherwise those inline sites
can't be replaced).
Note that I've chosen to implement this in AsmPrinter as there was no
way to create a `GlobalVariable` for `<name>_$fo$` that would result in
a symbol being emitted (as nothing consumes it and it has no
initializer). I tried to have `llvm.used` and `llvm.compiler.used` point
to it, but this didn't help.
Within LLVM I referred to this feature as "loader replaceable" as
"function override" already has a different meaning to C++ developers...
I also took the opportunity to extract the feature symbol generation
code used by both AArch64 and X86 into a common function in AsmPrinter.
It isn't used and is redundant with the result pointer type argument.
A more reasonable API would only have LangAS parameters, or IR parameters,
not both. Not all values have a meaningful value for this. I'm also
not sure why we have this at all, it's not overridden by any targets and
further simplification is possible.
Do not assume it's the alloca address space, we have an explicit
address space to use for the argument already. Also use the original
value's type instead of assuming DefaultAS.
This essentially reverts 1bf1a156d673.
OpenCL's handling of address spaces has always been a mess, but it's
better than it used to be so this hack appears to be unnecessary now.
None of the code here should really depend on the language or language
address space. The ABI address space to use is already explicit in the
ABIArgInfo, so use that instead of guessing it has anything to do with
LangAS::Default or getASTAllocaAddressSpace.
The below usage of LangAS::Default and getASTAllocaAddressSpace are also
suspect, but appears to be a more involved and separate fix.
Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
Fixes the issue reported following the merge of #118026.
When a valid `musttail` call is made, the function it is made from must
return immediately after the call; if there are any cleanups left in the
function, then an error is triggered. This is not necessary for fake
uses however - it is perfectly valid to simply emit the fake use
"cleanup" code before the tail call, and indeed LLVM will automatically
move any fake uses following a tail call to come before the tail call.
Therefore, this patch specifically choose to handle fake use cleanups
when a musttail call is present by simply emitting them immediately
before the call.
When records contain fields with pointer authentication, even simple
copies can require
additional work be performed. This patch contains the core functionality
required to
handle user defined structs, as well as the implicitly constructed
structs for blocks, etc.
Co-authored-by: Ahmed Bougacha
Co-authored-by: Akira Hatanaka
Co-authored-by: John Mccall
A function declared with the `sycl_kernel_entry_point` attribute,
sometimes called a SYCL kernel entry point function, specifies a pattern
from which the parameters and body of an offload entry point function,
sometimes called a SYCL kernel caller function, are derived.
SYCL kernel caller functions are emitted during SYCL device compilation.
Their parameters and body are derived from the `SYCLKernelCallStmt`
statement and `OutlinedFunctionDecl` declaration associated with their
corresponding SYCL kernel entry point function. A distinct SYCL kernel
caller function is generated for each SYCL kernel entry point function
defined as a non-inline function or ODR-used in the translation unit.
The name of each SYCL kernel caller function is parameterized by the
SYCL kernel name type specified by the `sycl_kernel_entry_point`
attribute attached to the corresponding SYCL kernel entry point
function. For the moment, the Itanium ABI mangled name for typeinfo data
(`_ZTS<type>`) is used to name these functions; a future change will
switch to a more appropriate naming scheme.
The calling convention used for a SYCL kernel caller function is target
dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently
provided. These functions are required to observe the language
restrictions for SYCL devices as specified by the SYCL 2020
specification; this includes a forward progress guarantee and prohibits
recursion.
Only SYCL kernel caller functions, functions declared as
`SYCL_EXTERNAL`, and functions directly or indirectly referenced from
those functions should be emitted during device compilation. Pruning of
other declarations has not yet been implemented.
---------
Co-authored-by: Elizabeth Andrews <elizabeth.andrews@intel.com>
CGCall.cpp declares several functions with a return type that is an
explicitly spelled out specialization of `SmallVector`. Previously,
`auto` was used in several places to avoid repeating the long type name;
a use that Clang maintainers find unjustified. This change introduces
type aliases and replaces the existing uses of `auto` with the
corresponding alias name.
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.
Fixes https://github.com/llvm/llvm-project/issues/60313
D120566 was an earlier attempt made to upstream a solution
for this issue.
---------
Co-authored-by: anikelal <anikelal@amd.com>
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.
This reduces time to build clang by ~0.5% in terms of instructions
retired.
Certain functions require the `returns_twice` attribute in order to
produce correct codegen. However, `-fno-builtin` removes all knowledge
of functions that require this attribute, so this PR modifies Clang to
add the `returns_twice` attribute even if `-fno-builtin` is set. This
behavior is also consistent with what GCC does.
It's not (easily) possible to get the builtin information from
`Builtins.td` because `-fno-builtin` causes Clang to never initialize
any builtins, so functions never get tokenized as functions/builtins
that require `returns_twice`. Therefore, the most straightforward
solution is to explicitly hard code the function names that require
`returns_twice`.
Fixes#122840
This patch corrects the typo 'dereferencable' to 'dereferenceable' in
CGCall.cpp.
The typo is located within a comment inside the `void
CodeGenModule::ConstructAttributeList` function.
This patch adds a function attribute `riscv_vls_cc` for RISCV VLS
calling
convention which takes 0 or 1 argument, the argument is the `ABI_VLEN`
which is the `VLEN` for passing the fixed-vector arguments, it wraps the
argument as a scalable vector(VLA) using the `ABI_VLEN` and uses the
corresponding mechanism to handle it. The range of `ABI_VLEN` is [32,
65536],
if not specified, the default value is 128.
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
}
```
The first Non-VLS call passes generic vector argument of 16 bytes by
flattened integer.
On the contrary, the VLS call uses `ABI_VLEN=256` which wraps the
vector to <vscale x 1 x i32> where the number of scalable vector
elements
is calaulated by: `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
This addresses two issues introduced by moving indirect args into an
explicit AS (please see
<https://github.com/llvm/llvm-project/pull/114062#issuecomment-2659829790>
and
<https://github.com/llvm/llvm-project/pull/114062#issuecomment-2661158477>):
1. Unconditionally stripping casts from a pre-allocated return slot was
incorrect / insufficient (this is illustrated by the
`amdgcn_sret_ctor.cpp` test);
2. Putting compiler manufactured sret args in a non default AS can lead
to a C-cast (surprisingly) requiring an AS cast (this is illustrated by
the `sret_cast_with_nonzero_alloca_as.cpp test).
The way we handle (2), by subverting CK_BitCast emission iff a sret arg
is involved, is quite naff, but I couldn't think of any other way to use
a non default indirect AS and make this case work (hopefully this is a
failure of imagination on my part).
`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards agains trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.
In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
Following the previous patch which adds the "extend lifetimes" flag
without (almost) any functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted as a new form of cleanup,
set for variable addresses, which just emits a fake use intrinsic when the
variable falls out of scope. The code for achieving this is simple, with most
of the logic centered on determining whether to emit a fake use for a given
address, and on ensuring that fake uses are ignored in a few cases.
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Prototype.P to be nonnull.
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
This reverts commit 81fc3add1e627c23b7270fe2739cdacc09063e54.
This breaks some LLDB tests, e.g.
SymbolFile/DWARF/x86/no_unique_address-with-bitfields.cpp:
lldb: ../llvm-project/clang/lib/AST/Decl.cpp:4604: unsigned int clang::FieldDecl::getBitWidthValue() const: Assertion `isa<ConstantExpr>(getBitWidth())' failed.
Save the bitwidth value as a `ConstantExpr` with the value set. Remove
the `ASTContext` parameter from `getBitWidthValue()`, so the latter
simply returns the value from the `ConstantExpr` instead of
constant-evaluating the bitwidth expression every time it is called.
Closes#119360.
This bug occurs when passing a VLA to `va_arg`. Since the return value
is inferred to be an array, it triggers
`ScalarExprEmitter::VisitCastExpr`, which converts it to a pointer and
subsequently calls `CodeGenFunction::EmitAggExpr`. At this point,
because the inferred type is an `AggExpr` instead of a `ScalarExpr`,
`ScalarExprEmitter::VisitVAArgExpr` is not invoked, and as a result,
`CodeGenFunction::EmitVariablyModifiedType` is also not called, leading
to the size of the VLA not being retrieved.
The solution is to move the call to
`CodeGenFunction::EmitVariablyModifiedType` into
`CodeGenFunction::EmitVAArg`, ensuring that the size of the VLA is
correctly obtained regardless of whether the expression is an `AggExpr`
or a `ScalarExpr`.
- Use `poison` instead of `undef` as a phi operand for an unreachable path (the predecessor
will not go the BB that uses the value of the phi).
- Call `@llvm.vector.insert` with a `poison` subvec when performing a
`bitcast` from a fixed vector to a scalable vector.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
Get inout/out parameters working for HLSL Arrays.
Utilizes the fix from #109323, and corrects the assignment behavior
slightly to allow for Non-LValues on the RHS.
Closes#106917
---------
Co-authored-by: Chris B <beanz@abolishcrlf.org>