We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
When returning a value, stores to the `retval` allocas and branches to `return`
block are put in the same atom group. They are both rank 1, which could in
theory introduce an extra step in some optimized code. This low risk currently
feels an acceptable for keeping the code a bit simpler (as opposed to adding
scaffolding to make the store rank 2).
In the case of a single return (no control flow) the return instruction inherits
the atom group of the branch to the return block when the blocks get folded
togather.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unqiue_ptr.
This reduces clang build time by 0.8%.
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:
- `-fprofile-generate` for instrumentation-based profile generation
- `-fprofile-use=<dir>/file` for profile-guided optimization
Resolves#74216 (implements IR PGO support phase)
**Key changes:**
- Frontend flag handling aligned with Clang/GCC semantics
- Instrumentation hooks into LLVM PGO infrastructure
- LIT tests verifying:
- Instrumentation metadata generation
- Profile loading from specified path
- Branch weight attribution (IR checks)
**Tests:**
- Added gcc-flag-compatibility.f90 test module verifying:
- Flag parsing boundary conditions
- IR-level profile annotation consistency
- Profile input path normalization rules
- SPEC2006 benchmark results will be shared in comments
For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>
CGDebugInfo::completeFunction was added previously but mistakenly not
called (dropped through the cracks while putting together the patch
stack). Moved out of #134652 and #134654.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
This is a scoped helper similar to ApplyDebugLocation that creates a new source
location atom group which instructions can be added to.
A source atom is a source construct that is "interesting" for debug stepping
purposes. We use an atom group number to track the instruction(s) that implement
the functionality for the atom, plus backup instructions/source locations.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
Move the initialization of ptrauth-* function attributes near the
initialization of branch protection attributes. The semantics of these
groups of attributes partially overlaps, so handle both groups in
getDefaultFunctionAttributes() and setTargetAttributes() functions to
prevent getting them out of sync. This fixes C++ TLS wrappers.
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
Currently BlockAddresses store both the Function and the BasicBlock they
reference, and the BlockAddress is part of the use list of both the
Function and BasicBlock.
This is quite awkward, because this is not really a use of the function
itself (and walks of function uses generally skip block addresses for
that reason). This also has weird implications on function RAUW (as that
will replace the function in block addresses in a way that generally
doesn't make sense), and causes other peculiar issues, like the ability
to have multiple block addresses for one block (with different
functions).
Instead, I believe it makes more sense to specify only the basic block
and let the function be implied by the BB parent. This does mean that we
may have block addresses without a function (if the BB is not inserted),
but this should only happen during IR construction.
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.
Fixes https://github.com/llvm/llvm-project/issues/60313
D120566 was an earlier attempt made to upstream a solution
for this issue.
---------
Co-authored-by: anikelal <anikelal@amd.com>
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
This implements the R2 semantics of P0963.
The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
If we have +sme but not +sve, we would not set vscale_range on
functions. It should be valid to apply it with the same range with just
+sme, which can help mitigate some performance regressions in cases such
as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).
Following the previous patch which adds the "extend lifetimes" flag
without (almost) any functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted as a new form of cleanup,
set for variable addresses, which just emits a fake use intrinsic when the
variable falls out of scope. The code for achieving this is simple, with most
of the logic centered on determining whether to emit a fake use for a given
address, and on ensuring that fake uses are ignored in a few cases.
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
- Adding the changes from PRs:
- #116331
- #121852
- Fixes test `tools/dxil-dis/debug-info.ll`
- Address some missed comments in the previous PR
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
`CounterPair` can hold `<uint32_t, uint32_t>` instead of current
`unsigned`, to hold also the counter number of SkipPath. For now, this
change provides the skeleton and only `CounterPair::Executed` is used.
Each counter number can have `None` to suppress emitting counter
increment. 2nd element `Skipped` is initialized as `None` by default,
since most `Stmt*` don't have a pair of counters.
This change also provides stubs for the verifier. I'll provide the impl
of verifier for `+Asserts` later.
`markStmtAsUsed(bool, Stmt*)` may be used to inform that other side
counter may not emitted.
`markStmtMaybeUsed(S)` may be used for the `Stmt` and its inner will be
excluded for emission in the case of skipping by constant folding. I put
it into places where I found.
`verifyCounterMap()` will check the coverage map and the counter map,
and can be used to report inconsistency.
These verifier methods shall be eliminated in `-Asserts`.
https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
- adding Flatten and Branch to if stmt.
- adding dxil control flow hint metadata generation
- modifing spirv OpSelectMerge to account for the specific attributes.
Closes#70112
---------
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
The flag is placed together with pointer authentication flags since they
serve the same security purpose of protecting against attacks on control
flow. The flag is not ABI-affecting and might be enabled separately if
needed, but it's also intended to be enabled as part of pauth-enabled
environments (e.g. pauthtest).
See also codegen implementation #97666.
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.
I am moving the FMV priority logic in TargetInfo, so that it can be
implemented by the TargetParser which then makes it possible to query it
from llvm. Here is an example why this is handy:
https://github.com/llvm/llvm-project/pull/87939
# What
This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.
# Why?
- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.
We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.
# Concerns
- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
Translates `RWBuffer` and `StructuredBuffer` resources buffer types to
DirectX target types `dx.TypedBuffer` and `dx.RawBuffer`.
Includes a change of `HLSLAttributesResourceType` from 'sugar' type to
full canonical type. This is required for codegen and other clang
infrastructure to work property on HLSL resource types.
Fixes#95952 (part 2/2)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This patch enable the function multiversion(FMV) and `target_clones`
attribute for RISC-V target.
The proposal of `target_clones` syntax can be found at the
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 (which has
landed), as modified by the proposed
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 (which adds the
priority syntax).
It supports the `target_clones` function attribute and function
multiversioning feature for RISC-V target. It will generate the ifunc
resolver function for the function that declared with target_clones
attribute.
The resolver function will check the version support by runtime object
`__riscv_feature_bits`.
For example:
```
__attribute__((target_clones("default", "arch=+ver1", "arch=+ver2"))) int bar() {
return 1;
}
```
the corresponding resolver will be like:
```
bar.resolver() {
__init_riscv_feature_bits();
// Check arch=+ver1
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION1) == BITMASK_OF_VERSION1) {
return bar.arch=+ver1;
} else {
// Check arch=+ver2
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION2) == BITMASK_OF_VERSION2) {
return bar.arch=+ver2;
} else {
// Default
return bar.default;
}
}
}
```
Introducing `HLSLAttributedResourceType` - a new type that is similar to
`AttributedType` but with additional data specific to HLSL resources.
`AttributeType` currently only stores an attribute kind and no
additional data from the type attribute parameters. This does not really
work for HLSL resources since its type attributes contain non-boolean
values that need to be retained as well.
For example:
```
template <typename T> class RWBuffer {
__hlsl_resource_t [[hlsl::resource_class(uav)]] [[hlsl::is_rov]] handle;
};
```
The data `HLSLAttributedResourceType` needs to eventually store are:
- resource class (SRV, UAV, CBuffer, Sampler)
- texture dimension(1-3)
- flags is_rov, is_array, is_feedback and is_multisample
- contained type
All of these values except contained type will be stored in
`HLSLAttributedResourceType::Attributes` struct and accessed
individually via the fields. There is also `Data` alias that covers all
of these values as a `unsigned` which is used for hashing and the AST
type serialization.
During type attribute processing all HLSL type attributes will be
validated and collected by SemaHLSL (by
`SemaHLSL::handleResourceTypeAttr`) and in the end combined into a
single `HLSLAttributedResourceType` instance (in
`SemaHLSL::ProcessResourceTypeAttributes`). `SemaHLSL` will also need to
short-term store the `TypeLoc` information for the new type that will be
grabbed by `TypeSpecLocFiller` soon after the type is created.
Part 1/2 of #104861
Previously, functions named "main" got the NoRecurse attribute
consistent with the behavior of C++, which HLSL largely follows.
However, standard recursion is not allowed in HLSL, so all functions
should really have this attribute. This doesn't prevent recursion, but
rather signals that these functions aren't expected to recurse.
Practically, this was done so that entry point functions named "main"
would have all have the same attributes as otherwise identical entry
points with other names.
This required small changes to the this assignment tests because they no
longer generate so many attribute sets since more of them match.
related to #105244
but done to simplify testing for #89806
Introduce the `-fsanitize=realtime` flag in clang driver
Plug in the RealtimeSanitizer PassManager pass in Codegen, and attribute
a function based on if it has the `[[clang::nonblocking]]` function
effect.
Marks exported functions with `"hlsl.export"` attribute. This
information will be later used by DXILFinalizeLinkage pass (coming soon)
to determine which functions should have internal linkage in the final
DXIL code.
Related to #llvm/llvm-project#92071
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.
As we are about to redefine the Offload API, this wil help us in the
design process as a second offload language.
We do not support any CUDA APIs yet, however, we could:
https://www.osti.gov/servlets/purl/1892137
For proper host execution we need to resurrect/rebase
https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}
__global__ void square(int *A) { *A = 42; }
int main(int argc, char **argv) {
int DevNo = 0;
int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
*Ptr = 7;
printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
square<<<1, 1>>>(Ptr);
printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
llvm_omp_target_free_shared(Ptr, DevNo);
}
❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native
❯❯❯ llvm-objdump --offloading test123
test123: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind elf
arch gfx90a
triple amdgcn-amd-amdhsa
producer openmp
❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
This provides -fptrauth-auth-traps, which at the frontend level only
controls the addition of the "ptrauth-auth-traps" function attribute.
The attribute in turn controls various aspects of backend codegen, by
providing the guarantee that every "auth" operation generated will trap
on failure.
This can either be delegated to the hardware (if AArch64 FPAC is known
to be available), in which case this attribute doesn't change codegen.
Otherwise, if FPAC isn't available, this asks the backend to emit
additional instructions to check and trap on auth failure.
We already ended up with -fptrauth-returns, the feature macro, the lang
opt, and the actual backend lowering.
The only part left is threading it all through PointerAuthOptions, to
drive the addition of the "ptrauth-returns" attribute to generated
functions.
While there, do minor cleanup on ptrauth-function-attributes.c.
This also adds ptrauth_key_return_address to ptrauth.h.