#129142 added support for emitting Windows x64 unwind v2 information,
but it was "best effort". If any function didn't follow the requirements
for v2 it was silently downgraded to v1.
There are some parts of Windows (specifically kernel-mode code running
on Xbox) that require v2, hence we need the ability to fail the
compilation if v2 can't be used.
This change also adds a heuristic to check if there might be too many
unwind codes, it's currently conservative (i.e., assumes that certain
prolog instructions will use the maximum number of unwind codes).
Future work: attempting to chain unwind info across multiple tables if
there are too many unwind codes due to epilogs and adding a heuristic to
detect if an epilog will be too far from the end of the function.
Implements
https://github.com/llvm/wg-hlsl/blob/main/proposals/0026-symbol-visibility.md.
The change is to stop using the `hlsl.export` attribute. Instead,
symbols with "program linkage" in HLSL will have export linkage with
default visibility, and symbols with "external linkage" in HLSL will
have export linkage with hidden visibility.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
MSVC always emits minimal CodeView metadata with compiler information,
even when debug info is otherwise disabled. Other tools may rely on this
metadata being present. For example, linkers use it to determine whether
hotpatching is enabled for the object file.
This PR resubmits the changes from #136098, which was previously
reverted due to a build failure during the linking stage:
```
undefined reference to `llvm::DebugInfoCorrelate'
undefined reference to `llvm::ProfileCorrelate'
```
The root cause was that `llvm/lib/Frontend/Driver/CodeGenOptions.cpp`
references symbols from the `Instrumentation` component, but the
`LINK_COMPONENTS` in the `llvm/lib/Frontend/CMakeLists.txt` for
`LLVMFrontendDriver` did not include it. As a result, linking failed in
configurations where these components were not transitively linked.
### Fix:
This updated patch explicitly adds `Instrumentation` to
`LINK_COMPONENTS` in the relevant `llvm/lib/Frontend/CMakeLists.txt`
file to ensure the required symbols are properly resolved.
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Chyaka <52224511+liliumshade@users.noreply.github.com>
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
The SPIR-V backend does not have access to the original name of a
resource in the source, so it tries to create a name. This leads to some
problems with reflection.
That is why start to pass the name of the resource from Clang to the
SPIR-V backend.
Fixes#138533
Fixed two issues:
1. assertion with -flto. the linker wrapper action is missing for
wrapping the device binary. Added it for -flto.
2. when there are two HIP files, the kernels in the second file were not
found. This is because the -r option of linker wrapper assumes offload
entries section of HIP to be hip_offloading_entries but it is actually
llvm_offload_entries, causing the offload entries sections not made
unique for different object files. Fixed and tested working for both
-fgpu-rdc and -fno-gpu-rdc case with and without -r
As reported on #135064, the generic pointer coercion code in
CoerceIntOrPtrToIntOrPtr cannot handle address space casts (it tries to bitcast
the pointers). This bails out if an address space qualifier is found on the
pointer.
This commit is using the same mechanism as vk::ext_builtin_input to
implement the SV_Position semantic input.
The HLSL signature is not yet ready for DXIL, hence this commit only
implements the SPIR-V side.
This is incomplete as it doesn't allow the semantic on hull/domain and
other shaders, but it's a first step to validate the overall
input/output
semantic logic.
Fixes https://github.com/llvm/llvm-project/issues/136969
So far, the `DW_AT_linkage_name` of the coroutine `resume`, `destroy`,
`cleanup` and `noalloc` function clones were incorrectly set to the
original function name instead of the updated function names.
With this commit, we now update the `DW_AT_linkage_name` to the correct
name. This has multiple benefits:
1. it's easier for me (and other toolchain developers) to understand the
output of `llvm-dwarf-dump` when coroutines are involved.
2. When hitting a breakpoint, both LLDB and GDB now tell you which clone
of the function you are in. E.g., GDB now prints "Breakpoint 1.2,
coro_func(int) [clone .resume] (v=43) at ..." instead of "Breakpoint
1.2, coro_func(int) (v=43) at ...".
3. GDB's `info line coro_func` command now allows you to distinguish the
multiple different clones of the function.
In Swift, the linkage names of the clones were already updated. The
comment right above the relevant code in `CoroSplit.cpp` already hinted
that the linkage name should probably also be updated in C++. This
comment was added in commit 6ce76ff7eb7640, and back then the
corresponding `DW_AT_specification` (i.e., `SP->getDeclaration()`) was
not updated, yet, which led to problems for C++. In the meantime, commit
ca1a5b37c7236d added code to also update `SP->getDeclaration`, as such
there is no reason anymore to not update the linkage name for C++.
Note that most test cases used inconsistent function names for the LLVM
function vs. the DISubprogram linkage name. clang would never emit such
LLVM IR. This confused me initially, and hence I fixed it while updating
the test case.
Drive-by fix: The change in `CGVTables.cpp` is purely stylistic, NFC.
When looking for other usages of `replaceWithDistinct`, I got initially
confused because `CGVTables.cpp` was calling a static function via an
object instance.
Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.
Sample Test Case from OpenMP 6.0 Example
```
#include <assert.h>
#include <omp.h>
#define N 10
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++)
{
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel num_threads(4)
{
int s_v; // s_v is private
do_red(N, v, s_v);
assert(s_v == 45);
}
return 0;
}
```
Expected Codegen:
```
// A shared global/static variable is introduced for the reduction result.
// This variable is initialized (e.g., using memset or a UDR initializer)
// e.g., .omp.reduction.internal_private_var
// Barrier before any thread performs combination
call void @__kmpc_barrier(...)
// Initialization block (executed by thread 0)
// e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)
call void @__kmpc_critical(...)
// Inside critical section:
// Load the current value from the shared variable
// Load the thread-local private variable's value
// Perform the reduction operation
// Store the result back to the shared variable
call void @__kmpc_end_critical(...)
// Barrier after all threads complete their combinations
call void @__kmpc_barrier(...)
// Broadcast phase:
// Load the final result from the shared variable)
// Store the final result to the original private variable in each thread
// Final barrier after broadcast
call void @__kmpc_barrier(...)
```
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
This pull request fixes coverage mapping on GPU targets.
- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it
from public globals was causing issues in cases where multiple
device-code object files are linked together.
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
This commit adds code to lower WaveGetLaneCount() into the SPV or DXIL
intrinsic. The backends will then need to lower the intrinsic into
proper SPIR-V/DXIL.
Related to #99159
CreateVScale took a scaling parameter that had a single use outside of
IRBuilder with all other callers having to create a redundant
ConstantInt. To work round this some code perferred to use
CreateIntrinsic directly.
This patch simplifies CreateVScale to return a call to the llvm.vscale()
intrinsic and nothing more. As well as simplifying the existing call
sites I've also migrated the uses of CreateIntrinsic.
Whilst IRBuilder used CreateVScale's scaling parameter as part of the
implementations of CreateElementCount and CreateTypeSize, I have
follow-on work to switch them to the NUW varaiety and thus they would
stop using CreateVScale's scaling as well. To prepare for this I have
moved the multiplication and constant folding into the implementations
of CreateElementCount and CreateTypeSize.
As a final step I have replaced some callers of CreateVScale with
CreateElementCount where it's clear from the code they wanted the
latter.
The aim here is to avoid a ptrtoint->inttoptr round-trip through the function
argument whilst keeping the calling convention the same. Given a struct which
is <= 128bits in size, which can only contain either 1 or 2 pointers, we
convert to a ptr or [2 x ptr] as opposed to the old coercion that uses i64 or
[2 x i64]. This helps alias analysis produce more accurate results.
This breaks a lot of new driver HIP compilation. We should probably
revert this for now until we can make a fixed version.
```c++
static __global__ void print() { printf("%s\n", "foo"); }
void b();
int main() {
hipLaunchKernelGGL(print, dim3(1), dim3(1), 0, 0);
auto y = hipDeviceSynchronize();
b();
}
```
```c++
static __global__ void print() { printf("%s\n", "bar"); }
void b() {
hipLaunchKernelGGL(print, dim3(1), dim3(1), 0, 0);
auto y = hipDeviceSynchronize();
}
```
```console
$ clang++ a.hip b.hip --offload-arch=gfx1030 --offload-new-driver
$ ./a.out
foo
foo
```
```console
$ clang++ a.hip b.hip --offload-arch=gfx1030 --offload-new-driver -flto
<crash>
```
This reverts commit d54c28b9c1396fa92d9347ac1135da7907121cb8.
The dependency from the type sugar of the underlying type of a Typedef
were not being considered for the dependency of the TypedefType itself.
A TypedefType should be instantiation dependent if it involves
non-instantiated template parameters, even if they don't contribute to
the canonical type.
Besides, a TypedefType should be instantiation dependent if it is
declared in a dependent context, but fixing that would have performance
consequences, as otherwise non-dependent typedef declarations would need
to be transformed during instantiation as well.
This removes the workaround added in
https://github.com/llvm/llvm-project/pull/90032
Fixes https://github.com/llvm/llvm-project/issues/89774
This extends https://github.com/llvm/llvm-project/pull/138577 to more UBSan checks, by changing SanitizerDebugLocation (formerly SanitizerScope) to add annotations if enabled for the specified ordinals.
Annotations will use the ordinal name if there is exactly one ordinal specified in the SanitizerDebugLocation; otherwise, it will use the handler name.
Updates the tests from https://github.com/llvm/llvm-project/pull/141814.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
When using `-fcs-generate-profile` with distributed thin-lto in the same
fashion we do for local thin-lto, we hit the following assertion:
6041c745f3/llvm/lib/Support/PGOOptions.cpp (L36)
Using local thin-lto with LLD for MachO, we set the missing path
automatically to a default value: https://reviews.llvm.org/D151589. In
this fix we add the same behavior.
---------
Co-authored-by: Nuri Amari <nuriamari@fb.com>
This option complements -funique-source-file-names and allows the user
to use a different unique identifier than the source file path.
Reviewers: teresajohnson
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/142901
Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics etc) I figured it would be good to separate these two implementations.
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This enables the LLVM optimizer to view accesses to distinct struct
members as independent, also for array members. For example, the
following two stores no longer alias:
struct S { int a[10]; int b; };
void test(S *p, int i) {
p->a[i] = ...;
p->b = ...;
}
Array members were already added to TBAA struct type nodes in commit
57493e29. Here, we extend a path tag for an array subscript expression.
This introduces the attribute discussed in
https://discourse.llvm.org/t/rfc-function-type-attribute-to-prevent-cfi-instrumentation/85458.
The proposed name has been changed from `no_cfi` to
`cfi_unchecked_callee` to help differentiate from `no_sanitize("cfi")`
more easily. The proposed attribute has the following semantics:
1. Indirect calls to a function type with this attribute will not be
instrumented with CFI. That is, the indirect call will not be checked.
Note that this only changes the behavior for indirect calls on pointers
to function types having this attribute. It does not prevent all
indirect function calls for a given type from being checked.
2. All direct references to a function whose type has this attribute
will always reference the true function definition rather than an entry
in the CFI jump table.
3. When a pointer to a function with this attribute is implicitly cast
to a pointer to a function without this attribute, the compiler will
give a warning saying this attribute is discarded. This warning can be
silenced with an explicit C-style cast or C++ static_cast.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
When returning a value, stores to the `retval` allocas and branches to `return`
block are put in the same atom group. They are both rank 1, which could in
theory introduce an extra step in some optimized code. This low risk currently
feels an acceptable for keeping the code a bit simpler (as opposed to adding
scaffolding to make the store rank 2).
In the case of a single return (no control flow) the return instruction inherits
the atom group of the branch to the return block when the blocks get folded
togather.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
This variable attribute is used in HLSL to add Vulkan specific builtins
in a shader.
The attribute is documented here:
17727e88fd/proposals/0011-inline-spirv.md
Those variable, even if marked as `static` are externally initialized by
the pipeline/driver/GPU. This is handled by moving them to a specific
address space `hlsl_input`, also added by this commit.
The design for input variables in Clang can be found here:
355771361e/proposals/0019-spirv-input-builtin.md
Co-authored-by: Justin Bogner <mail@justinbogner.com>
With pointer authentication it becomes non-trivial to correctly load the
vtable pointer of a polymorphic object.
__builtin_get_vtable_pointer is a function that performs the load and
performs the appropriate authentication operations if necessary.
`HLSLRootSignature.h` was originally created to hold the struct
definitions of an `llvm::hlsl::rootsig::RootElement` and some helper
functions for it.
However, there many users of the structs that don't require any of the
helper methods. This requires us to link the `FrontendHLSL` library,
where we otherwise wouldn't need to.
For instance:
- This [revert](https://github.com/llvm/llvm-project/pull/142005) was
required as it requires linking to the unrequired `FrontendHLSL` library
- As part of the change required here:
https://github.com/llvm/llvm-project/issues/126557. We will want to add
an `HLSLRootSignatureVersion` enum. Ideally this could live with the
root signature struct defs, but we don't want to link the helper objects
into `clang/Basic/TargetOptions.h`
This change allows the struct definitions to be kept in a single header
file and to then have the `FrontendHLSL` library only be linked when
required.
Enable _Float16 for LoongArch target. Additionally, this change fixes
incorrect ABI lowering of _Float16 in the case of structs containing
fp16 that are eligible for passing via GPR+FPR or FPR+FPR. Finally, it
also fixes int16 -> __fp16 conversion code gen, which uses generic LLVM
IR rather than llvm.convert.to.fp16 intrinsics.
DIBuilder began tracking definition subprograms and finalizing them in
`DIBuilder::finalize()` in eb1bb4e419.
Currently, `finalizeSubprogram()` attaches local variables, imported
entities, and labels to the `retainedNodes:` field of a corresponding
subprogram.
After 75819aedf, the definition and some declaration subprograms are
finalized in `DIBuilder::finalize()`:
`AllSubprograms` holds references to definition subprograms.
`AllRetainTypes` holds references to declaration subprograms.
For DISubprogram elements of both variables, `finalizeSubprogram()` was
called there.
However, `retainTypes()` is not necessarily called for every declaration
subprogram (as in 40a3fcb0).
DIBuilder clients may also want to attach DILocalVariables to
declaration subprograms, for example, in 58bdf8f9a8.
Thus, the `finalizeSubprogram()` function is called for all definition
subprograms in `DIBuilder::finalize()` because they are stored in the
`AllSubprograms` by the `CreateFunction(isDefinition: true)` call. But
for the declaration subprograms, it should be called manually.
With this commit, `AllSubprograms` is used for holding and finalizing all DISubprograms.
With the current behavior the following example yields a linker error:
"multiple definition of `foo.default'"
// Translation Unit 1
__attribute__((target_clones("dotprod, sve"))) int foo(void) { return 1; }
// Translation Unit 2
int foo(void) { return 0; }
__attribute__((target_version("dotprod"))) int foo(void);
__attribute__((target_version("sve"))) int foo(void);
int bar(void) { return foo(); }
That is because foo.default is generated twice. As a user I don't find
this particularly intuitive. If I wanted the default to be generated in
TU1 I'd rather write target_clones("dotprod, sve", "default")
explicitly.
When changing the code I noticed that the RISC-V target defers the
resolver emission when encountering a target_version definition. This
seems accidental since it only makes sense for AArch64, where we only
emit a resolver once we've processed the entire TU, and only if the
default version is present. I've changed this so that RISC-V immediately
emmits the resolver. I adjusted the codegen tests since the functions
now appear in a different order.
Implements https://github.com/ARM-software/acle/pull/377
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unqiue_ptr.
This reduces clang build time by 0.8%.
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:
- `-fprofile-generate` for instrumentation-based profile generation
- `-fprofile-use=<dir>/file` for profile-guided optimization
Resolves#74216 (implements IR PGO support phase)
**Key changes:**
- Frontend flag handling aligned with Clang/GCC semantics
- Instrumentation hooks into LLVM PGO infrastructure
- LIT tests verifying:
- Instrumentation metadata generation
- Profile loading from specified path
- Branch weight attribution (IR checks)
**Tests:**
- Added gcc-flag-compatibility.f90 test module verifying:
- Flag parsing boundary conditions
- IR-level profile annotation consistency
- Profile input path normalization rules
- SPEC2006 benchmark results will be shared in comments
For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>