We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This variable attribute is used in HLSL to add Vulkan specific builtins
in a shader.
The attribute is documented here:
17727e88fd/proposals/0011-inline-spirv.md
Those variable, even if marked as `static` are externally initialized by
the pipeline/driver/GPU. This is handled by moving them to a specific
address space `hlsl_input`, also added by this commit.
The design for input variables in Clang can be found here:
355771361e/proposals/0019-spirv-input-builtin.md
Co-authored-by: Justin Bogner <mail@justinbogner.com>
With the current behavior the following example yields a linker error:
"multiple definition of `foo.default'"
// Translation Unit 1
__attribute__((target_clones("dotprod, sve"))) int foo(void) { return 1; }
// Translation Unit 2
int foo(void) { return 0; }
__attribute__((target_version("dotprod"))) int foo(void);
__attribute__((target_version("sve"))) int foo(void);
int bar(void) { return foo(); }
That is because foo.default is generated twice. As a user I don't find
this particularly intuitive. If I wanted the default to be generated in
TU1 I'd rather write target_clones("dotprod, sve", "default")
explicitly.
When changing the code I noticed that the RISC-V target defers the
resolver emission when encountering a target_version definition. This
seems accidental since it only makes sense for AArch64, where we only
emit a resolver once we've processed the entire TU, and only if the
default version is present. I've changed this so that RISC-V immediately
emmits the resolver. I adjusted the codegen tests since the functions
now appear in a different order.
Implements https://github.com/ARM-software/acle/pull/377
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:
- `-fprofile-generate` for instrumentation-based profile generation
- `-fprofile-use=<dir>/file` for profile-guided optimization
Resolves#74216 (implements IR PGO support phase)
**Key changes:**
- Frontend flag handling aligned with Clang/GCC semantics
- Instrumentation hooks into LLVM PGO infrastructure
- LIT tests verifying:
- Instrumentation metadata generation
- Profile loading from specified path
- Branch weight attribution (IR checks)
**Tests:**
- Added gcc-flag-compatibility.f90 test module verifying:
- Flag parsing boundary conditions
- IR-level profile annotation consistency
- Profile input path normalization rules
- SPEC2006 benchmark results will be shared in comments
For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>
Recently in some of our internal testing, we noticed that the compiler
was sometimes generating an empty linker.options section which seems
unnecessary. This proposed change causes the compiler to simply omit
emitting the linker.options section if it is empty.
Unused types are retained in the debug info when
-fno-eliminate-unused-debug-types is specified.
However, unused nested enums were not being emitted even with this
option.
This patch fixes the missing emission of unused nested enums with
-fno-eliminate-unused-debug-types
Adds support for emitting Windows x64 Unwind V2 information, includes
support `/d2epilogunwind` in clang-cl.
Unwind v2 adds information about the epilogs in functions such that the
unwinder can unwind even in the middle of an epilog, without having to
disassembly the function to see what has or has not been cleaned up.
Unwind v2 requires that all epilogs are in "canonical" form:
* If there was a stack allocation (fixed or dynamic) in the prolog, then
the first instruction in the epilog must be a stack deallocation.
* Next, for each `PUSH` in the prolog there must be a corresponding
`POP` instruction in exact reverse order.
* Finally, the epilog must end with the terminator.
This change adds a pass to validate epilogs in modules that have Unwind
v2 enabled and, if they pass, emits new pseudo instructions to MC that
1) note that the function is using unwind v2 and 2) mark the start of
the epilog (this is either the first `POP` if there is one, otherwise
the terminator instruction). If a function does not meet these
requirements, it is downgraded to Unwind v1 (i.e., these new pseudo
instructions are not emitted).
Note that the unwind v2 table only marks the size of the epilog in the
"header" unwind code, but it's possible for epilogs to use different
terminator instructions thus they are not all the same size. As a work
around for this, MC will assume that all terminator instructions are
1-byte long - this still works correctly with the Windows unwinder as it
is only using the size to do a range check to see if a thread is in an
epilog or not, and since the instruction pointer will never be in the
middle of an instruction and the terminator is always at the end of an
epilog the range check will function correctly. This does mean, however,
that the "at end" optimization (where an epilog unwind code can be
elided if the last epilog is at the end of the function) can only be
used if the terminator is 1-byte long.
One other complication with the implementation is that the unwind table
for a function is emitted during streaming, however we can't calculate
the distance between an epilog and the end of the function at that time
as layout hasn't been completed yet (thus some instructions may be
relaxed). To work around this, epilog unwind codes are emitted via a
fixup. This also means that we can't pre-emptively downgrade a function
to Unwind v1 if one of these offsets is too large, so instead we raise
an error (but I've passed through the location information, so the user
will know which of their functions is problematic).
It isn't used and is redundant with the result pointer type argument.
A more reasonable API would only have LangAS parameters, or IR parameters,
not both. Not all values have a meaningful value for this. I'm also
not sure why we have this at all, it's not overridden by any targets and
further simplification is possible.
OpenCL Kernels body is emitted as stubs and the kernel is emitted as
call to respective stub.
(https://github.com/llvm/llvm-project/pull/115821).
The stub function should be alwaysinlined, since call to stub can cause
performance drop.
Co-authored-by: anikelal <anikelal@amd.com>
- Adds resource constructor that takes explicit binding for all resource
classes.
- Updates implementation of default resource constructor to initialize
resource handle to `poison`.
- Removes initialization of resource classes from Codegen.
- Initialization of `cbuffer` still needs to happen in `CGHLSLRuntime`
because it does not have a corresponding resource class type.
- Adds `ImplicitCastExpr` for builtin function calls. Sema adds these
automatically when a method of a template class is instantiated, but
some resource classes like `ByteAddressBuffer` are not templates so they
need to have this added explicitly.
Design proposal: llvm/wg-hlsl#197
Closes#134154
It appears that OpenMP changes caused this comment to get moved away
from where it was intended, so this patch puts it back next to the
multiversioning code, as intended.
Use GetAddrOfGlobal, which is a more general API that takes a
GlobalDecl, and handles declaring C++ destructors and other types in a
general way. We can use this to generalize over functions and variable
declarations.
This fixes issues reported on #130674 by @lexi-nadia .
This reverts commit 27c1aa9b9cf9e0b14211758ff8f7d3aaba24ffcf.
See comments on PR. After this change, Clang now asserts like this:
clang: ../llvm/include/llvm/IR/Metadata.h:1435: const MDOperand &llvm::MDNode::getOperand(unsigned int) const: Assertion `I < getNumOperands() && "Out of range"' failed.
...
#8 0x000055f345c4e4cb clang::CodeGen::CGDebugInfo::getOrCreateInstanceMethodType()
#9 0x000055f345c5ba4f clang::CodeGen::CGDebugInfo::EmitFunctionDecl()
#10 0x000055f345b52519 clang::CodeGen::CodeGenModule::EmitExternalFunctionDeclaration()
This is due to pre-existing jankiness in the way BPF emits extra
declarations for debug info, but we should rollback and then fix forward.
A function declared with the `sycl_kernel_entry_point` attribute,
sometimes called a SYCL kernel entry point function, specifies a pattern
from which the parameters and body of an offload entry point function,
sometimes called a SYCL kernel caller function, are derived.
SYCL kernel caller functions are emitted during SYCL device compilation.
Their parameters and body are derived from the `SYCLKernelCallStmt`
statement and `OutlinedFunctionDecl` declaration associated with their
corresponding SYCL kernel entry point function. A distinct SYCL kernel
caller function is generated for each SYCL kernel entry point function
defined as a non-inline function or ODR-used in the translation unit.
The name of each SYCL kernel caller function is parameterized by the
SYCL kernel name type specified by the `sycl_kernel_entry_point`
attribute attached to the corresponding SYCL kernel entry point
function. For the moment, the Itanium ABI mangled name for typeinfo data
(`_ZTS<type>`) is used to name these functions; a future change will
switch to a more appropriate naming scheme.
The calling convention used for a SYCL kernel caller function is target
dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently
provided. These functions are required to observe the language
restrictions for SYCL devices as specified by the SYCL 2020
specification; this includes a forward progress guarantee and prohibits
recursion.
Only SYCL kernel caller functions, functions declared as
`SYCL_EXTERNAL`, and functions directly or indirectly referenced from
those functions should be emitted during device compilation. Pruning of
other declarations has not yet been implemented.
---------
Co-authored-by: Elizabeth Andrews <elizabeth.andrews@intel.com>
The purpose of this flag is to allow the compiler to assume that each
object file passed to the linker has been compiled using a unique
source file name. This is useful for reducing link times when doing
ThinLTO in combination with whole-program devirtualization or CFI,
as it allows modules without exported symbols to be built with ThinLTO.
Reviewers: vitalybuka, teresajohnson
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/135728
Finding operator delete[] is still problematic, without it the extension
is a security hazard, so reverting until the problem with operator
delete[] is figured out.
This reverts the following PRs:
Reland [MS][clang] Add support for vector deleting destructors (llvm#133451)
[MS][clang] Make sure vector deleting dtor calls correct operator delete (llvm#133950)
[MS][clang] Fix crash on deletion of array of pointers (llvm#134088)
[clang] Do not diagnose unused deleted operator delete[] (llvm#134357)
[MS][clang] Error about ambiguous operator delete[] only when required (llvm#135041)
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.
Fixes https://github.com/llvm/llvm-project/issues/60313
D120566 was an earlier attempt made to upstream a solution
for this issue.
---------
Co-authored-by: anikelal <anikelal@amd.com>
Whereas it is UB in terms of the standard to delete an array of objects
via pointer whose static type doesn't match its dynamic type, MSVC
supports an extension allowing to do it.
Aside from array deletion not working correctly in the mentioned case,
currently not having this extension implemented causes clang to generate
code that is not compatible with the code generated by MSVC, because
clang always puts scalar deleting destructor to the vftable. This PR
aims to resolve these problems.
It was reverted due to link time errors in chromium with sanitizer
coverage enabled,
which is fixed by https://github.com/llvm/llvm-project/pull/131929 .
The second commit of this PR also contains a fix for a runtime failure
in chromium reported
in
https://github.com/llvm/llvm-project/pull/126240#issuecomment-2730216384
.
Fixes https://github.com/llvm/llvm-project/issues/19772
Original PR: #130537
Originally reverted due to revert of dependent commit. Relanding with no
changes.
This changes the MemberPointerType representation to use a
NestedNameSpecifier instead of a Type to represent the base class.
Since the qualifiers are always parsed as nested names, there was an
impedance mismatch when converting these back and forth into types, and
this led to issues in preserving sugar.
The nested names are indeed a better match for these, as the differences
which a QualType can represent cannot be expressed syntatically, and
they represent the use case more exactly, being either dependent or
referring to a CXXRecord, unqualified.
This patch also makes the MemberPointerType able to represent sugar for
a {up/downcast}cast conversion of the base class, although for now the
underlying type is canonical, as preserving the sugar up to that point
requires further work.
As usual, includes a few drive-by fixes in order to make use of the
improvements.
Original PR: #130537
Reland after updating lldb too.
This changes the MemberPointerType representation to use a
NestedNameSpecifier instead of a Type to represent the base class.
Since the qualifiers are always parsed as nested names, there was an
impedance mismatch when converting these back and forth into types, and
this led to issues in preserving sugar.
The nested names are indeed a better match for these, as the differences
which a QualType can represent cannot be expressed syntatically, and
they represent the use case more exactly, being either dependent or
referring to a CXXRecord, unqualified.
This patch also makes the MemberPointerType able to represent sugar for
a {up/downcast}cast conversion of the base class, although for now the
underlying type is canonical, as preserving the sugar up to that point
requires further work.
As usual, includes a few drive-by fixes in order to make use of the
improvements.
This changes the MemberPointerType representation to use a
NestedNameSpecifier instead of a Type to represent the class.
Since the qualifiers are always parsed as nested names, there was an
impedance mismatch when converting these back and forth into types, and
this led to issues in preserving sugar.
The nested names are indeed a better match for these, as the differences
which a QualType can represent cannot be expressed syntactically, and it
also represents the use case more exactly, being either dependent or
referring to a CXXRecord, unqualified.
This patch also makes the MemberPointerType able to represent sugar for
a {up/downcast}cast conversion of the base class, although for now the
underlying type is canonical, as preserving the sugar up to that point
requires further work.
As usual, includes a few drive-by fixes in order to make use of the
improvements, and removing some duplications, for example
CheckBaseClassAccess is deduplicated from across SemaAccess and
SemaCast.
This caused link errors when building with sancov. See comment on the PR.
> Whereas it is UB in terms of the standard to delete an array of objects
> via pointer whose static type doesn't match its dynamic type, MSVC
> supports an extension allowing to do it.
> Aside from array deletion not working correctly in the mentioned case,
> currently not having this extension implemented causes clang to generate
> code that is not compatible with the code generated by MSVC, because
> clang always puts scalar deleting destructor to the vftable. This PR
> aims to resolve these problems.
>
> Fixes https://github.com/llvm/llvm-project/issues/19772
This reverts commit d6942d54f677000cf713d2b0eba57b641452beb4.
This patch makes the assertion (that is currently in a comment) that
validates that names mangled by clang can be demangled by LLVM actually
compile/work. There were some minor issues that needed to be fixed (like
starts_with not being available on std::string and needing to call
getDecl() on GD), and a logic issue that should be fixed in this patch.
This enables just uncommenting the assertion to enable it within the
compiler (minus needing to add the header file).
I initially implemented codegen to be a 'no-op' for these declarations,
which I thought was properly implemented. However, when they are a
top-level decl, we have a separate switch. This patch makes sure they
are properly emitted at top-level as a no-op, and adds a test for both
top-level and not top-level.
The resource wrapper should have internal linkage because it contains a
handle to the global resource, and it not the actual global.
Makeing this changed exposed that we were zeroinitializing the resouce,
which is a problem. The handle cannot be zeroinitialized. This is
changed to use poison instead.
Fixes https://github.com/llvm/llvm-project/issues/122767.
---------
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Whereas it is UB in terms of the standard to delete an array of objects
via pointer whose static type doesn't match its dynamic type, MSVC
supports an extension allowing to do it.
Aside from array deletion not working correctly in the mentioned case,
currently not having this extension implemented causes clang to generate
code that is not compatible with the code generated by MSVC, because
clang always puts scalar deleting destructor to the vftable. This PR
aims to resolve these problems.
Fixes https://github.com/llvm/llvm-project/issues/19772
Add option and statement attribute for controlling emitting of
target-specific
metadata to atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641,
Originally a pragma was proposed, then it was changed to clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.
During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
All variable declarations in the global scope that are not resources,
static or empty are implicitly added to implicit constant buffer
`$Globals`. They are created in `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.
Fixes#123801
This is a second attempt to implement this feature. The first attempt
had to be reverted because of memory leaks. The problem was adding a
`SmallVector` member on `HLSLBufferDecl` node to represent a list of
default buffer declarations. When this vector needed to grow, it
allocated memory that was never released, because all memory used by AST
nodes must be allocated by `ASTContext` allocator and is released all at
once. Destructors on AST nodes are never called.
It this change the list of default buffer declarations is collected in a
`SmallVector` instance on `SemaHLSL`. The `HLSLBufDecl` representing
`$Globals` is created at the end of the translation unit when the number
of declarations is known, and the list is copied into an array allocated
by the `ASTContext` allocator.
All variable declarations in the global scope that are not resources,
static or empty are implicitly added to implicit constant buffer
`$Globals`. They are created in `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.
Fixes#123801
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.
This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
movl $199571451, %eax # hash of foo's type = 0xBE537FB
foo:
...
```
This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:
| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |
For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
movl $199571451, %ebx # hash of foo's type = 0xBE537FB
```
This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.
Note that there is another existing PR proposal that includes the 3-bit
arity within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.
This both reapplies #118734, the initial attempt at this, and updates it
significantly.
First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.
It also restructures the builtins to support sharding across several
independent tables. This accomplishes two improvements from the
original PR:
1) It improves the APIs used significantly.
2) When builtins are defined from different sources (like SVE vs MVE in
AArch64), this allows each of them to build their own string table
independently rather than having to merge the string tables and info
structures.
3) It allows each shard to factor out a common prefix, often cutting the
size of the strings needed for the builtins by a factor two.
The second point is important both to allow different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc files), it also simply reduces the sizes
of these tables which is valuable given how large they are in some
cases. The third builds on that size reduction.
Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
Per the ELF spec, section groups may only contain local symbols if those
symbols are only
referenced from within the section group. [1] In the case of template
parameter objects,
they can be referenced from outside the group when the type of the
object was declared
in an anonymous namespace. In that case, we can't place the object in a
COMDAT. This matches
GCC's linkage behavior on the test input.
[1]:
https://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_groups
This is an implementation of P1061 Structure Bindings Introduce a Pack
without the ability to use packs outside of templates. There is a couple
of ways the AST could have been sliced so let me know what you think.
The only part of this change that I am unsure of is the
serialization/deserialization stuff. I followed the implementation of
other Exprs, but I do not really know how it is tested. Thank you for
your time considering this.
---------
Co-authored-by: Yanzuo Liu <zwuis@outlook.com>