Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
Fixes the issue reported following the merge of #118026.
When a valid `musttail` call is made, the function it is made from must
return immediately after the call; if there are any cleanups left in the
function, then an error is triggered. This is not necessary for fake
uses however - it is perfectly valid to simply emit the fake use
"cleanup" code before the tail call, and indeed LLVM will automatically
move any fake uses following a tail call to come before the tail call.
Therefore, this patch specifically choose to handle fake use cleanups
when a musttail call is present by simply emitting them immediately
before the call.
When records contain fields with pointer authentication, even simple
copies can require
additional work be performed. This patch contains the core functionality
required to
handle user defined structs, as well as the implicitly constructed
structs for blocks, etc.
Co-authored-by: Ahmed Bougacha
Co-authored-by: Akira Hatanaka
Co-authored-by: John Mccall
A function declared with the `sycl_kernel_entry_point` attribute,
sometimes called a SYCL kernel entry point function, specifies a pattern
from which the parameters and body of an offload entry point function,
sometimes called a SYCL kernel caller function, are derived.
SYCL kernel caller functions are emitted during SYCL device compilation.
Their parameters and body are derived from the `SYCLKernelCallStmt`
statement and `OutlinedFunctionDecl` declaration associated with their
corresponding SYCL kernel entry point function. A distinct SYCL kernel
caller function is generated for each SYCL kernel entry point function
defined as a non-inline function or ODR-used in the translation unit.
The name of each SYCL kernel caller function is parameterized by the
SYCL kernel name type specified by the `sycl_kernel_entry_point`
attribute attached to the corresponding SYCL kernel entry point
function. For the moment, the Itanium ABI mangled name for typeinfo data
(`_ZTS<type>`) is used to name these functions; a future change will
switch to a more appropriate naming scheme.
The calling convention used for a SYCL kernel caller function is target
dependent. Support for AMDGCN, NVPTX, and SPIR targets is currently
provided. These functions are required to observe the language
restrictions for SYCL devices as specified by the SYCL 2020
specification; this includes a forward progress guarantee and prohibits
recursion.
Only SYCL kernel caller functions, functions declared as
`SYCL_EXTERNAL`, and functions directly or indirectly referenced from
those functions should be emitted during device compilation. Pruning of
other declarations has not yet been implemented.
---------
Co-authored-by: Elizabeth Andrews <elizabeth.andrews@intel.com>
CGCall.cpp declares several functions with a return type that is an
explicitly spelled out specialization of `SmallVector`. Previously,
`auto` was used in several places to avoid repeating the long type name;
a use that Clang maintainers find unjustified. This change introduces
type aliases and replaces the existing uses of `auto` with the
corresponding alias name.
This feature is currently not supported in the compiler.
To facilitate this we emit a stub version of each kernel
function body with different name mangling scheme, and
replaces the respective kernel call-sites appropriately.
Fixes https://github.com/llvm/llvm-project/issues/60313
D120566 was an earlier attempt made to upstream a solution
for this issue.
---------
Co-authored-by: anikelal <anikelal@amd.com>
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.
This reduces time to build clang by ~0.5% in terms of instructions
retired.
Certain functions require the `returns_twice` attribute in order to
produce correct codegen. However, `-fno-builtin` removes all knowledge
of functions that require this attribute, so this PR modifies Clang to
add the `returns_twice` attribute even if `-fno-builtin` is set. This
behavior is also consistent with what GCC does.
It's not (easily) possible to get the builtin information from
`Builtins.td` because `-fno-builtin` causes Clang to never initialize
any builtins, so functions never get tokenized as functions/builtins
that require `returns_twice`. Therefore, the most straightforward
solution is to explicitly hard code the function names that require
`returns_twice`.
Fixes#122840
This patch corrects the typo 'dereferencable' to 'dereferenceable' in
CGCall.cpp.
The typo is located within a comment inside the `void
CodeGenModule::ConstructAttributeList` function.
This patch adds a function attribute `riscv_vls_cc` for RISCV VLS
calling
convention which takes 0 or 1 argument, the argument is the `ABI_VLEN`
which is the `VLEN` for passing the fixed-vector arguments, it wraps the
argument as a scalable vector(VLA) using the `ABI_VLEN` and uses the
corresponding mechanism to handle it. The range of `ABI_VLEN` is [32,
65536],
if not specified, the default value is 128.
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
}
```
The first Non-VLS call passes generic vector argument of 16 bytes by
flattened integer.
On the contrary, the VLS call uses `ABI_VLEN=256` which wraps the
vector to <vscale x 1 x i32> where the number of scalable vector
elements
is calaulated by: `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
This addresses two issues introduced by moving indirect args into an
explicit AS (please see
<https://github.com/llvm/llvm-project/pull/114062#issuecomment-2659829790>
and
<https://github.com/llvm/llvm-project/pull/114062#issuecomment-2661158477>):
1. Unconditionally stripping casts from a pre-allocated return slot was
incorrect / insufficient (this is illustrated by the
`amdgcn_sret_ctor.cpp` test);
2. Putting compiler manufactured sret args in a non default AS can lead
to a C-cast (surprisingly) requiring an AS cast (this is illustrated by
the `sret_cast_with_nonzero_alloca_as.cpp test).
The way we handle (2), by subverting CK_BitCast emission iff a sret arg
is involved, is quite naff, but I couldn't think of any other way to use
a non default indirect AS and make this case work (hopefully this is a
failure of imagination on my part).
`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards agains trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.
In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
Following the previous patch which adds the "extend lifetimes" flag
without (almost) any functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted as a new form of cleanup,
set for variable addresses, which just emits a fake use intrinsic when the
variable falls out of scope. The code for achieving this is simple, with most
of the logic centered on determining whether to emit a fake use for a given
address, and on ensuring that fake uses are ignored in a few cases.
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Prototype.P to be nonnull.
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
This reverts commit 81fc3add1e627c23b7270fe2739cdacc09063e54.
This breaks some LLDB tests, e.g.
SymbolFile/DWARF/x86/no_unique_address-with-bitfields.cpp:
lldb: ../llvm-project/clang/lib/AST/Decl.cpp:4604: unsigned int clang::FieldDecl::getBitWidthValue() const: Assertion `isa<ConstantExpr>(getBitWidth())' failed.
Save the bitwidth value as a `ConstantExpr` with the value set. Remove
the `ASTContext` parameter from `getBitWidthValue()`, so the latter
simply returns the value from the `ConstantExpr` instead of
constant-evaluating the bitwidth expression every time it is called.
Closes#119360.
This bug occurs when passing a VLA to `va_arg`. Since the return value
is inferred to be an array, it triggers
`ScalarExprEmitter::VisitCastExpr`, which converts it to a pointer and
subsequently calls `CodeGenFunction::EmitAggExpr`. At this point,
because the inferred type is an `AggExpr` instead of a `ScalarExpr`,
`ScalarExprEmitter::VisitVAArgExpr` is not invoked, and as a result,
`CodeGenFunction::EmitVariablyModifiedType` is also not called, leading
to the size of the VLA not being retrieved.
The solution is to move the call to
`CodeGenFunction::EmitVariablyModifiedType` into
`CodeGenFunction::EmitVAArg`, ensuring that the size of the VLA is
correctly obtained regardless of whether the expression is an `AggExpr`
or a `ScalarExpr`.
- Use `poison` instead of `undef` as a phi operand for an unreachable path (the predecessor
will not go the BB that uses the value of the phi).
- Call `@llvm.vector.insert` with a `poison` subvec when performing a
`bitcast` from a fixed vector to a scalable vector.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
Get inout/out parameters working for HLSL Arrays.
Utilizes the fix from #109323, and corrects the assignment behavior
slightly to allow for Non-LValues on the RHS.
Closes#106917
---------
Co-authored-by: Chris B <beanz@abolishcrlf.org>
If `__attribute__((flatten))` is used on a function, or
`[[clang::always_inline]]` on a statement, don't inline any callees with
incompatible streaming attributes. Without this check, clang may produce
incorrect code when these attributes are used in code with streaming
functions.
Note: The docs for flatten say it can be ignored when inlining is
impossible: "causes calls within the attributed function to be inlined
unless it is impossible to do so".
Similarly, the (clang-only) `[[clang::always_inline]]` statement
attribute is more relaxed than the GNU `__attribute__((always_inline))`
(which says it should error it if it can't inline), saying only "If a
statement is marked [[clang::always_inline]] and contains calls, the
compiler attempts to inline those calls.". The docs also go on to show
an example of where `[[clang::always_inline]]` has no effect.
Pure Scalable Types are defined in AAPCS64 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts
And should be passed according to Rule C.7 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules
This part of the ABI is completely unimplemented in Clang, instead it
treats PSTs sometimes as HFAs/HVAs, sometime as general composite types.
This patch implements the rules for passing PSTs by employing the
`CoerceAndExpand` method and extending it to:
* allow array types in the `coerceToType`; Now only `[N x i8]` are
considered padding.
* allow mismatch between the elements of the `coerceToType` and the
elements of the `unpaddedCoerceToType`; AArch64 uses this to map
fixed-length vector types to SVE vector types.
Corectly passing a PST argument needs a decision in Clang about whether
to pass it in memory or registers or, equivalently, whether to use the
`Indirect` or `Expand/CoerceAndExpand` method. It was considered
relatively harder (or not practically possible) to make that decision in
the AArch64 backend.
Hence this patch implements the register counting from AAPCS64 (cf.
`NSRN`, `NPRN`) to guide the Clang's decision.
If a call using the musttail attribute returns it's value through an
sret argument pointer, we must forward an incoming sret pointer to it,
instead of creating a new alloca. This is always possible because the
musttail attribute requires the caller and callee to have the same
argument and return types.
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.
As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.
Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
HLSL output parameters are denoted with the `inout` and `out` keywords
in the function declaration. When an argument to an output parameter is
constructed a temporary value is constructed for the argument.
For `inout` pamameters the argument is initialized via copy-initialization
from the argument lvalue expression to the parameter type. For `out`
parameters the argument is not initialized before the call.
In both cases on return of the function the temporary value is written
back to the argument lvalue expression through an implicit assignment
binary operator with casting as required.
This change introduces a new HLSLOutArgExpr ast node which represents
the output argument behavior. The OutArgExpr has three defined children:
- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the first with the value of the second.
Fixes#87526
---------
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: John McCall <rjmccall@gmail.com>
Backend:
- Caller and callee arguments no longer have to match, just to take up the same space, as they can be changed before the call
- Allowed tail calls if callee and callee both (or neither) use sret, wheras before it would be dissalowed if either used sret
- Allowed tail calls if byval args are used
- Added debug trace for IsEligibleForTailCallOptimisation
Frontend (clang):
- Do not generate extra alloca if sret is used with musttail, as the space for the sret is allocated already
Change-Id: Ic7f246a7eca43c06874922d642d7dc44bdfc98ec
This commit introduces attribute bpf_fastcall to declare BPF functions
that do not clobber some of the caller saved registers (R0-R5).
The idea is to generate the code complying with generic BPF ABI,
but allow compatible Linux Kernel to remove unnecessary spills and
fills of non-scratched registers (given some compiler assistance).
For such functions do register allocation as-if caller saved registers
are not clobbered, but later wrap the calls with spill and fill
patterns that are simple to recognize in kernel.
For example for the following C code:
#define __bpf_fastcall __attribute__((bpf_fastcall))
void bar(void) __bpf_fastcall;
void buz(long i, long j, long k);
void foo(long i, long j, long k) {
bar();
buz(i, j, k);
}
First allocate registers as if:
foo:
call bar # note: no spills for i,j,k (r1,r2,r3)
call buz
exit
And later insert spills fills on the peephole phase:
foo:
*(u64 *)(r10 - 8) = r1; # Such call pattern is
*(u64 *)(r10 - 16) = r2; # correct when used with
*(u64 *)(r10 - 24) = r3; # old kernels.
call bar
r3 = *(u64 *)(r10 - 24); # But also allows new
r2 = *(u64 *)(r10 - 16); # kernels to recognize the
r1 = *(u64 *)(r10 - 8); # pattern and remove spills/fills.
call buz
exit
The offsets for generated spills/fills are picked as minimal stack
offsets for the function. Allocated stack slots are not used for any
other purposes, in order to simplify in-kernel analysis.
In incremental compilation clang works with multiple `llvm::Module`s.
Our current approach is to create a CodeGenModule entity for every new
module request (via StartModule). However, some of the state such as the
mangle context needs to be preserved to keep the original semantics in
the ever-growing TU.
Fixes: llvm/llvm-project#95581.
cc: @jeaye
This commit introduces attribute bpf_fastcall to declare BPF functions
that do not clobber some of the caller saved registers (R0-R5).
The idea is to generate the code complying with generic BPF ABI, but
allow compatible Linux Kernel to remove unnecessary spills and fills of
non-scratched registers (given some compiler assistance).
For such functions do register allocation as-if caller saved registers
are not clobbered, but later wrap the calls with spill and fill patterns
that are simple to recognize in kernel.
For example for the following C code:
#define __bpf_fastcall __attribute__((bpf_fastcall))
void bar(void) __bpf_fastcall;
void buz(long i, long j, long k);
void foo(long i, long j, long k) {
bar();
buz(i, j, k);
}
First allocate registers as if:
foo:
call bar # note: no spills for i,j,k (r1,r2,r3)
call buz
exit
And later insert spills fills on the peephole phase:
foo:
*(u64 *)(r10 - 8) = r1; # Such call pattern is
*(u64 *)(r10 - 16) = r2; # correct when used with
*(u64 *)(r10 - 24) = r3; # old kernels.
call bar
r3 = *(u64 *)(r10 - 24); # But also allows new
r2 = *(u64 *)(r10 - 16); # kernels to recognize the
r1 = *(u64 *)(r10 - 8); # pattern and remove spills/fills.
call buz
exit
The offsets for generated spills/fills are picked as minimal stack
offsets for the function. Allocated stack slots are not used for any
other purposes, in order to simplify in-kernel analysis.
Corresponding functionality had been merged in Linux Kernel as
[this](https://lore.kernel.org/bpf/172179364482.1919.9590705031832457529.git-patchwork-notify@kernel.org/)
patch set (the patch assumed that `no_caller_saved_regsiters` attribute
would be used by LLVM, naming does not matter for the Kernel).
Default attributes assigned to all functions according to the command
line parameters. Some functions might have their own attributes and we
need to set or remove attributes accordingly.
Tests are updated to test this scenarios too.