LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.
One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint, and does not warrant a
hard error if compiler can't benefit from it on a particular GPU
variant.
This commit partially implements SPIRTargetCodeGenInfo::getHLSLType. It
can now generate the spirv type for the following HLSL types:
1. RWBuffer
2. Buffer
3. Sampler
---------
Co-authored-by: Nathan Gauër <github@keenuts.net>
Enables the support of `-fcf-protection=return` on RISC-V, which
requires Zicfiss. It also adds a string attribute "hw-shadow-stack"
to every function if the option is set on RISC-V
Pure Scalable Types are defined in AAPCS64 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts
And should be passed according to Rule C.7 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules
This part of the ABI is completely unimplemented in Clang, instead it
treats PSTs sometimes as HFAs/HVAs, sometime as general composite types.
This patch implements the rules for passing PSTs by employing the
`CoerceAndExpand` method and extending it to:
* allow array types in the `coerceToType`; Now only `[N x i8]` are
considered padding.
* allow mismatch between the elements of the `coerceToType` and the
elements of the `unpaddedCoerceToType`; AArch64 uses this to map
fixed-length vector types to SVE vector types.
Corectly passing a PST argument needs a decision in Clang about whether
to pass it in memory or registers or, equivalently, whether to use the
`Indirect` or `Expand/CoerceAndExpand` method. It was considered
relatively harder (or not practically possible) to make that decision in
the AArch64 backend.
Hence this patch implements the register counting from AAPCS64 (cf.
`NSRN`, `NPRN`) to guide the Clang's decision.
Translates `RWBuffer` and `StructuredBuffer` resources buffer types to
DirectX target types `dx.TypedBuffer` and `dx.RawBuffer`.
Includes a change of `HLSLAttributesResourceType` from 'sugar' type to
full canonical type. This is required for codegen and other clang
infrastructure to work property on HLSL resource types.
Fixes#95952 (part 2/2)
Gentoo is planning to introduce a `*t64` suffix for triples that will be
used by 32-bit platforms that use 64-bit `time_t`. Add support for
parsing and accepting these triples, and while at it make clang
automatically enable the necessary glibc feature macros when this suffix
is used.
An open question is whether we can backport this to LLVM 19.x. After
all, adding new triplets to Triple sounds like an ABI change — though I
suppose we can minimize the risk of breaking something if we move new
enum values to the very end.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This is primarily meant to address the issue identified in #109182,
around incorrect usage of `-fsycl-is-device`; we now have AMDGCN
flavoured SPIR-V which retains the desired behaviour around the default
AS and does not depend on the SYCL language being enabled to do so.
Overall, there are three changes:
1. We unconditionally use the `SPIRDefIsGen` AS map for AMDGCNSPIRV
target, as there is no case where the hack of setting default to private
would be desirable, and it can be used for languages other than OCL/HIP;
2. We implement `SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace` for
SPIR-V in general, because otherwise using it from languages other than
HIP or OpenCL would yield 0, incorrectly;
3. We remove the incorrect usage of `-fsycl-is-device`.
This change adds support for correctly lowering the `__scoped` Clang
builtins, and corresponding scoped LLVM instructions. These were
previously unconditionally lowered to Device scope, which is possibly incorrect.
Furthermore, the default / implicit scope is changed from Device (an
OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not
OpenCL specific / can ingest IR coming from other language front-ends. OpenCL
defaulting to Device scope is now reflected in the front-end handling of atomic
ops, which seems preferable.
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.
As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.
Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
This patch enable the function multiversion(FMV) and `target_clones`
attribute for RISC-V target.
The proposal of `target_clones` syntax can be found at the
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 (which has
landed), as modified by the proposed
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 (which adds the
priority syntax).
It supports the `target_clones` function attribute and function
multiversioning feature for RISC-V target. It will generate the ifunc
resolver function for the function that declared with target_clones
attribute.
The resolver function will check the version support by runtime object
`__riscv_feature_bits`.
For example:
```
__attribute__((target_clones("default", "arch=+ver1", "arch=+ver2"))) int bar() {
return 1;
}
```
the corresponding resolver will be like:
```
bar.resolver() {
__init_riscv_feature_bits();
// Check arch=+ver1
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION1) == BITMASK_OF_VERSION1) {
return bar.arch=+ver1;
} else {
// Check arch=+ver2
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION2) == BITMASK_OF_VERSION2) {
return bar.arch=+ver2;
} else {
// Default
return bar.default;
}
}
}
```
Adds target codegen info class for DirectX. For now it always translates
`__hlsl_resource_t` handle to `target("dx.TypedBuffer", i32, 1, 0, 1)`
(`RWBuffer<int>`). More work is needed to determine the actual target
exp type and parameters based on the resource handle attributes.
Part 1/2 of #95952
Update codegen for func param with transparent_union attr to be that of
the first union member.
This is a followup to #101738 to fix non-ppc codegen and closes#76773.
The AMDGPU kernel ABI is not directly representable in SPIR-V, since it
relies on passing aggregates `byref`, and SPIR-V only encodes `byval`
(which the AMDGPU BE disallows for kernel arguments). As a temporary
solution to this mismatch, we add special handling for AMDGCN flavoured
SPIR-V, whereby aggregates are passed as direct, both to kernels and to
normal functions. This is not ideal (there are pathological cases where
performance is heavily impacted), but empirically robust and guaranteed
to work as the AMDGPU BE retains handling of `direct` passing for legacy
reasons.
We will revisit this in the future, but as it stands it is enough to
pass a wide array of integration tests and generates correct SPIR-V and
correct reverse translation into LLVM IR. The
amdgpu-kernel-arg-pointer-type test is updated via the automated script,
and thus becomes quite noisy.
Such struct types:
```
struct {
struct{} a;
long long b;
};
stuct {
struct{} a;
double b;
};
```
For such structures, Lo is NoClass and Hi is Integer/SSE. And when this
structure argument is passed, the high part is passed at offset 8 in
memory. So we should do special handling for these types in
EmitVAArg.Fix https://github.com/llvm/llvm-project/issues/79790 and fix
https://github.com/llvm/llvm-project/issues/86371.
struct SuperEmpty { struct{ int a[0];} b;};
Such 0 sized structs in c++ mode can not be ignored in i386 for that c++
fields are never empty.But when EmitVAArg, its size is 0, so that
va_list not increase.Maybe we can just Ignore this kind of arguments,
like X86_64 did. Fix#86385.
Test `aarch64-sme-inline-streaming-attrs.c` caused some buildbot
failures, because the test was missing a `REQUIRES: aarch64-registered
target`. This was because we've demoted the error to a warning, which
then resulted in a different error message, because Clang can't actually
CodeGen the IR.
PR #77936 introduced a diagnostic to avoid calls being inlined into
functions with a different streaming mode, because inlining those
functions may result in different runtime behaviour. This was necessary
because LLVM doesn't check whether inlining is possible and thus blindly
inlines the function without checking feasibility.
In practice however, this introduces an artificial restriction that the
user may not be able to work around. Calling an `always_inline` function
from some header file that is out of the control of the user would
result in an error that the user cannot remedy.
Therefore, this patch demotes the error into a warning (for calls from
streaming[-compatible] -> non-streaming), but the proper fix would be to
fix the AlwaysInliner in LLVM to avoid inlining when it has analyzed the
callee and has determined that inlining is not possible.
Calling an always_inline function for calls from non-streaming ->
streaming will remain an error, because there is little pre-existing
code for SME, so it is expected that the header file can be modified by
the user (e.g. by using `__arm_streaming_compatible` if the code is
claimed to be compatible).
Use this to replace the emission of the amdgpu-unsafe-fp-atomics
attribute in favor of per-instruction metadata. In the future
new fine grained controls should be introduced that also cover
the integer cases.
Add a wrapper around CreateAtomicRMW that appends the metadata,
and update a few use contexts to use it.
The SystemZ ABI code was missing code to handle the transparent_union
extension. Arguments of such types are specified to be passed like
the first member of the union, instead of according to the usual
ABI calling convention for aggregates.
This did not make much difference in practice as the SystemZ ABI
already specifies that 1-, 2-, 4- or 8-byte aggregates are passed
in registers. However, there *is* a difference if the first member
of the transparent union is a scalar integer type smaller than word
size - if passed as a scalar, it needs to be zero- or sign-extended
to word size, while if passed as aggregate, it is not.
Fixed by adding code to handle transparent_union similar to what
is done on other targets.
Summary:
This patch implements support for variadic functions for NVPTX targets.
The implementation here mainly follows what was done to implement it for
AMDGPU in https://github.com/llvm/llvm-project/pull/93362.
We change the NVPTX codegen to lower all variadic arguments to functions
by-value. This creates a flattened set of arguments that the IR lowering
pass converts into a struct with the proper alignment.
The behavior of this function was determined by iteratively checking
what the NVCC copmiler generates for its output. See examples like
https://godbolt.org/z/KavfTGY93. I have noted the main methods that
NVIDIA uses to lower variadic functions.
1. All arguments are passed in a pointer to aggregate.
2. The minimum alignment for a plain argument is 4 bytes.
3. Alignment is dictated by the underlying type
4. Structs are flattened and do not have their alignment changed.
5. NVPTX never passes any arguments indirectly, even very large ones.
This patch passes the tests in the `libc` project currently, including
support for `sprintf`.
So far branch protection, sign return address, guarded control stack
attributes are
only emitted as module flags to indicate the functions need to be
generated with
those features.
The problem is in case of an LTO build the module flags are merged with
the `min`
rule which means if one of the module is not build with sign return
address then the features
will be turned off for all functions. Due to the functions take the
branch-protection and
sign-return-address features from the module flags. The
sign-return-address is
function level option therefore it is expected functions from files that
is
compiled with -mbranch-protection=pac-ret to be protected.
The inliner might inline functions with different set of flags as it
doesn't consider
the module flags.
This patch adds the attributes to all functions and drops the checking
of the module flags
for the code generation.
Module flag is still used for generating the ELF markers.
Also drops the "true"/"false" values from the
branch-protection-enforcement,
branch-protection-pauth-lr, guarded-control-stack attributes as presence
of the
attribute means it is on absence means off and no other option.
Releand with test fixes.
So far branch protection, sign return address, guarded control stack
attributes are
only emitted as module flags to indicate the functions need to be
generated with
those features.
The problem is in case of an LTO build the module flags are merged with
the `min`
rule which means if one of the module is not build with sign return
address then the features
will be turned off for all functions. Due to the functions take the
branch-protection and
sign-return-address features from the module flags. The
sign-return-address is
function level option therefore it is expected functions from files that
is
compiled with -mbranch-protection=pac-ret to be protected.
The inliner might inline functions with different set of flags as it
doesn't consider
the module flags.
This patch adds the attributes to all functions and drops the checking
of the module flags
for the code generation.
Module flag is still used for generating the ELF markers.
Also drops the "true"/"false" values from the
branch-protection-enforcement,
branch-protection-pauth-lr, guarded-control-stack attributes as presence
of the
attribute means it is on absence means off and no other option.
According to RISC-V integer calling convention empty structs or union
arguments or return values are ignored by C compilers which support them
as a non-standard extension. This is not the case for C++, which
requires them to be sized types.
Fixes#97285
This is a mostly-target-independent variadic function optimisation and
lowering pass. It is only enabled for AMDGPU in this initial commit.
The purpose is to make C style variadic functions a zero cost
abstraction. They are lowered to equivalent IR which is then amenable to
other optimisations. This is inherently slightly target specific but
much less so than one might expect - the C varargs interface heavily
constrains the ABI design divergence.
The pass is primarily tested from webassembly. This is because wasm has
a straightforward variadic lowering strategy which coincides exactly
with what this pass transforms code into and a struct passing convention
with few cases to check. Adding further targets conventions is
straightforward and elided from this patch primarily to simplify the
review. Implemented in other branches are Linux X86, AMD64, AArch64 and
NVPTX.
Testing for targets that have existing lowering for va_arg from clang is
most efficiently done by checking that clang | opt completely elides the
variadic syntax from test cases. The lowering produces a struct for each
call site which can be inspected to check the various alignment and
indirections are correct.
AMDGPU presently has no variadic support other than some ad hoc printf
handling. Combined with the pass being inactive on all other targets
landing this represents strict increase in capability with zero risk.
Testing and refining will continue post commit.
In addition to the compiler tests included here, a self contained x64
clang/musl toolchain was constructed using the "lowering" instead of the
systemv ABI and used to build various C programs like lua and libxml2.
Pass variadic arguments without changing their type, unlike the fixed
ones.
Fixed arguments are modified to better fit into registers. This patch
leaves those unchanged.
Splitting struct types into individual fields and packing small structs
into integers works well for passing via registers. Variadic arguments
are currently unimplemented in the backend. They're likely to be
implemented as a pointer to stack memory in which case register-themed
optimisations are inapplicable.
Splitting the struct into fields makes it difficult to implement va_arg
robustly. The rules around padding and alignment to inverse the struct
splitting could be constructed, but at high complexity and no particular
advantage.
Passing types as-is means there is a 1:1 correspondence with the type
information va_arg has to work with and the parameter type at the call
site.
This is an ABI change, but as the only functions affected are variadic
ones which are presently a compilation error, not a functional break.
Factored out of the larger #93362 and can land independently.
This is in effect a revert of f139ae3d93797, as we have since gained a
more sophisticated way of doing extra IRGen with the addition of
RawAddress in #86923.
Make sure that empty structs are treated as if it has a size of one bit
in function parameters and return types so that it occupies a full
argument and/or return register slot.
This fixes crashes and miscompilations when passing and/or returning
empty structs.
Reviewed by: @s-barannikov