This variable attribute is used in HLSL to add Vulkan-specific builtins
in a shader.
The attribute is documented here:
17727e88fd/proposals/0011-inline-spirv.md
These variables, even if marked as `static`, are externally initialized by
the pipeline/driver/GPU. This is handled by moving them to a dedicated
address space, `hlsl_input`, which is also added by this commit.
The design for input variables in Clang can be found here:
355771361e/proposals/0019-spirv-input-builtin.md
Co-authored-by: Justin Bogner <mail@justinbogner.com>
Enable _Float16 for the LoongArch target. Additionally, this change fixes
incorrect ABI lowering of _Float16 for structs containing fp16 that are
eligible for passing via GPR+FPR or FPR+FPR. Finally, it also fixes
int16 -> __fp16 conversion code generation, which now uses generic LLVM
IR rather than the llvm.convert.to.fp16 intrinsics.
The OpenCL translator has a `__spirv` namespace, and defining the
`__spirv__` macro causes issues downstream on the OpenCL side. The
macro is needed to keep compatibility with HLSL/DXC, but can be avoided
for other targets/languages.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
The patch introduces __builtin_spirv_generic_cast_to_ptr_explicit, which
is lowered to the llvm.spv.generic.cast.to.ptr.explicit intrinsic.
The SPIR-V builtins are now split into four files: BuiltinsSPIRVCore.td,
BuiltinsSPIRVVK.td for Vulkan-specific builtins, BuiltinsSPIRVCL.td for
OpenCL-specific builtins, and BuiltinsSPIRVCommon.td for common ones.
The patch also introduces a new header defining the SPIR-V friendly
equivalents (__spirv_GenericCastToPtrExplicit_ToGlobal,
__spirv_GenericCastToPtrExplicit_ToLocal and
__spirv_GenericCastToPtrExplicit_ToPrivate). The functions are declared
as aliases to the new builtin, allowing C-like languages to have a
definition to rely on as well as proper front-end diagnostics.
The motivation for the header is to provide a stable binding for
applications or libraries (such as SYCL), and it allows non-SPIR-V targets
to provide an implementation (via libclc, or similar to how it is done for
gpuintrin.h).
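For illustration, here is a rough usage sketch of one of the header functions; the exact parameter list, the address-space qualifiers (omitted here), and the storage-class constant are assumptions on my part, so consult the new header for the authoritative declarations.
```
/* Hedged sketch: assumes the header function takes a generic pointer plus a
   SPIR-V storage-class value and is aliased to
   __builtin_spirv_generic_cast_to_ptr_explicit. */
void *to_global(void *generic_ptr) {
  /* 5 is the SPIR-V CrossWorkgroup storage class; per
     OpGenericCastToPtrExplicit, the result is a null pointer if the cast
     fails. */
  return __spirv_GenericCastToPtrExplicit_ToGlobal(generic_ptr, 5);
}
```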
This adds support under LoongArch for the target("...") attribute.
The supported formats, illustrated in the sketch after this list, are:
- "arch=<arch>" strings, which specify the architecture features for a
function as per the -march=arch option.
- "tune=<cpu>" strings, which specify the tune CPU for a function as
per -mtune.
- "<feature>" / "no-<feature>", which enables/disables the specific feature.
As best I can see, all NVPTX architectures support the generic
address space.
I note there's a FIXME in the target's address space map about 'generic'
still having to be added to the target, but we haven't observed any
issues with it downstream. The generic address space is mapped to the
same target address space as default/private (0), but this isn't
necessarily a problem for users.
Of the 128 bits of a buffer descriptor, only 48 are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.
Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.
Reviewed By: krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139419
This patch adds preprocessor macros when Zicfilp CFI is enabled. To be
specific:
+ `#define __riscv_landing_pad 1` when `-fcf-protection=[full|branch]`
+ `#define __riscv_landing_pad_unlabeled 1` when
`-fcf-protection=[full|branch] -mcf-branch-label-scheme=unlabeled`
The macros are proposed in riscv-non-isa/riscv-c-api-doc#76, and the
CLI flags are from riscv-non-isa/riscv-toolchain-conventions#54.
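For example, code can detect the configuration at preprocessing time (a minimal sketch using only the macros above):
```
#if defined(__riscv_landing_pad_unlabeled)
  /* built with -fcf-protection=full|branch -mcf-branch-label-scheme=unlabeled */
#elif defined(__riscv_landing_pad)
  /* landing pads enabled, but not with the unlabeled scheme */
#endif
```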
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
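As a hedged sketch of the scenario (using the full `#pragma clang attribute` spelling; the `+sve` string in the target attribute is an assumption here), functions in such a region should now also receive vscale_range:
```
/* Sketch: functions declared here get "+sve" in "target-features" and,
   with this change, should also get a vscale_range attribute. */
#pragma clang attribute push(__attribute__((target("+sve"))), apply_to = function)
void saxpy(float *dst, const float *src, float a, int n);
#pragma clang attribute pop
```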
The instructions are not supported on either 32-bit ELF (due to the lack
of a red zone) or 32-bit AIX, due to the instructions always using the
full 64-bit width of the register inputs.
Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
Based on feedback from https://github.com/llvm/llvm-project/pull/136753,
remove the dummy values for OpenCL and make them match the zero default
AS map.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Since commit 613a077b05b8352a48695be295037306f5fca151, `flang` no longer
builds on Solaris/amd64:
```
flang/lib/Evaluate/intrinsics-library.cpp:225:26: error: address of overloaded function 'acos' does not match required type '__float128 (__float128)'
  225 |   FolderFactory<F, F{std::acos}>::Create("acos"),
      |                        ^~~~~~~~~
```
That patch caused the version of `quadmath.h` deep inside `/usr/gcc/<N>`
to be found, so `HAS_QUADMATHLIB` is defined. However, the `struct
HostRuntimeLibrary<__float128, LibraryVersion::Libm>` template is
guarded by `_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600`, while
`clang` only predefines `_XOPEN_SOURCE=500`.
This code dates back to commit 0c1941cb055fcf008e17faa6605969673211bea3
from 2012. It is long obsolete: `gcc` has predefined `_XOPEN_SOURCE=600`
instead since GCC 4.6, back in 2011.
This patch follows suit.
Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.
- _Float16 is now accepted by Clang.
- The half IR type is fully handled by the backend.
- These values are passed in FP registers and converted to/from float around
each operation.
- Compiler-rt conversion functions are now built for s390x, including the
previously missing extendhfdf2, which was added.
Fixes #50374
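A small sketch of what now works on s390x (the arithmetic is expanded via float as described above):
```
_Float16 twice(_Float16 x) {
  /* Converted to float, multiplied, and truncated back to _Float16;
     compiler-rt provides the conversion helpers where needed. */
  return x * (_Float16)2.0;
}
```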
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also adds the scheduler description for the z17 processor,
provided by Jonas Paulsson.
This is an alternative to
https://github.com/llvm/llvm-project/pull/122103
In SPIR-V, private global variables have the Private storage class. This
PR adds a new address space which allows the frontend to emit variables
with this storage class when targeting this backend.
This is covered in this proposal: llvm/wg-hlsl@4c9e11a
This PR will cause addrspacecast to show up in several cases, like class
member functions or assignment. Those will have to be handled in the
backend later on, particularly to fix up pointer storage classes in some
functions.
Before this change, global variables were emitted with the 'Function'
storage class, which was wrong.
SPIR-V has strict address space rules: constant globals cannot be in the
default address space.
The OMPIRBuilder change was required for lit tests to pass; we were
missing an addrspacecast.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
- Fixes #132303
- Moves dot2add from a language builtin to a target builtin.
- Sets up the scaffolding for Sema checks for DX builtins.
- Sets up the DirectX backend as able to have target builtins.
- Adds a DX TargetBuiltins emitter in
`clang/lib/CodeGen/TargetBuiltins/DirectX.cpp`.
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.
This feature is only available on gfx9/10.
A limitation of using a common target feature for both builtins is that
`__builtin_amdgcn_raw_ptr_buffer_load_lds` could otherwise have been made
available on gfx6/7/8.
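A hedged sketch of guarding uses of the builtins so code still compiles where they are unavailable; whether `__has_builtin` also reflects the required target feature is an assumption here:
```
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_amdgcn_global_load_lds)
  /* The builtin is known to this compiler/target; uses go here.
     (Arguments omitted on purpose; see the builtin's documentation.) */
#endif
```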
The i6400 and i6500 are high-performance multi-core microprocessors from
MIPS that provide best-in-class power efficiency for use in
system-on-chip (SoC) applications. The i6400 and i6500 implement Release 6
of the MIPS64 Instruction Set Architecture with full hardware
multithreading and hardware virtualization support.
I broke this in
f3cd223838:
I should have added this to the `SPIRV64` subclass, but I accidentally
added it to the base `TargetInfo`.
Using an unsupported target should error in the driver well before this,
though.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This patch makes Clang predefine `_CRT_USE_BUILTIN_OFFSETOF` in
MS-compatible modes. The macro makes the `offsetof` provided by MS
UCRT's `<stddef.h>` select the `__builtin_offsetof` version, so with
it Clang (clang-cl) can directly consume UCRT's `offsetof`.
MSVC has predefined the macro as `1` since at least VS 2017 19.14, but I
think it's also OK to define it in "older" compatible modes.
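Roughly, the selection this macro drives looks like the following sketch (not the verbatim UCRT `<stddef.h>` contents):
```
#ifdef _CRT_USE_BUILTIN_OFFSETOF
  #define offsetof(s, m) __builtin_offsetof(s, m)
#else
  #define offsetof(s, m) ((size_t)&(((s *)0)->m))
#endif
```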
Fixes #59689.
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4. Define 2 new flags:
BPF_LOAD_ACQ 0x100
BPF_STORE_REL 0x110
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like ARM64 instruction LDAPRH and friends.
As an example (assuming little-endian):
```
long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}
```
foo() can be compiled to:
```
db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
95 00 00 00 00 00 00 00  exit
```
opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
imm (0x00000100): BPF_LOAD_ACQ
Similarly:
```
void bar(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}
```
bar() can be compiled to:
```
cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)
95 00 00 00 00 00 00 00  exit
```
opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
imm (0x00000110): BPF_STORE_REL
Inline assembly is also supported.
Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature. It can also be disabled using a new
llc option, -disable-load-acq-store-rel.
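A minimal sketch of gating on the macro, so programs only rely on the new instructions when the compiler advertises them:
```
long read_flag(long *p) {
#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
  return __atomic_load_n(p, __ATOMIC_ACQUIRE);  /* load-acquire */
#else
  return __atomic_load_n(p, __ATOMIC_RELAXED);  /* plain BPF_LDX load */
#endif
}
```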
Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:
```
void foo(short *ptr, short val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
}

6b 21 00 00 00 00 00 00  *(u16 *)(r1 + 0x0) = w2
95 00 00 00 00 00 00 00  exit
```
Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:
```
int foo(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}

71 11 00 00 00 00 00 00  w1 = *(u8 *)(r1 + 0x0)
bc 10 08 00 00 00 00 00  w0 = (s8)w1
95 00 00 00 00 00 00 00  exit
```
Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE. Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:
```
$ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is not supported
    1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
      |     ^
  ...
```
Finally, rename those isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than their instruction class.
[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
This patch adds a function attribute `riscv_vls_cc` for the RISC-V VLS
calling convention. It takes 0 or 1 argument; the argument is the
`ABI_VLEN`, which is the `VLEN` used for passing fixed-vector arguments.
The fixed-vector argument is wrapped as a scalable vector (VLA) using the
`ABI_VLEN`, and the corresponding mechanism is used to handle it. The
range of `ABI_VLEN` is [32, 65536]; if not specified, the default value
is 128.
Here is an example of VLS argument passing:
Non-VLS call:
```
void original_call(__attribute__((vector_size(16))) int arg) {}
=>
define void @original_call(i128 noundef %arg) {
entry:
...
ret void
}
```
VLS call:
```
void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
entry:
...
ret void
}
```
The first, non-VLS call passes a generic 16-byte vector argument as a
flattened integer.
In contrast, the VLS call uses `ABI_VLEN=256`, which wraps the
vector into <vscale x 1 x i32>, where the number of scalable vector
elements is calculated as `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.
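For comparison, a hedged sketch with no attribute argument: the default `ABI_VLEN` of 128 applies, so the same 16-byte argument would be wrapped as <vscale x 2 x i32> (4 * 64 / 128 = 2).
```
/* Sketch: no ABI_VLEN argument, so the default of 128 is used. */
void __attribute__((riscv_vls_cc)) vls_default(__attribute__((vector_size(16))) int arg) {}
```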
PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
Add an option and a statement attribute for controlling the emission of
target-specific metadata on atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed; it was then changed to a Clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, and
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is
aliased to `-f[no-]atomic-ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.
During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used to manage the saving and restoring of atomic options when
entering and exiting nested AttributedStmts.
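A hedged sketch of the statement attribute in use; the attribute spelling `clang::atomic` and the option names inside it are assumptions derived from the option names above (see the RFC for the authoritative spellings):
```
void add_to(float *p, float v) {
  /* Options apply to atomic instructions emitted for this compound
     statement; a nested attributed statement could override them. */
  [[clang::atomic(no_remote_memory, ignore_denormal_mode)]] {
    __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
  }
}
```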