1643 Commits

Author SHA1 Message Date
Nathan Gauër
20d70196c9
[HLSL][SPIR-V] Implement vk::ext_builtin_input attribute (#138530)
This variable attribute is used in HLSL to add Vulkan specific builtins
in a shader.
The attribute is documented here:

17727e88fd/proposals/0011-inline-spirv.md

Those variable, even if marked as `static` are externally initialized by
the pipeline/driver/GPU. This is handled by moving them to a specific
address space `hlsl_input`, also added by this commit.

The design for input variables in Clang can be found here:
355771361e/proposals/0019-spirv-input-builtin.md


Co-authored-by: Justin Bogner <mail@justinbogner.com>
2025-06-04 13:22:37 +02:00
Ami-zhang
06f779b69d
Reland "[Clang][LoongArch] Support target attribute for function" (#142546)
This relands #140700. I have updated the test case('targetattr.c') to
resolve the test failure.

Original PR resulted in test fail:
https://lab.llvm.org/buildbot/#/builders/11/builds/16173
https://lab.llvm.org/buildbot/#/builders/202/builds/1531

Original description:
Followup to #140700.
2025-06-03 15:57:50 +08:00
Ami-zhang
8c65f68330
[clang][LoongArch] Add support for the _Float16 type (#141703)
Enable _Float16 for LoongArch target. Additionally, this change fixes
incorrect ABI lowering of _Float16 in the case of structs containing
fp16 that are eligible for passing via GPR+FPR or FPR+FPR. Finally, it
also fixes int16 -> __fp16 conversion code gen, which uses generic LLVM
IR rather than llvm.convert.to.fp16 intrinsics.
2025-06-03 14:26:11 +08:00
Tomas Matheson
832a7bb460
[AArch64] Add missing Neon Types (#126945)
The AAPCS64 adds a number of vector types to the C unconditionally:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#11appendix-support-for-advanced-simd-extensions

The equivalent SVE types are already available in clang:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#12appendix-support-for-scalable-vectors

__mfp8 is defined in the ACLE
https://arm-software.github.io/acle/main/acle.html#data-types

---------

Co-authored-by: David Green <david.green@arm.com>
2025-06-02 17:09:35 +01:00
Nathan Gauër
df5f65d22a
[SPIR-V] Only emit __spirv__ when targeting HLSL (#142401)
OpenCL translator has a `__spirv` namespace, and defining the
`__spirv__` macro causes issues downstream on the OpenCL side. This
macro is needed to keep compatibility with HLSL/DXC, but can be avoided
for other targets/languages.
2025-06-02 11:51:53 -04:00
Kazu Hirata
cd9fe8a34c
[Basic] Remove unused includes (NFC) (#142295)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-31 19:00:31 -07:00
Paul Kirth
d93788fcbf
Revert "[Clang][LoongArch] Support target attribute for function" (#141998)
Reverts llvm/llvm-project#140700

This breaks bots both in buildbot and downstream CI: 
- https://lab.llvm.org/buildbot/#/builders/11/builds/16173
- https://lab.llvm.org/buildbot/#/builders/202/builds/1531
-
https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-host-linux-x64/b8713537585914796017/overview
2025-05-29 11:26:44 -07:00
Victor Lomuller
c474f8f240
[clang][SPIRV] Add builtin for OpGenericCastToPtrExplicit and its SPIR-V friendly binding (#137805)
The patch introduce __builtin_spirv_generic_cast_to_ptr_explicit which
is lowered to the llvm.spv.generic.cast.to.ptr.explicit intrinsic.

The SPIR-V builtins are now split into 3 differents file:
BuiltinsSPIRVCore.td,
BuiltinsSPIRVVK.td for Vulkan specific builtins, BuiltinsSPIRVCL.td for
OpenCL specific builtins
and BuiltinsSPIRVCommon.td for common ones.

The patch also introduces a new header defining its SPIR-V friendly
equivalent (__spirv_GenericCastToPtrExplicit_ToGlobal,
__spirv_GenericCastToPtrExplicit_ToLocal and
__spirv_GenericCastToPtrExplicit_ToPrivate). The functions are declared
as aliases to the new builtin allowing C-like languages to have a
definition to rely on as well as gaining proper front-end diagnostics.

The motivation for the header is to provide a stable binding for
applications or library (such as SYCL) and allows non SPIR-V targets to
provide an implementation (via libclc or similar to how it is done for
gpuintrin.h).
2025-05-29 15:19:40 +02:00
Ami-zhang
b359422eeb
[Clang][LoongArch] Support target attribute for function (#140700)
This adds support under LoongArch for the target("..") attributes.

The supported formats are:
- "arch=<arch>" strings, that specify the architecture features for a
function as per the -march=arch option.
- "tune=<cpu>" strings, that specify the tune-cpu cpu for a function as
per -mtune.
- "<feature>", "no-<feature>" enabled/disables the specific feature.
2025-05-29 19:54:48 +08:00
Steven Perron
5584020d8a
[HLSL][SPIRV] Implement the SPIR-V target type for cbuffers. (#140061)
This change implement the type used to represent cbuffer for SPIR-V.

Fixes https://github.com/llvm/llvm-project/issues/138274.
2025-05-28 07:51:03 -04:00
CarolineConcatto
7569de5272
[Clang][AArch64]Add FP8 ACLE macros implementation (#140591)
This patch implements the macros described in the ACLE[1]

[1]
https://github.com/ARM-software/acle/blob/main/main/acle.md#modal-8-bit-floating-point-extensions
2025-05-27 10:01:38 +01:00
hev
689342de25
[Clang][LoongArch] Add inline asm support for the q constraint (#141037)
This patch adds support for the `q` constraint:
a general-purpose register except for $r0 and $r1 (for the csrxchg
instruction)

Link: https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684339.html
2025-05-23 11:14:41 +08:00
Fraser Cormack
6553dc30b8
[NVPTX] Support the OpenCL generic addrspace feature by default (#137940)
As best as I can see, all NVPTX architectures support the generic
address space.

I note there's a FIXME in the target's address space map about 'generic'
still having to be added to the target but we haven't observed any
issues with it downstream. The generic address space is mapped to the
same target address space as default/private (0), but this isn't
necessarily a problem for users.
2025-05-21 09:55:11 +01:00
Jim Lin
d561d595c4
[RISCV] Implement intrinsics for XAndesVPackFPH (#140007)
This patch implements clang intrinsic support for XAndesVPackFPH.

The document for the intrinsics can be found at:

https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs.adoc#andes-vector-packed-fp16-extensionxandesvpackfph
and with policy variants

https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/policy_funcs/intrinsic_funcs.adoc#andes-vector-packed-fp16-extensionxandesvpackfph

Co-authored-by: Tony Chuan-Yue Yuan <yuan593@andestech.com>
2025-05-20 13:16:51 +08:00
Alexander Richardson
07e2ba445d
[AMDGPU] Set AS8 address width to 48 bits
Of the 128-bits of buffer descriptor only 48 bits are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logic conclusion is to set the index width to 48 bits instead of
the current value of 128.

Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.

Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-19 17:26:05 -07:00
Ming-Yi Lai
c28d6c2f5f
[Clang][RISCV] Add Zicfilp CFI unlabeled scheme preprocessor macros (#109600)
This patch adds preprocessor macros when Zicfilp CFI is enabled. To be
specific:

+ `#define __riscv_landing_pad 1` when `-fcf-protection=[full|branch]`
+ `#define __riscv_landing_pad_unlabeled 1` when
`-fcf-protection=[full|branch] -mcf-branch-label-scheme=unlabeled`

The macros are proposed in riscv-non-isa/riscv-c-api-doc#76 , and the
CLI flags are from riscv-non-isa/riscv-toolchain-conventions#54.
2025-05-19 18:39:31 +08:00
Kazu Hirata
9adcb4fe12
[clang] Use llvm::replace (NFC) (#140264) 2025-05-16 09:06:31 -07:00
Matthew Devereau
22576e2cce
[Clang][AArch64] Add pessimistic vscale_range for sve/sme (#137624)
The "target-features" function attribute is not currently considered
when adding vscale_range to a function. When +sve/+sme are pushed onto
functions with "#pragma attribute push(+sve/+sme)", the function
potentially misses out on optimizations that rely on vscale_range being
present.
2025-05-16 09:39:07 +01:00
Kazu Hirata
0f0fd6213e
[Basic] Use std::optional::value_or (NFC) (#140172) 2025-05-15 23:28:57 -07:00
Sean Fertile
4ac8e90df7
[PPC] Disable rop-protect for 32-bit OS targets. (#139619)
The instructions are not supported on either 32-bit ELF (due to no
redzone) or 32-bit AIX due to the instructions always using the full
64-bit width of the register inputs.
2025-05-14 13:00:15 -04:00
Prabhu Rajasekaran
20d6375796
[clang] Handle CC attrs for UEFI (#138935)
UEFI's default ABI is MS ABI. Handle the calling convention attributes
accordingly.
2025-05-07 21:42:01 -07:00
Tomohiro Kashiwada
eb6d51a2fd
[Cygwin] Enable TLS on Cygwin target (#138618)
Cygwin environment and toolchain supports EMUTLS.

From
https://cygwin.com/git/?p=newlib-cygwin.git;a=blob;f=config/tls.m4;hb=HEAD#l118,

```
$ LANG=C gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/15/lto-wrapper.exe Target: x86_64-pc-cygwin
Configured with: (snip)
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 15.0.1 20250406 (experimental) (GCC)

$ echo '__thread int a; int b; int main() { return a = b; }' | gcc -S -xc -o- - | grep __emutls_get_address
        call    __emutls_get_address
        .def    __emutls_get_address;   .scl    2;      .type   32;     .endef
```
2025-05-06 23:38:18 +03:00
Justin Cai
faf4e8af74
[Clang][SYCL] Add initial set of Intel OffloadArch values (#138158)
Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
2025-05-01 16:29:48 -05:00
Prabhu Rajasekaran
6274cdb9a6
[clang] Fix UEFI Target info (#127290)
For X64 UEFI targets set appropriate integer type sizes, and relevant
ABI information. 

---------

Co-authored-by: Petr Hosek <phosek@google.com>
2025-04-30 10:07:57 -07:00
Fraser Cormack
1e31f4b5eb
[AMDGPU] Support the OpenCL generic addrspace feature by default (#137636)
This feature should be supported on AMDGCN architectures with flat
addressing.
2025-04-29 14:14:00 +01:00
Nick Sarnie
f9ee5ce605
[clang][SPIR-V] Fix OpenCL addrspace mapping when using non-zero default AS (#137187)
Based on feedback from https://github.com/llvm/llvm-project/pull/136753,
remove the dummy values for OpenCL and make them match the zero default
AS map.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-04-28 19:26:01 +00:00
Rainer Orth
e71c8ea3cc
[Driver] Fix _XOPEN_SOURCE definition on Solaris (#137141)
Since commit 613a077b05b8352a48695be295037306f5fca151, `flang` doesn't
build any longer on Solaris/amd64:
```
flang/lib/Evaluate/intrinsics-library.cpp:225:26:
error: address of overloaded function 'acos' does not match required type '__float128 (__float128)'
  225 |       FolderFactory<F, F{std::acos}>::Create("acos"),
      |                          ^~~~~~~~~
```
That patch led to the version of `quadmath.h` deep inside `/usr/gcc/<N>`
to be found, thus `HAS_QUADMATHLIB` is defined. However, the `struct
HostRuntimeLibrary<__float128, LibraryVersion::Libm>` template is
guarded by `_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600`, while
`clang` only predefines `_XOPEN_SOURCE=500`.

This code dates back to commit 0c1941cb055fcf008e17faa6605969673211bea3
back in 2012. Currently, this is long obsolete and `gcc` prefefines
`_XOPEN_SOURCE=600` instead since GCC 4.6 back in 2011.

This patch follows that.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.
2025-04-26 17:06:04 +02:00
Nick Sarnie
52a96491e1
[clang][SPIR-V] Addrspace of opencl_global should always be 1 (#136753)
This fixes a CUDA SPIR-V regression introduced in
https://github.com/llvm/llvm-project/pull/134399.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-04-24 14:20:13 +00:00
Ben Shi
b0524f3329
[clang][AVR] Improve compatibility of inline assembly with avr-gcc (#136534)
Allow the value 64 to be round up to 0 for constraint 'I'.
2025-04-23 18:42:07 +08:00
modiking
d6a68be7af
[NVPTX] Add support for Shared Cluster Memory address space [1/2] (#135444)
Adds support for new Shared Cluster Memory Address Space
(SHARED_CLUSTER, addrspace 7). See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory
for details.

1. Update address space structures and datalayout to contain the new
space
2. Add new intrinsics that use this new address space
3. Update NVPTX alias analysis

The existing intrinsics are updated in
https://github.com/llvm/llvm-project/pull/136768
2025-04-22 15:14:39 -07:00
Steven Perron
c073c22865
[HLSL] Use hlsl_device address space for getpointer. (#127675)
We add the hlsl_device address space to represent the device memory
space as defined in section 1.7.1.3 of the [HLSL
spec](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf).

Fixes https://github.com/llvm/llvm-project/issues/127075
2025-04-22 13:26:32 -04:00
Kazu Hirata
0ed1c9862d
[clang] llvm::append_range (NFC) (#136440) 2025-04-19 10:37:25 -07:00
Jonas Paulsson
6d03f51f0c
[SystemZ] Add support for 16-bit floating point. (#109164)
- _Float16 is now accepted by Clang.

- The half IR type is fully handled by the backend.

- These values are passed in FP registers and converted to/from float around
  each operation.

- Compiler-rt conversion functions are now built for s390x including the missing
  extendhfdf2 which was added.

Fixes #50374
2025-04-16 20:02:56 +02:00
yingopq
1ee8fe810f
[Mips] Fix clang compile error when -march=p5600 with -mmsa (#132679)
When -march=p5600 with -mmsa, the result of getISARev is 0, so report
error. Append p5600 to cases mips32r5.

Fix #91948.
2025-04-15 17:01:36 +08:00
Ulrich Weigand
80267f8148
Support z17 processor name and scheduler description (#135254)
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.

This patch also add the scheduler description for the z17 processor,
provided by Jonas Paulsson.
2025-04-11 00:20:58 +02:00
Nathan Gauër
a625bc60e2
[HLSL][SPIR-V] Add hlsl_private address space for SPIR-V (#133464)
This is an alternative to
https://github.com/llvm/llvm-project/pull/122103

In SPIR-V, private global variables have the Private storage class. This
PR adds a new address space which allows frontend to emit variable with
this storage class when targeting this backend.

This is covered in this proposal: llvm/wg-hlsl@4c9e11a

This PR will cause addrspacecast to show up in several cases, like class
member functions or assignment. Those will have to be handled in the
backend later on, particularly to fixup pointer storage classes in some
functions.

Before this change, global variable were emitted with the 'Function'
storage class, which was wrong.
2025-04-10 10:55:10 +02:00
Nick Sarnie
68ee56d150
[clang][OpenMP][SPIR-V] Fix addrspace of global constants (#134399)
SPIR-V has strict address space rules, constant globals cannot be in the
default address space.

The OMPIRBuilder change was required for lit tests to pass, we were
missing an addrspacecast.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-04-09 15:41:53 +00:00
Farzon Lotfi
16c84c4475
[DirectX] Add target builtins (#134439)
- fixes #132303
- Moves dot2add from a language builtin to a target builtin.
-  Sets the scaffolding for Sema checks for DX builtins
-  Setup DirectX backend as able to have target builtins
- Adds a DX TargetBuiltins emitter in
`clang/lib/CodeGen/TargetBuiltins/DirectX.cpp`
2025-04-07 12:06:57 -04:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Cassandra Beckley
9ce77255b9
[HLSL] Add __spirv__ macro (#132848)
This macro can be used by HLSL code to detect that it is being compiled
for the SPIR-V target.
2025-03-28 10:49:19 -04:00
Mallikarjuna Gouda
0ca10ef51b
[MIPS] Add MIPS i6400 and i6500 processors (#130587)
The i6400 and i6500 are high performance multi-core microprocessors from
MIPS that provide best in class power efficiency for use in
system-on-chip (SoC) applications. i6400 and i6500 implements Release 6
of the MIPS64 Instruction Set Architecture with full hardware
multithreading and hardware virtualization support.
2025-03-20 23:08:33 -04:00
Shilei Tian
dccc0a836c
[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379)
This is an extension of #131357. Hopefully this would be the last one.
2025-03-14 17:02:15 -04:00
zhijian lin
737a0aeb6b
[NFC][PowerPC] cleaned dead code of PPC.cpp and PPC.h (#130994)
There are some variables in the PPC.h which are defined and assigned a
value to them,
but never be used, remove the code related to the variables.
2025-03-14 09:24:44 -04:00
Hubert Tong
e0e80dbe43
[Clang codegen][PPC] Produce AIX-specific "target features" only for AIX (#130864)
Listing AIX-specific "target features" in the IR are a source of
confusion on PPC Linux. Generate them only for AIX (at least by
default).
2025-03-13 18:13:03 -04:00
Nick Sarnie
7a5e4f5405
[clang][NFCI] Fix getGridValues for unsupported targets (#131023)
I broke this in
f3cd223838,
I should have added this to the `SPIRV64` subclass, but I accidentally
added it to base `TargetInfo`.

Using an unsupported target should error in the driver way before this
though.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-03-13 14:28:49 +00:00
A. Jiang
6abe19ac58
[clang] Predefine _CRT_USE_BUILTIN_OFFSETOF in MS-compatible modes (#127568)
This patch makes Clang predefine `_CRT_USE_BUILTIN_OFFSETOF` in
MS-compatible modes. The macro can make the `offsetof` provided by MS
UCRT's `<stddef.h>` to select the `__builtin_offsetof` version, so with
it Clang (Clang-cl) can directly consume UCRT's `offsetof`.

MSVC predefines the macro as `1` since at least VS 2017 19.14, but I
think it's also OK to define it in "older" compatible modes.

Fixes #59689.
2025-03-13 14:02:44 +08:00
Prabhuk
45ca613c13
[clang] Use TargetInfo to decide Mangling for C (#129920)
Instead of hardcoding the decision on what mangling scheme to use based
on targets, use TargetInfo to make the decision.
2025-03-05 17:26:52 -08:00
Peilin Ye
17bfc00f7c
[BPF] Add load-acquire and store-release instructions under -mcpu=v4 (#108636)
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.  Define 2 new flags:

  BPF_LOAD_ACQ    0x100
  BPF_STORE_REL   0x110

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).

Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like ARM64 instruction LDAPRH and friends.

As an example (assuming little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000110): BPF_STORE_REL

Inline assembly is also supported.

Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature.  It can also be disabled using a new
llc option, -disable-load-acq-store-rel.

Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:

  void foo(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
  }

  6b 21 00 00 00 00 00 00  *(u16 *)(r1 + 0x0) = w2
  95 00 00 00 00 00 00 00  exit

Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:

  int foo(char *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_RELAXED);
  }

  71 11 00 00 00 00 00 00  w1 = *(u8 *)(r1 + 0x0)
  bc 10 08 00 00 00 00 00  w0 = (s8)w1
  95 00 00 00 00 00 00 00  exit

Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE.  Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:

  $ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is
not supported
1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
      |     ^
  ...

Finally, rename those isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than their instruction class.

[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
2025-03-04 09:19:39 -08:00
Brandon Wu
c804e86f55
[RISCV][VLS] Support RISCV VLS calling convention (#100346)
This patch adds a function attribute `riscv_vls_cc` for RISCV VLS
calling
convention which takes 0 or 1 argument, the argument is the `ABI_VLEN`
which is the `VLEN` for passing the fixed-vector arguments, it wraps the
argument as a scalable vector(VLA) using the `ABI_VLEN` and uses the
corresponding mechanism to handle it. The range of `ABI_VLEN` is [32,
65536],
if not specified, the default value is 128.

Here is an example of VLS argument passing:
Non-VLS call:
```
  void original_call(__attribute__((vector_size(16))) int arg) {}
=>
  define void @original_call(i128 noundef %arg) {
  entry:
    ...
    ret void
  }
```
VLS call:
```
  void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
  define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
  entry:
    ...
    ret void
  }
}
```

The first Non-VLS call passes generic vector argument of 16 bytes by
flattened integer.
On the contrary, the VLS call uses `ABI_VLEN=256` which wraps the
vector to <vscale x 1 x i32> where the number of scalable vector
elements
is calaulated by: `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.

PsABI PR: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/418
C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/68
2025-03-03 12:39:35 +08:00
Yaxun (Sam) Liu
240f2269ff
Add clang atomic control options and attribute (#114841)
Add option and statement attribute for controlling emitting of
target-specific
metadata to atomicrmw instructions in IR.

The RFC for this attribute and option is

https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641,
Originally a pragma was proposed, then it was changed to clang
attribute.

This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.

In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
 `-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.

In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.

During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
2025-02-27 10:41:04 -05:00