1643 Commits

Author SHA1 Message Date
Phoebe Wang
99a1d5f7fa
[X86][APX] Remove CF feature from APXF and Diamond Rapids (#153751)
Due to it results in more losses than gains.
2025-08-20 03:07:56 +00:00
Aditi Medhane
948abf1bf5
[PowerPC] Add BCDCOPYSIGN and BCDSETSIGN Instruction Support (#144874)
Support the following BCD format conversion builtins for PowerPC.

- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
`
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.

> Note: This built-in function is valid only when all following
conditions are met:
> -qarch is set to utilize POWER9 technology.
> The bcd.h file is included.

## Prototypes

```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```

## Usage Details

`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
     - If the second parameter is 0, the sign code is set to 0xC.
     - If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> notes:
>     The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.

---------

Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
2025-08-19 14:47:27 +05:30
Nikita Popov
246a64a12e
[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163)
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.
2025-08-18 09:23:48 +02:00
Ami-zhang
a1b6e7ff39
[clang][LoongArch] Ensure target("lasx") implies LSX support (#153542)
Currently, `__attribute__((target("lasx")))` does not automatically
enable LSX support, causing Clang to fail with `-mno-lsx`. Since
LASX depends on LSX, enabling LASX should implicitly enable LSX to
avoid clang error.
    
Fixes #149512.

Depends on #153541
2025-08-15 09:53:08 +08:00
Nikita Popov
2bf2b6b24d
[AVR] Only specify one legal int string in data layout (#153010)
There should not be both `n8` and `n16:8`. This is a single list of
legal integers. Additionally, this should use the standard order in
increasing size `n8:16`.
2025-08-11 20:42:28 +02:00
Tom Vijlbrief
97f0ff0c80
[AVR] Fix Avr indvar detection and strength reduction (missed optimization) (#152028)
Fix https://github.com/llvm/llvm-project/issues/151080
2025-08-09 12:46:32 +08:00
Alex Voicu
ff8b4f8151
[AMDGCNSPIRV][NFC] Match AMDGPU's __builtin_va_list type (#152044)
AMDGCN flavoured SPIRV should math AMDGPU TI as much as possible, and
the va_list difference was spurious.
2025-08-05 15:30:16 +01:00
Eli Friedman
558277ae4d
[clang][ARM] Fix setting of MaxAtomicInlineWidth. (#151404)
2f497ec3a0056f15727ee6008211aeb2c4a8f455 updated the backend's rules for
when lock-free atomics are available, but we never made a corresponding
change to the frontend. Fix it to be consistent. This only affects
targets older than v7.
2025-08-01 11:03:21 -07:00
Artem Belevich
507b879b6e
[CUDA] add support for targeting sm_103/sm_121 with CUDA-12.9 (#151587) 2025-07-31 13:38:54 -07:00
Hood Chatham
8dd91996f0
[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151294)
Also alphebetize feature list, add `-mgc` and `-mno-gc` flags, and add
some missing feature tests.

Reland of #151107.

https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-30 13:04:02 -07:00
ronlieb
a7e029bd0b
Revert "[WebAssembly] Add gc target feature to addBleedingEdgeFeatures" (#151268)
Reverts llvm/llvm-project#151107
2025-07-29 22:07:17 -07:00
Hood Chatham
fe25445ded
[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151107)
See suggestion here:
https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-29 17:54:12 -07:00
Changpeng Fang
30ad2e24ab
[AMDGPU] Allow readonly features to be written to IR when there is no target (#148141)
Fixes: SWDEV-541399
2025-07-29 08:18:00 -07:00
Simon Tatham
d5985905ae
[Clang][ARM][Sema] Reject bad sizes of __builtin_arm_ldrex (#150419)
Depending on the particular version of the AArch32 architecture,
load/store exclusive operations might be available for various subset of
8, 16, 32, and 64-bit quantities. Sema knew nothing about this and was
accepting all four sizes, leading to a compiler crash at isel time if
you used a size not available on the target architecture.

Now the Sema checking stage emits a more sensible diagnostic, pointing
at the location in the code.

In order to allow Sema to query the set of supported sizes, I've moved
the enum of LDREX_x sizes out of its Arm-specific header into
`TargetInfo.h`.

Also, in order to allow the diagnostic to specify the correct list of
supported sizes, I've filled it with `%select{}`. (The alternative was
to make separate error messages for each different list of sizes.)
2025-07-29 09:01:37 +01:00
Hood Chatham
15b03687ff
[WebAssembly,clang] Add __builtin_wasm_test_function_pointer_signature (#150201)
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.

Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
2025-07-25 16:52:39 -07:00
Alexandros Lamprineas
3ab64c5b29
[NFC][Clang][FMV] Make FMV priority data type future proof. (#150079)
FMV priority is the returned value of a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and we will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile time overhead.
It allows bit manipulation, comparison and variable bit width.
2025-07-23 10:37:29 +01:00
Slava "nerfur" Voronzoff
5eecbc018b
Adding Loongarch64 to OpenBSD parts (#149737)
Adding Loongarch64 to OpenBSD parts
2025-07-22 18:18:12 -04:00
Simon Tatham
34f59d7920
[Clang][ARM] Fix __ARM_FEATURE_LDREX on Armv8-M (#149538)
The Armv8-M architecture doesn't have the LDREXD and STREXD
instructions, for exclusive load/store of a 64-bit quantity split across
two registers. But the `__ARM_FEATURE_LDREX` macro was set to a value
that claims it does, because the case for Armv8 was missing a check for
M profile.

The Armv7 case got it right, so I've just made the two cases the same.
2025-07-22 08:53:45 +01:00
Hervé Poussineau
13906724ff
[Mips] Correctly define IntPtrType (#145158)
Mips was the only architecture having PtrDiffType = SignedInt and
IntPtrType = SignedLong

This fixes a problem on mipsel-windows-gnu triple, where uintptr_t was
wrongly defined as unsigned long instead of unsigned int, leading to
problems in compiler-rt.

compiler-rt/lib/interception/interception_type_test.cpp:24:17: error:
static assertion failed due to requirement
'__sanitizer::is_same<unsigned long, unsigned int>::value':
24 | COMPILER_CHECK((__sanitizer::is_same<__sanitizer::uptr,
::uintptr_t>::value));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

compiler-rt/lib/interception/../sanitizer_common/sanitizer_internal_defs.h:369:44:
note: expanded from macro 'COMPILER_CHECK'
      369 | #define COMPILER_CHECK(pred) static_assert(pred, "")
          |                                            ^~~~
compiler-rt/lib/interception/interception_type_test.cpp:25:17: error:
static assertion failed due to requirement '__sanitizer::is_same<long,
int>::value':
25 | COMPILER_CHECK((__sanitizer::is_same<__sanitizer::sptr,
::intptr_t>::value));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

compiler-rt/lib/interception/../sanitizer_common/sanitizer_internal_defs.h:369:44:
note: expanded from macro 'COMPILER_CHECK'
      369 | #define COMPILER_CHECK(pred) static_assert(pred, "")
          |                                            ^~~~
compiler-rt/lib/interception/interception_type_test.cpp:27:17: error:
static assertion failed due to requirement '__sanitizer::is_same<long,
int>::value':
27 | COMPILER_CHECK((__sanitizer::is_same<::PTRDIFF_T,
::ptrdiff_t>::value));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

compiler-rt/lib/interception/../sanitizer_common/sanitizer_internal_defs.h:369:44:
note: expanded from macro 'COMPILER_CHECK'
      369 | #define COMPILER_CHECK(pred) static_assert(pred, "")
2025-07-21 18:17:45 +02:00
Wenju He
b41398294c
[SPIR] Set MaxAtomicInlineWidth minimum size to 32 for spir32 and 64 for spir64 (#148997)
Set MaxAtomicInlineWidth the same way as SPIR-V targets in 3cfd0c0d3697.
This PR fixes build warning in scoped atomic built-in in #146814:
`warning: large atomic operation may incur significant performance
penalty; ; the access size (2 bytes) exceeds the max lock-free size (0
bytes) [-Watomic-alignment]`
2025-07-17 09:02:10 +08:00
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
Yaxun (Sam) Liu
beea2a9414
[Clang] Respect MS layout attributes during CUDA/HIP device compilation (#146620)
This patch fixes an issue where Microsoft-specific layout attributes,
such as __declspec(empty_bases), were ignored during CUDA/HIP device
compilation on a Windows host. This caused a critical memory layout
mismatch between host and device objects, breaking libraries that rely
on these attributes for ABI compatibility.

The fix introduces a centralized hasMicrosoftRecordLayout() check within
the TargetInfo class. This check is aware of the auxiliary (host) target
and is set during TargetInfo::adjust if the host uses a Microsoft ABI.

The empty_bases, layout_version, and msvc::no_unique_address attributes
now use this centralized flag, ensuring device code respects them and
maintains layout consistency with the host.

Fixes: https://github.com/llvm/llvm-project/issues/146047
2025-07-09 08:53:10 -04:00
Ivan Kosarev
a7a7e95720
[AMDGPU][Clang] Support bfloat16 arithmetic. (#147541)
Co-authored-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2025-07-08 17:30:06 +01:00
Nikita Popov
18f7655178
[Clang][Wasm] Set __float128 alignment to 64 for emscripten (#146494)
https://reviews.llvm.org/D104808 set the alignment of long double to 64
bits. This is also the alignment specified in the LLVM data layout.
However, the alignment of __float128 was left at 128 bits.

I assume that this was just an oversight, rather than an intentional
divergence. The C ABI document currently does not make any statement
about `__float128`:
https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md
2025-07-08 10:20:43 +02:00
Eli Friedman
2aa0f0a3bd
[AArch64] Add option -msve-streaming-vector-bits= . (#144611)
This is similar to -msve-vector-bits, but for streaming mode: it
constrains the legal values of "vscale", allowing optimizations based on
that constraint.

This also fixes conversions between SVE vectors and fixed-width vectors
in streaming functions with -msve-vector-bits and
-msve-streaming-vector-bits.

This rejects any use of arm_sve_vector_bits types in streaming
functions; if it becomes relevant, we could add
arm_sve_streaming_vector_bits types in the future.

This doesn't touch the __ARM_FEATURE_SVE_BITS define.
2025-07-03 13:44:38 -07:00
Sander de Smalen
cd10ded697
[Clang] Remove AArch64TargetInfo::setArchFeatures (#146107)
When compiling with `-march=armv9-a+nosve` we found that Clang still
defines the `__ARM_FEATURE_SVE2` macro, which is explicitly set in
`setArchFeatures` when compiling for armv9-a.

After some experimenting, I found out that the list of features passed
into `AArch64TargetInfo::handleTargetFeatures` has already been expanded
and takes into account `+no[feature]` and has already expanded features
like `armv9-a`.

From that I conclude that `setArchFeatures` is no longer required.
2025-07-01 10:20:40 +01:00
Nikita Popov
7e830f7671
[Clang] Partially fix m68k alignments (#144740)
As the data layout a few lines further up specifies, the int, long and
pointer alignment should be 16 instead of the default of 32.

The long long alignment is also incorrect, but that would require a
change to the data layout as well.

Comparison with GCC, which consistently uses 2 byte alignment:
https://gcc.godbolt.org/z/K3x6a7dEf At least based on some spot checks,
the changes to bit field layout also make use match GCC now.

This was found by https://github.com/llvm/llvm-project/pull/144720.
2025-07-01 09:06:41 +02:00
Ami-zhang
8d9cdb65f0
[Clang][LoongArch] Fixed incorrect _BitInt(N>64) alignment (#145297)
This patch makes determining alignment and width of BitInt to be target
ABI specific and makes it consistent with [Procedure Call Standard for
the LoongArch™ Architecture] for LoongArch target
(https://github.com/loongson/la-abi-specs/blob/release/lapcs.adoc).
2025-07-01 08:42:16 +08:00
Yao Zi
0ba456fcc6
[Clang][LoongArch] Match GCC behaviour when parsing FPRs in asm clobbers (#138391)
There're four possible formats to refer a register in inline assembly,

1. Numeric name without dollar sign ("f0")
2. Numeric name with dollar sign ("$f0")
3. ABI name without dollar sign ("fa0")
4. ABI name with dollar sign ("$fa0")

LoongArch GCC accepts 1 and 2 for FPRs before r15-8284[1] and all these
formats after the chagne. But Clang supports only 2 and 4 for FPRs. The
inconsistency has caused compatibility issues, such as QEMU's case[2].

This patch follows 0bbf3ddf5fea ("[Clang][LoongArch] Add GPR alias
handling without `$` prefix") and accepts FPRs without dollar sign
prefixes as well to keep aligned with GCC, avoiding future compatibility
problems.

Link:
https://gcc.gnu.org/cgit/gcc/commit/?id=d0110185eb78f14a8e485f410bee237c9c71548d
[1]
Link:
https://lore.kernel.org/qemu-devel/20250314033150.53268-3-ziyao@disroot.org/
[2]
2025-06-28 16:47:05 +08:00
Brian Cain
c6bd020714
Revert "[Hexagon] NFC: Reduce the amount of version-specific code" (#146193)
Reverts llvm/llvm-project#145812
2025-06-27 23:13:36 -05:00
Alexey Karyakin
e9c9adcefe
[Hexagon] NFC: Reduce the amount of version-specific code (#145812)
There is a lot of redundant code that needs to be modified when new
Hexagon versions are added. Reduce the amount of this redundancy.

- compute ELF flags and attributes based on version feature names;
- simplify EnableHVX option handling by using arch features instead of
arch version enums;
- simplify completeHVXFeatures() by using features;
- delete several unused or redundant functions and constants:
isCPUValid, getCpu, getHexagonCPUSuffix;
- do not set HexagonArchVersion in initializeSubtargetDependencies, it
is set in ParseSubtargetFeatures;

Signed-off-by: Alexey Karyakin <akaryaki@quicinc.com>
2025-06-27 15:05:28 -05:00
Sean Perry
a0c5f1992d
[SystemZ][zOS] disable _Float16 support on z/OS (#145532)
The new half float type (aka _Float16 ) isn't supported on z/OS.
2025-06-26 15:21:04 -04:00
Himadhith
32febe60f3
[PowerPC] Support for Packed BCD conversion builtins (#142723)
Support the following packed BCD builtins for PowerPC.
  
```
__builtin_national2packed - Conversion of National format to Packed decimal format.
__builtin_packed2national - Conversion of Packed decimal format to national format.
__builtin_packed2zoned    - Conversion of Packed decimal format to Zoned decimal format.
__builtin_zoned2packed    - Conversion of Zoned decimal format to Packed decimal format.
```
### Prototypes: 
`vector unsigned char __builtin_national2packed(vector unsigned char a,
unsigned char b);`
`vector unsigned char __builtin_packed2zoned(vector unsigned char,
unsigned char);`
`vector unsigned char __builtin_zoned2packed(vector unsigned char,
unsigned char);`

The condition for the 2nd parameter is consistent over all the 3
prototypes (0 or 1 only).

`vector unsigned char __builtin_packed2national(vector unsigned char);`

Co-authored-by: himadhith <himadhith.v@ibm.com>
Co-authored-by: Tony Varghese <tonypalampalliyil@gmail.com>
2025-06-25 14:47:38 +05:30
no92
0f302f38b0
[clang] Add managarm support (#144791)
This is a repost of the quickly reverted #139271. The failing buildbot
tests have been fixed and pass on my machine now.
2025-06-20 02:40:20 -04:00
Kazu Hirata
7cbb141155
[clang] Migrate away from ArrayRef(std::nullopt) (NFC) (#144982)
ArrayRef has a constructor that accepts std::nullopt.  This
constructor dates back to the days when we still had llvm::Optional.

Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.

This patch takes care of the clang side of the migration.
2025-06-19 23:29:50 -07:00
Stanislav Mekhanoshin
69974658f0
[AMDGPU] Initial support for gfx1250 target. (#144965)
This is just a stub for now.
2025-06-19 22:52:51 -07:00
zhijian lin
bf79d4819e
[Reland] [PowerPC] frontend get target feature from backend with cpu name (#144594)
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.

The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
2025-06-19 09:22:16 -04:00
Aaron Ballman
3377b56338
Revert "[clang] Add managarm support" (#144514)
Reverts llvm/llvm-project#139271

There are multiple failing build bots:
https://lab.llvm.org/buildbot/#/builders/10/builds/7482
https://lab.llvm.org/buildbot/#/builders/11/builds/17473
2025-06-17 08:39:15 -04:00
no92
e86740e600
[clang] Add managarm support (#139271)
This PR is part of a series to upstream managarm support, as laid out in
the
[RFC](https://discourse.llvm.org/t/rfc-new-proposed-managarm-support-for-llvm-and-clang-87845/85884/1).
This PR is a follow-up to #87845 and #138854.
2025-06-17 01:51:46 -04:00
Tom Vijlbrief
ad94f77a6a
[AVR] Add many new AVR MCU model definitions (#144229)
1. Added the missing XMEGA2 definition. The avr64 devices use xmega2 which has SPM(X) defined.

2. The avr16/avr32 devices do have SPM and SPMX features, but the current xmega3 definition has not.
   Xmega3 is also used for modern attiny series which do not have SPM(X), so that is correct.
   Leave the avr16/avr32 devices unchanged (using xmega3 to be in sync with gcc definitions).

Fixes https://github.com/llvm/llvm-project/issues/116116
2025-06-16 09:25:40 +08:00
Kazu Hirata
62d8e001da Revert "[AVR] Add support for many new AVR MCUs (#143914)"
This reverts commit 10bc17fc3676b82c7240046a948d2925dd2045d3.

Multiple buildbot failures have been reported:
https://github.com/llvm/llvm-project/pull/143914
2025-06-14 08:18:50 -07:00
Tom Vijlbrief
10bc17fc36
[AVR] Add support for many new AVR MCUs (#143914)
fixes https://github.com/llvm/llvm-project/issues/116116
2025-06-14 23:10:04 +08:00
Martin Wehking
fbea0fc5c7
Add Macro for CSSC Feature (#143148)
Add a new __ARM_FEATURE_CSSC macro that can be utilized during the
preprocessing stage.

__ARM_FEATURE_CSSC is defined to 1 if there is hardware support for
CSSC.

Implements the ACLE change:
https://github.com/ARM-software/acle/pull/394
2025-06-13 13:33:46 +01:00
Reid Kleckner
cbf27bf711 Revert " [PowerPC] frontend get target feature from backend with cpu name (#137670)"
This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2.

TargetParser shouldn't re-run the PPC subtarget tablegen target, it
should define its own `-gen-ppc-target-def` rule like all the other
targets do in llvm/include/llvm/TargetParser/CMakeLists.txt .

One user reported that there are incorrect CMake dependencies after this
change, so I will roll this back in the meantime.
2025-06-12 19:56:41 +00:00
zhijian lin
9208b343e9
[PowerPC] frontend get target feature from backend with cpu name (#137670)
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
2025-06-12 13:38:13 -04:00
Nathan Gauër
33fee56499
[HLSL][SPIR-V] Change SPV AS map for groupshared (#143519)
The previous mapping we setting the hlsl_groupshared AS to 0, which
translated to either Generic or Function.
Changing this to 3, which translated to Workgroup.

Related to #142804
2025-06-11 14:22:45 +02:00
Tomohiro Kashiwada
830a74092a
[Clang] [Cygwin] va_list must be treated like normal Windows (#143115)
Handling of va_list on Cygwin environment must be matched to normal
Windows environment.

The existing test `test/CodeGen/ms_abi.c` seems relevant, but it
contains `__attribute__((sysv_abi))`, which is not supported on Cygwin.
The new test is based on the `__attribute__((ms_abi))` portion of that
test.

---------

Co-authored-by: jeremyd2019 <github@jdrake.com>
2025-06-10 23:42:36 +03:00
Tomohiro Kashiwada
e7739eb6cc
[Clang] [Cygwin] wint_t is unsigned int (#143117)
On Cygwin environment, wint_t is unsigned int as shown here:
```
$ echo | gcc -m32 -xc - -E -dM | grep WINT_T
145:#define __SIZEOF_WINT_T__ 4
315:#define __WINT_TYPE__ unsigned int
```

```
$ echo | gcc -xc - -E -dM | grep WINT_T
147:#define __SIZEOF_WINT_T__ 4
317:#define __WINT_TYPE__ unsigned int
```

```
$ LANG=C gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/15/lto-wrapper.exe
Target: x86_64-pc-cygwin
(snip)
```
2025-06-09 22:19:06 +03:00
Ami-zhang
0ed5d9aff6
[LoongArch][BF16] Add support for the __bf16 type (#142548)
The LoongArch psABI recently added __bf16 type support. Now we can
enable this new type in clang.

Currently, bf16 operations are automatically supported by promoting to
float. This patch adds bf16 support by ensuring that load extension /
truncate store operations are properly expanded.

And this commit implements support for bf16 truncate/extend on hard FP
targets. The extend operation is implemented by a shift just as in the
standard legalization. This requires custom lowering of the truncate
libcall on hard float ABIs (the normal libcall code path is used on soft
ABIs).
2025-06-09 11:15:41 +08:00
Nick Sarnie
3b9ebe9201
[clang] Simplify device kernel attributes (#137882)
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-06-05 14:15:38 +00:00