This PR implements Function Multi-versioning on AIX using `__attribute__
((target_clones(<feature-list>)))`.
Initially, we will only support specifying a cpu in the version list.
Feature strings (such as "altivec" or "isel") on target_clones will be
implemented in a future PR.
Accepted syntax:
```
__attribute__((target_clones(<OPTIONS>)))
```
where `<OPTIONS>` is a comma separated list of strings, each string is
either:
1) the default string `"default"`
2) a cpu string `"cpu=<CPU>"`, where `<CPU>`is a value accepted by the
`-mcpu` flag.
For example, specifying the following on a function
```
__attribute__((target_clones("default", "cpu=power8", "cpu=power9")))
int foo(int x) { return x + 1; }
```
Would generate 3 versions of `foo`: (1) `foo.default`, (2)
`foo.cpu_power8`, and (3) `foo.cpu_power9`,
an IFUNC `foo`, and the resolver function `foo.resolver`, for the IFUNC,
that chooses one of the three versions at runtime.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Remove `_mma` from the following built-ins as they are not related to
MMA:
* __builtin_mma_dmsetdmrz
* __builtin_mma_dmmr
* __builtin_mma_dmxor
* __builtin_mma_build_dmr
* __builtin_mma_disassemble_dmr
AI Assisted.
This commit adds support for lwat/ldat atomic operations with function
code 16 (Compare and Swap Not Equal) via 4 clang builtins:
__builtin_amo_lwat_csne for 32-bit unsigned operations
__builtin_amo_ldat_csne for 64-bit unsigned operations
__builtin_amo_lwat_csne_s for 32-bit signed operations
__builtin_amo_ldat_csne_s for 64-bit signed operations
This commit adds 4 Clang builtins for PowerPC AMO store operations:
__builtin_amo_stwat for 32-bit unsigned operations
__builtin_amo_stdat for 64-bit unsigned operations
__builtin_amo_stwat_s for 32-bit signed operations
__builtin_amo_stdat_s for 64-bit signed operations
and maps GCC's AMO store functions to these Clang builtins for
compatibility.
This commit adds 4 Clang builtins for PowerPC AMO load conditional
increment and decrement operations:
__builtin_amo_lwat_cond for 32-bit unsigned operations
__builtin_amo_ldat_cond for 64-bit unsigned operations
__builtin_amo_lwat_cond_s for 32-bit signed operations
__builtin_amo_ldat_cond_s for 64-bit signed operations
This commit adds two Clang builtins for AMO load signed operations:
__builtin_amo_lwat_s for 32-bit signed operations
__builtin_amo_ldat_s for 64-bit signed operations
`UnqualPtrTy` didn't always match `llvm::PointerType::getUnqual`:
sometimes it returned a pointer that is not in address space 0 (notably
for SPIRV).
Since `UnqualPtrTy` was used as the "generic" or "default" pointer type,
this patch renames it to `DefaultPtrTy` to avoid confusion with LLVM's
`PointerType::getUnqual`.
Define the __dmr2048 type to represent the DMR pair introduced by the
Dense Math Facility on PowerPC, and add three Clang builtins
corresponding to DMF cryptography:
__builtin_mma_dmsha2hash
__builtin_mma_dmsha3hash
__builtin_mma_dmxxshapad
The __dmr2048 type is required for the dmsha3hash crypto builtin, and,
as withother PPC MMA and DMR types, its use is strongly restricted.
clang/lib/CodeGen/CGBuiltin.cpp is over 1MB long (>23k LoC), and can
take minutes to recompile (depending on compiler and host system) when
modified, and 5 seconds for clangd to update for every edit. Splitting
this file was discussed in this thread:
https://discourse.llvm.org/t/splitting-clang-s-cgbuiltin-cpp-over-23k-lines-long-takes-1min-to-compile/
and the idea has received a number of +1 votes, hence this change.