Remove `_mma` from the following built-ins as they are not related to
MMA:
* __builtin_mma_dmsetdmrz
* __builtin_mma_dmmr
* __builtin_mma_dmxor
* __builtin_mma_build_dmr
* __builtin_mma_disassemble_dmr
AI Assisted.
This commit adds support for lwat/ldat atomic operations with function
code 16 (Compare and Swap Not Equal) via 4 clang builtins:
__builtin_amo_lwat_csne for 32-bit unsigned operations
__builtin_amo_ldat_csne for 64-bit unsigned operations
__builtin_amo_lwat_csne_s for 32-bit signed operations
__builtin_amo_ldat_csne_s for 64-bit signed operations
This commit adds 4 Clang builtins for PowerPC AMO store operations:
__builtin_amo_stwat for 32-bit unsigned operations
__builtin_amo_stdat for 64-bit unsigned operations
__builtin_amo_stwat_s for 32-bit signed operations
__builtin_amo_stdat_s for 64-bit signed operations
and maps GCC's AMO store functions to these Clang builtins for
compatibility.
This commit adds 4 Clang builtins for PowerPC AMO load conditional
increment and decrement operations:
__builtin_amo_lwat_cond for 32-bit unsigned operations
__builtin_amo_ldat_cond for 64-bit unsigned operations
__builtin_amo_lwat_cond_s for 32-bit signed operations
__builtin_amo_ldat_cond_s for 64-bit signed operations
This commit adds two Clang builtins for AMO load signed operations:
__builtin_amo_lwat_s for 32-bit signed operations
__builtin_amo_ldat_s for 64-bit signed operations
`UnqualPtrTy` didn't always match `llvm::PointerType::getUnqual`:
sometimes it returned a pointer that is not in address space 0 (notably
for SPIRV).
Since `UnqualPtrTy` was used as the "generic" or "default" pointer type,
this patch renames it to `DefaultPtrTy` to avoid confusion with LLVM's
`PointerType::getUnqual`.
Define the __dmr2048 type to represent the DMR pair introduced by the
Dense Math Facility on PowerPC, and add three Clang builtins
corresponding to DMF cryptography:
__builtin_mma_dmsha2hash
__builtin_mma_dmsha3hash
__builtin_mma_dmxxshapad
The __dmr2048 type is required for the dmsha3hash crypto builtin, and,
as withother PPC MMA and DMR types, its use is strongly restricted.
clang/lib/CodeGen/CGBuiltin.cpp is over 1MB long (>23k LoC), and can
take minutes to recompile (depending on compiler and host system) when
modified, and 5 seconds for clangd to update for every edit. Splitting
this file was discussed in this thread:
https://discourse.llvm.org/t/splitting-clang-s-cgbuiltin-cpp-over-23k-lines-long-takes-1min-to-compile/
and the idea has received a number of +1 votes, hence this change.