The new framework makes it explicit which processor feature is being
used and allows for easier per-platform customization:
- ARM CPUs now use trivial implementations to reduce code size.
- Memcmp, Bcmp and Memmove have been optimized for x86.
- Bcmp has been optimized for aarch64.
This is a reland of https://reviews.llvm.org/D135134 (b3f1d58, 028414881381)
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D136595
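To illustrate what "explicit processor feature" and "per-platform customization" can look like, here is a minimal sketch of a bcmp-style function that selects an implementation from compiler-defined architecture macros. It is not the code from this patch: the `sketch` namespace and all function names are hypothetical, and the "wide" path simply compares 8-byte blocks rather than using real vector instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

// Trivial byte-by-byte comparison: small code size, no platform assumptions.
inline int bcmp_trivial(const unsigned char *a, const unsigned char *b,
                        std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] != b[i])
      return 1;
  return 0;
}

// Compares one fixed-size block. memcpy is used for the loads so that
// unaligned addresses are safe; compilers typically lower it to plain loads.
template <typename T>
inline bool block_differs(const unsigned char *a, const unsigned char *b) {
  T x, y;
  std::memcpy(&x, a, sizeof(T));
  std::memcpy(&y, b, sizeof(T));
  return x != y;
}

// Wider variant: 8-byte blocks first, then the trivial loop for the tail.
inline int bcmp_wide(const unsigned char *a, const unsigned char *b,
                     std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8)
    if (block_differs<std::uint64_t>(a + i, b + i))
      return 1;
  return bcmp_trivial(a + i, b + i, n - i);
}

// The per-platform choice is spelled out in one place.
inline int bcmp_dispatch(const void *lhs, const void *rhs, std::size_t n) {
  const auto *a = static_cast<const unsigned char *>(lhs);
  const auto *b = static_cast<const unsigned char *>(rhs);
#if defined(__x86_64__) || defined(__aarch64__)
  return bcmp_wide(a, b, n);
#else
  return bcmp_trivial(a, b, n); // e.g. 32-bit ARM: favor code size
#endif
}

} // namespace sketch
```

Both paths return zero for equal buffers and a nonzero value otherwise, so callers see the same behavior regardless of which one the preprocessor selects.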
This patch seems to introduce bugs on aarch64.
Reverting while we investigate the root cause.
This reverts commit 02841488138160f9064f334a833d4bf3e80385c6.
The new framework makes it explicit which processor feature is being
used and allows for easier per-platform customization:
- ARM CPUs now use trivial implementations to reduce code size.
- Memcmp, Bcmp and Memmove have been optimized for x86.
- Bcmp has been optimized for aarch64.
This is a reland of https://reviews.llvm.org/D135134 (b3f1d58)
Differential Revision: https://reviews.llvm.org/D136595
This reverts commit https://reviews.llvm.org/D135134 (b3f1d58a131eb546aaf1ac165c77ccb89c40d758)
That revision appears to have broken Arm memcpy in some subtle
ways. I am communicating with the original author to get a
good reproduction.
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/chf1Y6eGM
Differential Revision: https://reviews.llvm.org/D135134
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch is not meant to be submitted but gives an idea of the change.
Codegen can be checked in https://godbolt.org/z/6z1dEoWbs by removing the "static inline" before individual functions.
Unittests are coming.
Suggested review order:
- utils
- op_base
- op_builtin
- op_generic
- op_x86 / op_aarch64
- *_implementations.h
Differential Revision: https://reviews.llvm.org/D135134
Similar to D113097, although not strictly necessary for now. It helps
keep the same structure for all memory functions.
Differential Revision: https://reviews.llvm.org/D113103
We may want to restrict the detected platforms to only `x86_64` and `aarch64`.
There is still custom detection in api.td, but I don't think we can handle these cases:
- config/linux/api.td:205
- config/linux/api.td:199
Differential Revision: https://reviews.llvm.org/D112818
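As a rough illustration of restricting detection to `x86_64` and `aarch64`, the sketch below maps compiler-predefined macros to a small enum and treats everything else as unsupported. The `sketch` namespace, the `Arch` enum, and `has_tuned_implementation` are hypothetical names and do not reflect how the patch itself performs detection.

```cpp
namespace sketch {

enum class Arch { X86_64, AArch64, Other };

// Only the two platforms named above are recognized; any other target
// falls through to Arch::Other.
#if defined(__x86_64__) || defined(_M_X64)
constexpr Arch kDetectedArch = Arch::X86_64;
#elif defined(__aarch64__) || defined(_M_ARM64)
constexpr Arch kDetectedArch = Arch::AArch64;
#else
constexpr Arch kDetectedArch = Arch::Other;
#endif

// True when a platform-specific code path would be selected.
constexpr bool has_tuned_implementation(Arch arch) {
  return arch == Arch::X86_64 || arch == Arch::AArch64;
}

} // namespace sketch
```

Keeping the detection in one header means the rest of the code can branch on a single constant instead of repeating the macro checks.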
Summary:
The new macro also inserts the C alias for the C++ implementations
without needing an objcopy based post processing step. The CMake
rules have been updated to reflect this. More CMake cleanup can be
taken up in future rounds and appropriate TODOs have been added for them.
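One way a macro can emit a C symbol for a C++ implementation without an objcopy step is to give the implementation an explicit assembler name and make the namespaced function an alias of it. The sketch below assumes an ELF target and a GCC- or Clang-compatible compiler; `SKETCH_LIBC_FUNCTION`, `demo`, and `sketch_add_one` are hypothetical names, and this is not presented as the macro from this patch.

```cpp
namespace demo {
// Public C++ declaration, as it would appear in an internal header.
int sketch_add_one(int x);
} // namespace demo

// 1. Declare name##_impl with the unmangled assembler name #name.
// 2. Redeclare the namespaced function as an alias of that symbol.
// 3. Open the definition of name##_impl; the caller supplies the body.
#define SKETCH_LIBC_FUNCTION(type, name, arglist)                              \
  decltype(demo::name) name##_impl __asm__(#name);                             \
  decltype(demo::name) name [[gnu::alias(#name)]];                             \
  type name##_impl arglist

namespace demo {
SKETCH_LIBC_FUNCTION(int, sketch_add_one, (int x)) { return x + 1; }
} // namespace demo

// What a C caller would see: the plain, unmangled symbol.
extern "C" int sketch_add_one(int x);
```

With this layout, both `demo::sketch_add_one` and the C-visible `sketch_add_one` resolve to the same function body in the object file, so no post-processing of the symbol table is needed.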
Reviewers: mcgrathr, sivachandra
Subscribers: