This is a simplified implementation of std::optional that can be used
in the offload builds for the device code. The methods are properly
marked with RT_API_ATTRS so that the device compilation succedes.
Reviewers: klausler, jeanPerier
Reviewed By: jeanPerier
Pull Request: https://github.com/llvm/llvm-project/pull/85177
This patch enables more numeric (mod, sum, matmul, etc.) APIs,
and some others.
I added new macros to disable warnings about using C++ STD methods
like operators of std::complex, which do not have __device__ attribute.
This may probably result in unresolved references, if the header files
implementation relies on libstdc++. I will need to follow up on this.
This patch mostly affects performance of the code produced by
HLIFR lowering. If MATMUL argument is an array slice, then
HLFIR lowering passes the slice to the runtime, whereas
FIR lowering would create a contiguous temporary for the slice.
Performance might be better than the generic implementation
for cases where the leading dimension is contiguous.
This patch improves CPU2000/178.galgel making HLFIR version
faster than FIR version (due to avoiding the temporary copies
for MATMUL arguments).
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D159134
When MatmulTranpose reports incorrect shapes of the arguments
it cannot represent itself as MATMUL, because the reading
of the first argument's shape will be confusing.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D155911
This fused operation should run a lot faster than first transposing the
lhs array and then multiplying the matrices separately.
Based on flang/runtime/matmul.cpp
Depends on D145959
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D145960