llvm-project

Author	SHA1	Message	Date
Slava Zakharin	104f3c1806	Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" (#109207) `std::complex` operators do not work for the CUDA device compilation of F18 runtime. This change makes use of `cuda::std::complex` from `libcudacxx`. `cuda::std::complex` does not have specializations for `long double`, so the change is accompanied with a clean-up for `long double` usage. Additional change on top of #109078 is to use `cuda::std::complex` only for the device compilation, otherwise the host compilation fails because `libcudacxx` may not support `long double` specialization at all (depending on the compiler).	2024-09-18 17:41:33 -07:00
Slava Zakharin	36192fdfb9	Revert "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build." (#109173) Reverts llvm/llvm-project#109078	2024-09-18 11:22:31 -07:00
Slava Zakharin	be187a6812	[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078) `std::complex` operators do not work for the CUDA device compilation of F18 runtime. This change makes use of `cuda::std::complex` from `libcudacxx`. `cuda::std::complex` does not have specializations for `long double`, so the change is accompanied with a clean-up for `long double` usage.	2024-09-18 10:59:05 -07:00
Slava Zakharin	8ce1aed55f	[flang] Lower MATMUL to type specific runtime calls. (#97547) Lower MATMUL to the new runtime entries added in #97406.	2024-07-03 21:18:56 -07:00
Slava Zakharin	dd22085308	[flang][runtime] Split MATMUL[_TRANSPOSE] into separate entries. (#97406) Device compilation is much faster for separate MATMUL[_TRANSPOSE] entries than for a single one that covers all data types. The lowering changes and the removal of the generic entries will follow.	2024-07-02 21:30:37 -07:00
Pete Steinfeld	e55aa027f8	[flang] Fix runtime error messages for the MATMUL intrinsic (#96928) There are three forms of MATMUL -- where the first argument is a rank 1 array, where the second argument is a rank 1 array, and where both arguments are rank 2 arrays. There's code in the runtime that detects when the array shapes are incorrect. But the code that emits an error message assumes that both arguments are rank 2 arrays. This change contains code for the other two cases.	2024-06-27 14:54:02 -07:00
Slava Zakharin	71e0261fb0	[flang][runtime] Added Fortran::common::optional for use on device. This is a simplified implementation of std::optional that can be used in the offload builds for the device code. The methods are properly marked with RT_API_ATTRS so that the device compilation succeeds. Reviewers: klausler, jeanPerier Reviewed By: jeanPerier Pull Request: https://github.com/llvm/llvm-project/pull/85177	2024-03-15 14:25:47 -07:00
Slava Zakharin	76facde32c	[flang][runtime] Enable more APIs in the offload build. (#76486)	2023-12-28 13:50:43 -08:00
Slava Zakharin	b4b23ff7f8	[flang][runtime] Enable more APIs in the offload build. (#75996) This patch enables more numeric (mod, sum, matmul, etc.) APIs, and some others. I added new macros to disable warnings about using C++ STD methods like operators of std::complex, which do not have __device__ attribute. This may probably result in unresolved references, if the header files implementation relies on libstdc++. I will need to follow up on this.	2023-12-20 11:52:51 -08:00
Slava Zakharin	4d9771741d	[flang] Improved performance of runtime Matmul/MatmulTranspose. This patch mostly affects performance of the code produced by HLFIR lowering. If MATMUL argument is an array slice, then HLFIR lowering passes the slice to the runtime, whereas FIR lowering would create a contiguous temporary for the slice. Performance might be better than the generic implementation for cases where the leading dimension is contiguous. This patch improves CPU2000/178.galgel making HLFIR version faster than FIR version (due to avoiding the temporary copies for MATMUL arguments). Reviewed By: klausler Differential Revision: https://reviews.llvm.org/D159134	2023-08-29 17:04:00 -07:00
Peter Klausler	f5884fd9de	[flang][runtime] Improve error message for incompatible MATMUL arguments Print the full shapes of both arguments when the dimensions that must match do not do so. Differential Revision: https://reviews.llvm.org/D132153	2022-08-18 13:59:13 -07:00
Peter Klausler	a5a493e192	[flang] Speed common runtime cases of DOT_PRODUCT & MATMUL Look for contiguous numeric argument arrays at runtime and use specialized code for them. Differential Revision: https://reviews.llvm.org/D112239	2021-10-22 14:36:13 -07:00
Peter Klausler	830c0b9023	[flang] Move runtime API headers to flang/include/flang/Runtime Move the closure of the subset of flang/runtime/*.h header files that are referenced by source files outside flang/runtime (apart from unit tests) into a new directory (flang/include/flang/Runtime) so that relative include paths into ../runtime need not be used. flang/runtime/pgmath.h.inc is moved to flang/include/flang/Evaluate; it's not used by the runtime. Differential Revision: https://reviews.llvm.org/D109107	2021-09-03 11:08:34 -07:00
peter klausler	5e1421b22f	[flang] Implement MATMUL in the runtime Define an API for the transformational intrinsic function MATMUL, implement it, and add some basic unit tests. The large number of possible argument type combinations is covered by a set of generalized templates that are instantiated for each valid pair of possible argument types. Places where BLAS-2/3 routines could be called for acceleration are marked with TODOs. Handling for other special cases (e.g., known-shape 3x3 matrices and vectors) is deferred. Some minor tweaks were made to the recent related implementation of DOT_PRODUCT to reflect lessons learned. Differential Revision: https://reviews.llvm.org/D102652	2021-05-18 10:59:52 -07:00

14 Commits