llvm-project

Author	SHA1	Message	Date
Romaric Jodin	7baa7edc00	[libclc]: clspv: add a dummy implementation for mul_hi (#134094) clspv uses a better implementation that does not use a bigger size when one is not available. Add a dummy implementation for mul_hi to avoid overriding clspv's implementation with the one in libclc.	2025-04-03 10:18:39 +01:00
Fraser Cormack	e7ad07ffb8	[libclc] Move fma to the CLC library (#126052) This builtin is a little more involved than others as targets deal with fma in various different ways. Fundamentally, the CLC __clc_fma builtin compiles to __builtin_elementwise_fma, which compiles to the @llvm.fma intrinsic. However, in the case of fp32 fma some targets call the __clc_sw_fma function, which provides a software implementation of the builtin. This in principle is controlled by the __CLC_HAVE_HW_FMA32 macro and may be a runtime decision, depending on how the target defines that macro. All targets build the CLC fma functions for all types. This is so the CLC library can have a reliable internal implementation for its own purposes. For AMD/NVPTX targets there are no meaningful changes to the generated LLVM bytecode. Some blocks of code have moved around, which confounds llvm-diff. For the clspv and SPIR-V/Mesa targets, only fp32 fma is of interest. Its use in libclc is tightly controlled by checking __CLC_HAVE_HW_FMA32 first. This can either be a compile-time constant (1, for clspv) or a runtime function for SPIR-V/Mesa. The SPIR-V/Mesa target only provided fp32 fma in the OpenCL layer. It unconditionally mapped that to the __clc_sw_fma builtin, even though the generic version in theory had a runtime toggle through __CLC_HAVE_HW_FMA32 specifically for that target. Callers of fma, though, would end up using the ExtInst fma, not calling the _Z3fmafff function provided by libclc. This commit keeps this system in place in the OpenCL layer, by mapping fma to __clc_sw_fma. Where other builtins would previously call fma (i.e., result in the ExtInst), they now call __clc_fma. This function checks the __CLC_HAVE_HW_FMA32 runtime toggle, which selects between the slow version or the quick version. The quick version is the LLVM fma intrinsic which llvm-spirv translates to the ExtInst.
The clspv target had its own software implementation of fp32 fma, which it called unconditionally - even though __CLC_HAVE_HW_FMA32 is 1 for that target. This is potentially just so its library ships a software version which it can fall back on. In the OpenCL layer, the target doesn't provide fp64 fma, and maps fp16 fma to fp32 mad. This commit keeps this system roughly in place: in the OpenCL layer it maps fp32 fma to __clc_sw_fma, and fp16 fma to mad. Where builtins would previously call into fma, they now call __clc_fma, which compiles to the LLVM intrinsic. If this goes through a translation to SPIR-V it will become the fma ExtInst, or the intrinsic could be replaced by the _Z3fmafff software implementation. The clspv and SPIR-V/Mesa targets could potentially be cleaned up later, depending on their needs.	2025-02-24 10:10:51 +00:00
Fraser Cormack	4dec3909e9	[libclc] Have all targets build all CLC functions (#124779) This removes all remaining SPIR-V workarounds for CLC functions, in an effort to streamline the CLC implementation and prevent further issues that #124614 had to fix. This commit fixes the same issue for the SPIR-V targets. Target-specific CLC implementations can and will exist, but for now they're all identical and so the target-specific SOURCES files have been removed. Target implementations now always include the 'generic' CLC directory, meaning we can avoid unnecessary duplication of SOURCES listings.	2025-02-10 10:19:22 +00:00
Fraser Cormack	76d1cb22c1	[libclc] Move rotate to CLC library; optimize (#125713) This commit moves the rotate builtin to the CLC library. It also optimizes rotate(x, n) to generate the @llvm.fshl(x, x, n) intrinsic, for both scalar and vector types. The previous implementation was too cautious in its handling of the shift amount; the OpenCL rules state that the shift amount is always treated as an unsigned value modulo the bitwidth.	2025-02-05 10:38:23 +00:00
Fraser Cormack	fe694b18dc	[libclc] Move mad_sat to CLC; optimize for vector types (#125517) This commit moves the mad_sat builtin to the CLC library. It also optimizes it for vector types by avoiding scalarization. To help do this it transforms the previous control-flow code into vector select code. This has also been done for the scalar versions for simplicity.	2025-02-03 17:50:42 +00:00
Fraser Cormack	7441e87fe0	[libclc] Move several integer functions to CLC library (#116786) This commit moves over the OpenCL clz, hadd, mad24, mad_hi, mul24, mul_hi, popcount, rhadd, and upsample builtins to the CLC library. This commit also optimizes the vector forms of the mul_hi and upsample builtins to consistently remain in vector types, instead of recursively splitting vectors down to the scalar form. The OpenCL mad_hi builtin wasn't previously publicly available from the CLC libraries, as it was hash-defined to mul_hi in the header files. That issue has been fixed, and mad_hi is now exposed. The custom AMD implementation/workaround for popcount has been removed as it was only required for clang < 7. There are still two integer functions which haven't been moved over. The OpenCL mad_sat builtin uses many of the other integer builtins, and would benefit from optimization for vector types. That can take place in a follow-up commit. The rotate builtin could similarly use some more dedicated focus, potentially using clang builtins.	2025-01-29 13:45:33 +00:00
Fraser Cormack	12cdf4330d	[libclc] Move (add|sub)_sat to CLC; optimize (#124903) Using the `__builtin_elementwise_(add|sub)_sat` functions allows us to directly optimize to the desired intrinsic, and avoid scalarization for vector types.	2025-01-29 11:12:40 +00:00
Romaric Jodin	9d8d538e40	libclc: clspv: add missing clc_isnan.cl dependency (#124614) clc_isnan.cl is needed since https://github.com/llvm/llvm-project/pull/124097	2025-01-28 14:47:08 +00:00
Fraser Cormack	cfc8ef0ad8	[libclc] Move copysign to CLC library; fix & optimize (#124598) This commit moves the implementation of the copysign builtin to the CLC library. It simultaneously optimizes it for vector types by avoiding scalarization. It does so by using the __builtin_elementwise_copysign clang builtins, which can handle vector types. It also fixes a bug in the half/fp16 implementation of the builtin. This version was using an incorrect mask (0x7FFFF instead of 0x7FFF) and was thus preserving the original sign bit, rather than masking it out.	2025-01-28 09:18:34 +00:00
Fraser Cormack	9705500582	[libclc] Move nextafter to the CLC library (#124097) There were two implementations of this - one that implemented nextafter in software, and another that called a clang builtin. No in-tree targets called the builtin, so all targets build the software version. The builtin version has been removed, and the software version has been renamed to be the "default". This commit also optimizes nextafter, to avoid scalarization as much as possible. Note however that the (CLC) relational builtins still scalarize; those will be optimized in a separate commit. Since nextafter is used by some convert_type builtins, the diff to IR codegen is not limited to the builtin itself.	2025-01-23 12:24:16 +00:00
Fraser Cormack	d96ec48068	[libclc] Route select through __clc_select (#123647) This was missed during the introduction of select. This also unifies the various .inc files used for each, as they were essentially identical. The __clc_select function is now also built for SPIR-V targets.	2025-01-21 10:05:39 +00:00
Fraser Cormack	c8eb865747	[libclc] Move mad to the CLC library (#123607) All targets build `__clc_mad` -- even SPIR-V targets -- since it compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to the bytecode generated for non-SPIR-V targets. The `mix` builtin, which is implemented as a wrapper around `mad`, is left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's worth having a specific CLC version of `mix`. The changes to the other CLC files/functions are moving uses of `mad` to `__clc_mad`, and reformatting. There is an additional instance of `trunc` becoming `__clc_trunc`, which was missed before.	2025-01-20 16:27:51 +00:00
Fraser Cormack	b7e20147ad	[libclc] Move smoothstep to CLC and optimize its codegen (#123183) This commit moves the implementation of the smoothstep function to the CLC library, whilst optimizing the codegen. This commit also adds support for 'half' versions of smoothstep, which were previously missing. The CLC smoothstep implementation now keeps everything in vectors, rather than recursively splitting vectors by half down to the scalar base form. This should result in more optimal codegen across the board. This commit also removes some non-standard overloads of smoothstep with mixed types, such as 'double smoothstep(float, float, float)'. There aren't any mixed-(element)type versions of smoothstep as far as I can see: gentype smoothstep(gentype edge0, gentype edge1, gentype x); gentypef smoothstep(float edge0, float edge1, gentypef x); gentyped smoothstep(double edge0, double edge1, gentyped x); gentypeh smoothstep(half edge0, half edge1, gentypeh x). The CLC library only defines the first type, for simplicity; the OpenCL layer is responsible for handling the scalar/scalar/vector forms. Note that the scalar/scalar/vector forms now splat the scalars to the vector type, rather than recursively split vectors as before. The macro that used to 'vectorize' smoothstep in this way has been moved out of the shared clcmacro.h header as it was only used for the smoothstep builtin. Note that the CLC clamp function is now built for both SPIR-V targets. This is to help build the CLC smoothstep function for the Mesa SPIR-V target.	2025-01-16 11:44:09 +00:00
Fraser Cormack	06789ccb16	[libclc] Optimize ceil/fabs/floor/rint/trunc (#119596) These functions all map to the corresponding LLVM intrinsics, but the vector intrinsics weren't being generated. The intrinsic mapping from CLC vector function to vector intrinsic was working correctly, but the mapping from OpenCL builtin to CLC function was suboptimally recursively splitting vectors in halves. For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32` directly once optimizations are applied. Also, instead of generating LLVM intrinsics through `__asm`, we now call clang elementwise builtins for each CLC builtin. This should be a more standard way of achieving the same result. The CLC versions of each of these builtins are also now built and enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the intrinsics to the appropriate OpExtInst, so there should be no difference in semantics, despite the newly introduced indirection from OpenCL builtin through the CLC builtin to the intrinsic. The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN` macro to override `sqrt`, so those functions also appear more optimal with this change, calling the vector `llvm.sqrt.vXf32` intrinsics directly.	2024-12-13 08:47:13 +00:00
Fraser Cormack	b2bdd8bd39	[libclc] Create an internal 'clc' builtins library Some libclc builtins currently use internal builtins prefixed with '__clc_' for various reasons, e.g., to avoid naming clashes. This commit formalizes this concept by starting to isolate the definitions of these internal clc builtins into a separate self-contained bytecode library, which is linked into each target's libclc OpenCL builtins before optimization takes place. The goal of this step is to allow additional libraries of builtins that provide entry points (or bindings) that are not written in OpenCL C but still wish to expose OpenCL-compatible builtins. By moving the implementations into a separate self-contained library, entry points can share as much code as possible without going through OpenCL C. The overall structure of the internal clc library is similar to the current OpenCL structure, with SOURCES files and targets being able to override the definitions of builtins as needed. The idea is that the OpenCL builtins will begin to need fewer target-specific overrides, as those will slowly move over to the clc builtins instead. Another advantage of having a separate bytecode library with the CLC implementations is that we can internalize the symbols when linking it (separately), whereas currently the CLC symbols make it into the final builtins library (and perhaps even the final compiled binary). This patch starts off with 'dot' as it's relatively self-contained, as opposed to most of the maths builtins which tend to pull in other builtins. We can also start to clang-format the builtins as we go, which should help to modernize the codebase.	2024-10-29 13:09:56 +00:00
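
Note on cfc8ef0ad8 (copysign): the fp16 mask bug described in that message can be illustrated on raw 16-bit patterns. The following is a minimal sketch in plain C, not the libclc source; both function names are hypothetical, and the second function only shows how a too-wide mask such as 0x7FFFF would behave, not the exact code that was removed.

```c
#include <stdint.h>

/* Hypothetical helper: copy the sign of y onto the magnitude of x,
   working on raw IEEE binary16 bit patterns. */
static uint16_t copysign_half_bits(uint16_t x, uint16_t y) {
    uint16_t magnitude = (uint16_t)(x & 0x7FFFu); /* clears bit 15, the sign of x */
    uint16_t sign = (uint16_t)(y & 0x8000u);      /* keeps only the sign of y */
    return (uint16_t)(magnitude | sign);
}

/* Hypothetical illustration of the too-wide mask: 0x7FFFF has bit 15 set,
   so the sign bit of x survives the masking step. */
static uint16_t copysign_half_bits_wide_mask(uint16_t x, uint16_t y) {
    uint16_t magnitude = (uint16_t)(x & 0x7FFFFu); /* bit 15 is NOT cleared */
    uint16_t sign = (uint16_t)(y & 0x8000u);
    return (uint16_t)(magnitude | sign);
}
```

With x = 0xBC00 (-1.0 in binary16) and y = 0x3C00 (+1.0), the first function returns 0x3C00, while the wide-mask variant returns 0xBC00 because the original sign bit is preserved.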

15 Commits
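
Note on 76d1cb22c1 (rotate): that message states that rotate(x, n) now maps to @llvm.fshl(x, x, n) and that the shift amount is treated as an unsigned value modulo the bitwidth. The following is a minimal sketch of those semantics for 32-bit scalars in plain C, not the libclc source; fshl32 and rotate32 are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: funnel shift left. Conceptually concatenates hi:lo
   into a 64-bit value, shifts left by n modulo 32, and returns the upper
   32 bits, which is how llvm.fshl.i32 behaves. */
static uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t n) {
    n &= 31u;                 /* shift amount is taken modulo the bitwidth */
    if (n == 0u) {
        return hi;            /* avoid the undefined shift by 32 below */
    }
    return (hi << n) | (lo >> (32u - n));
}

/* Rotate left expressed as a funnel shift of x with itself. */
static uint32_t rotate32(uint32_t x, uint32_t n) {
    return fshl32(x, x, n);
}
```

For example, rotate32(0x80000001, 1) gives 0x00000003, and a shift amount of 36 behaves like 4, so rotate32(0x12345678, 36) gives 0x23456781.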