llvm-project

Author	SHA1	Message	Date
Yaxun (Sam) Liu	beea2a9414	[Clang] Respect MS layout attributes during CUDA/HIP device compilation (#146620 ) This patch fixes an issue where Microsoft-specific layout attributes, such as __declspec(empty_bases), were ignored during CUDA/HIP device compilation on a Windows host. This caused a critical memory layout mismatch between host and device objects, breaking libraries that rely on these attributes for ABI compatibility. The fix introduces a centralized hasMicrosoftRecordLayout() check within the TargetInfo class. This check is aware of the auxiliary (host) target and is set during TargetInfo::adjust if the host uses a Microsoft ABI. The empty_bases, layout_version, and msvc::no_unique_address attributes now use this centralized flag, ensuring device code respects them and maintains layout consistency with the host. Fixes: https://github.com/llvm/llvm-project/issues/146047	2025-07-09 08:53:10 -04:00
Ivan Kosarev	a7a7e95720	[AMDGPU][Clang] Support bfloat16 arithmetic. (#147541 ) Co-authored-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2025-07-08 17:30:06 +01:00
Nick Sarnie	3b9ebe9201	[clang] Simplify device kernel attributes (#137882 ) We have multiple different attributes in clang representing device kernels for specific targets/languages. Refactor them into one attribute with different spellings to make it more easily scalable for new languages/targets. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-06-05 14:15:38 +00:00
Fraser Cormack	1e31f4b5eb	[AMDGPU] Support the OpenCL generic addrspace feature by default (#137636 ) This feature should be supported on AMDGCN architectures with flat addressing.	2025-04-29 14:14:00 +01:00
Shilei Tian	dccc0a836c	[NFC][AMDGPU] Replace more direct arch comparison with isAMDGCN() (#131379 ) This is an extension of #131357. Hopefully this would be the last one.	2025-03-14 17:02:15 -04:00
Chandler Carruth	cd269fee05	[StrTable] Switch Clang builtins to use string tables This both reapplies #118734, the initial attempt at this, and updates it significantly. First, it uses the newly added `StringTable` abstraction for string tables, and simplifies the construction to build the string table and info arrays separately. This should reduce any `constexpr` compile time memory or CPU cost of the original PR while significantly improving the APIs throughout. It also restructures the builtins to support sharding across several independent tables. This accomplishes three improvements from the original PR: 1) It improves the APIs used significantly. 2) When builtins are defined from different sources (like SVE vs MVE in AArch64), this allows each of them to build their own string table independently rather than having to merge the string tables and info structures. 3) It allows each shard to factor out a common prefix, often cutting the size of the strings needed for the builtins by a factor of two. The second point is important both because it allows different mechanisms of construction (for example a `.def` file and a tablegen'ed `.inc` file, or different tablegen'ed `.inc` files) and because it simply reduces the sizes of these tables, which is valuable given how large they are in some cases. The third builds on that size reduction. Initially, we use this new sharding rather than merging tables in AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system works, as without further changes these still push scaling limits. Subsequent commits will more deeply leverage the new structure, including using the prefix capabilities which cannot be easily factored out here and requires deep changes to the targets.	2025-02-04 18:04:57 +00:00
Chandler Carruth	ca79ff07d8	Revert "Switch builtin strings to use string tables" (#119638 ) Reverts llvm/llvm-project#118734 There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.	2024-12-13 23:58:48 -08:00
Chandler Carruth	be2df95e92	Switch builtin strings to use string tables (#118734 )

The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations.

The largest pattern I identified was the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme.

This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one:

```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki  +1.4%  +653Ki    .rodata
  +0.0%    +960  +0.0%    +960    .text
  +0.0%    +197  +0.0%    +197    .dynstr
  +0.0%    +184  +0.0%    +184    .eh_frame
  +0.0%     +96  +0.0%     +96    .dynsym
  +0.0%     +40  +0.0%     +40    .eh_frame_hdr
  +114%     +32  [ = ]       0    [Unmapped]
  +0.0%     +20  +0.0%     +20    .gnu.hash
  +0.0%      +8  +0.0%      +8    .gnu.version
  +0.9%      +7  +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0 -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi  -1.6% -2.66Mi    TOTAL
```

We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those relocations are stored.

This is also visible in my benchmarking of binary start-up overhead at least:

```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```

We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler.

----

This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion.

I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution. I was also able to factor the macros into a set of consistent patterns that avoids a significant regression in overall boilerplate.	2024-12-08 19:00:14 -08:00
Joseph Huber	f84903486c	[AMDGPU] Do not allow the region address space to be converted to generic (#117171 ) Summary: Previous changes relaxed the address space rules based on what the target says about them. This accidentally included the AS(2) region as convertible to generic. Simply check for AS(2) and reject it.	2024-11-22 07:13:49 -06:00
Joseph Huber	b9d678d22f	[Clang] Use TargetInfo when deciding if an address space is compatible (#115777 ) Summary: Address spaces are used in several embedded and GPU targets to describe accesses to different types of memory. Currently we use the address space enumerations to control which address spaces are considered supersets of each other, however this is also a target level property as described by the C standard's passing mentions. This patch allows the address space checks to use the target information to decide if a pointer conversion is legal. For AMDGPU and NVPTX, all supported address spaces can be converted to the default address space. More semantic checks can be added on top of this, for now I'm mainly looking to get more standard semantics working for C/C++. Right now the address space conversions must all be done explicitly in C/C++ unlike the offloading languages which define their own custom address spaces that just map to the same target specific ones anyway. The main question is if this behavior is a function of the target or the language.	2024-11-15 06:58:36 -06:00
Joseph Huber	4fb953ac34	[AMDGPU] Make `__GCC_DESTRUCTIVE_SIZE` 128 on AMDGPU (#115241 ) Summary: The cache line size on AMDGPU varies between 64 and 128 (The lowest L2 cache also goes to 256 on some architectures.) This macro is intended to present a size that will not cause destructive interference, so we choose the larger of those values.	2024-11-07 04:59:58 -08:00
Jay Foad	4dd55c567a	[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399 ) Follow up to #109133.	2024-10-24 10:23:40 +01:00
mmoadeli	f540044751	[NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (#78759 ) - Address space cast of nullptr in local_space into a generic_space for the CUDA backend. The reason for this cast was having invalid local memory base address for the associated variable. - In the context of AMD GPU, assigns a NULL value as ~0 for the address spaces of sycl_local and sycl_private to match the ones for opencl_local and opencl_private.	2024-02-26 21:19:02 +05:30
Kazu Hirata	ffaedc2735	[Basic] Simplify uses of StringRef::consume_front (NFC)	2024-02-04 14:57:26 -08:00
Kazu Hirata	d34ac450a7	[Basic] Use StringRef::consume_front (NFC)	2024-01-15 21:25:48 -08:00
Dominik Adamski	276a024b49	[NFC][AMDGPU] Unify AMDGPU address space enum (#73944 ) Types of AMDGPU address space were defined not only in Clang-specific class but also in LLVM header. If we unify the AMD GPU address space enumeration, then we can reuse it in Clang, Flang and LLVM.	2023-12-11 10:45:21 +01:00
Yaxun (Sam) Liu	c0f0d50653	[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT` HIP texture/image support is optional as some devices do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT is defined for devices not supporting images (`d0448aa4c4/docs/reference/kernel_language.md (L426)` ). Currently the macro is defined by the HIP header based on predefined macros for the GPU, e.g. __gfx*__, which is error-prone. This patch lets clang emit the predefined macro. Reviewed by: Matt Arsenault, Artem Belevich Differential Revision: https://reviews.llvm.org/D151349	2023-06-14 22:53:41 -04:00
M. Zeeshan Siddiqui	e621757365	[Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support Pursuant to discussions at https://discourse.llvm.org/t/rfc-c-23-p1467r9-extended-floating-point-types-and-standard-names/70033/22, this commit enhances the handling of the __bf16 type in Clang. - Firstly, it upgrades __bf16 from a storage-only type to an arithmetic type. - Secondly, it changes the mangling of __bf16 to DF16b on all architectures except ARM. This change has been made in accordance with the finalization of the mangling for the std::bfloat16_t type, as discussed at https://github.com/itanium-cxx-abi/cxx-abi/pull/147. - Finally, this commit extends the existing excess precision support to the __bf16 type. This applies to hardware architectures that do not natively support bfloat16 arithmetic. Appropriate tests have been added to verify the effects of these changes and ensure no regressions in other areas of the compiler. Reviewed By: rjmccall, pengfei, zahiraam Differential Revision: https://reviews.llvm.org/D150913	2023-05-27 13:33:50 +08:00
Yaxun (Sam) Liu	6adb9a0602	[AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__` Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes. If WGP mode is not supported, ignore -mno-cumode and emit a warning. This is needed for implementing device functions like __smid (`312dff7b79/include/hip/amd_detail/amd_device_functions.h (L957)`) Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner Differential Revision: https://reviews.llvm.org/D145343	2023-05-12 18:50:52 -04:00
Stoorx	42d758bfa6	[clang] Return `std::string_view` from `TargetInfo::getClobbers()` Change the return type of `getClobbers` function from `const char` to `std::string_view`. Update the function usages in CodeGen module. The reasoning of these changes is to remove unsafe `const char` strings and prevent unnecessary allocations for constructing the `std::string` in usages of `getClobbers()` function. Differential Revision: https://reviews.llvm.org/D148799	2023-04-24 12:16:54 +03:00
Kazu Hirata	7eaa7b0553	[clang] Use *{Map,Set}::contains (NFC)	2023-03-15 18:06:34 -07:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Kazu Hirata	6ad0788c33	[clang] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::\|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 12:31:01 -08:00
serge-sans-paille	d9ab3e82f3	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This is a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470. The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881	2022-12-27 09:55:19 +01:00
Pierre van Houtryve	678d8946ba	[AMDGPU] Add bf16 storage support - [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn - Add Sema & CodeGen tests cases. - Also add cases that D138651 would have covered as this patch replaces it. - [AMDGPU] Add BF16 storage-only support - Support legalization/dealing with bf16 operations in DAGIsel. - bf16 as a type remains illegal and is represented as i16 for storage purposes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139398	2022-12-13 10:34:26 -05:00
Kazu Hirata	35b4fbb559	[clang] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 15:57:24 -08:00
Krzysztof Parzyszek	0ca43d4488	DebugInfoMetadata: convert Optional to std::optional	2022-12-04 11:52:02 -06:00
Kazu Hirata	eeee3fee37	[Basic] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-03 11:34:27 -08:00
Alex Richardson	a602f76a24	[clang][TargetInfo] Use LangAS for getPointer{Width,Align}() Mixing LLVM and Clang address spaces can result in subtle bugs, and there is no need for this hook to use the LLVM IR level address spaces. Most of this change is just replacing zero with LangAS::Default, but it also allows us to remove a few calls to getTargetAddressSpace(). This also removes a stale comment+workaround in CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does return the expected size for ReferenceType (and handles address spaces). Differential Revision: https://reviews.llvm.org/D138295	2022-11-30 20:24:01 +00:00
Kazu Hirata	f5ef2c5838	[clang] Convert for_each to range-based for loops (NFC)	2022-06-10 22:39:45 -07:00
Jon Chesterfield	83c431fb9e	[amdgpu] Add amdgpu_kernel calling conv attribute to clang Allows emitting define amdgpu_kernel void @func() IR from C or C++. This replaces the current workflow which is to write a stub in opencl that calls an external C function implemented in C++ combined through llvm-link. Calling the resulting function still requires a manual implementation of the ABI from the host side. The primary application is for more rapid debugging of the amdgpu backend by permuting a C or C++ test file instead of manually updating an IR file. Implementation closely follows D54425. Non-amd reviewers from there. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D125970	2022-05-20 08:50:37 +01:00
Aaron Ballman	6c75ab5f66	Introduce _BitInt, deprecate _ExtInt WG14 adopted the _ExtInt feature from Clang for C23, but renamed the type to be _BitInt. This patch does the vast majority of the work to rename _ExtInt to _BitInt, which accounts for most of its size. The new type is exposed in older C modes and all C++ modes as a conforming extension. However, there are functional changes worth calling out: * Deprecates _ExtInt with a fix-it to help users migrate to _BitInt. * Updates the mangling for the type. * Updates the documentation and adds a release note to warn users what is going on. * Adds new diagnostics for use of _BitInt to call out when it's used as a Clang extension or as a pre-C23 compatibility concern. * Adds new tests for the new diagnostic behaviors. I want to call out the ABI break specifically. We do not believe that this break will cause a significant imposition for early adopters of the feature, and so this is being done as a full break. If it turns out there are critical uses where recompilation is not an option for some reason, we can consider using ABI tags to ease the transition.	2021-12-06 12:52:01 -05:00
Kazu Hirata	0e9373a6a6	[Basic] Use llvm::is_contained (NFC)	2021-10-10 08:52:14 -07:00
Jon Chesterfield	78f92c3810	[openmp][amdgpu] Initial gfx10 offloading implementation Lets wavefront size be 32 for amdgpu openmp, as well as 64. Fixes up as little as possible to pass that through the libraries. This change is end to end, as opposed to updating clang/devicertl/plugin separately. It can be broken up for review/commit if preferred. Posting as-is so that others with a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are probably bugs remaining as well as the todo: for letting grid values vary more. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108708	2021-08-27 12:34:03 +01:00
Jon Chesterfield	c2574e63ff	[openmp][nfc] Refactor GridValues Remove redundant fields and replace pointer with virtual function Of fourteen fields, three are dead and four can be computed from the remainder. This leaves a couple of currently dead fields in place as they are expected to be used from the deviceRTL shortly. Two of the fields that can be computed are only used from codegen and require a log2() implementation so are inlined into codegen instead. This change leaves the new methods in the same location in the struct as the previous fields for convenience at review. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108380	2021-08-23 16:19:11 +01:00
Jon Chesterfield	b1efeface7	Revert "[openmp][nfc] Refactor GridValues" Failed a nvptx codegen test This reverts commit 2a47a84b40115b01e03e4d89c1d47ba74beb7bf3.	2021-08-20 18:17:27 +01:00
Jon Chesterfield	2a47a84b40	[openmp][nfc] Refactor GridValues Remove redundant fields and replace pointer with virtual function Of fourteen fields, three are dead and four can be computed from the remainder. This leaves a couple of currently dead fields in place as they are expected to be used from the deviceRTL shortly. Two of the fields that can be computed are only used from codegen and require a log2() implementation so are inlined into codegen instead. This change leaves the new methods in the same location in the struct as the previous fields for convenience at review. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108380	2021-08-20 16:41:26 +01:00
Anshil Gandhi	7063ac1afa	[HIP] Allow target addr space in target builtins This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr space cast for non-generic pointer to generic pointer in general, and inserts implicit addr space cast for generic to non-generic for target builtin arguments only. It is NFC for non-HIP languages. Differential Revision: https://reviews.llvm.org/D102405	2021-08-19 23:51:58 -06:00
Anshil Gandhi	f5d5f17d3a	Revert "[HIP] Allow target addr space in target builtins" This reverts commit a35008955fa606487f79a050f5cc80fc7ee84dda.	2021-08-18 21:38:42 -06:00
Anshil Gandhi	a35008955f	[HIP] Allow target addr space in target builtins This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr space cast for non-generic pointer to generic pointer in general, and inserts implicit addr space cast for generic to non-generic for target builtin arguments only. It is NFC for non-HIP languages. Differential Revision: https://reviews.llvm.org/D102405	2021-08-09 16:38:04 -06:00
Anton Zabaznov	f16a4fcbe5	[OpenCL] Add support of __opencl_c_3d_image_writes feature macro This feature requires support of __opencl_c_images, so diagnostics for that is provided as well. Also, ensure that cl_khr_3d_image_writes feature macro is set to the same value. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D106260	2021-07-30 04:54:28 +03:00
Melanie Blower	aaba37187f	[clang][PATCH][nfc] Refactor TargetInfo::adjust to pass DiagnosticsEngine to allow diagnostics on target-unsupported options Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104729	2021-06-29 13:26:23 -04:00
Melanie Blower	1d85d0879a	Revert "[clang][PATCH][nfc] Refactor TargetInfo::adjust to pass DiagnosticsEngine to allow diagnostics on target-unsupported options" This reverts commit 2dbe1c675fe94eeb7973dcc25b049d25f4ca4fa0. More buildbot failures	2021-06-28 15:47:21 -04:00
Melanie Blower	2dbe1c675f	[clang][PATCH][nfc] Refactor TargetInfo::adjust to pass DiagnosticsEngine to allow diagnostics on target-unsupported options Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104729	2021-06-28 15:09:53 -04:00
Melanie Blower	8815ef823c	Revert "[clang][PATCH][nfc] Refactor TargetInfo::adjust to pass DiagnosticsEngine to allow diagnostics on target-unsupported options" This reverts commit 2c02b0c3f45414ac6c64583e006a26113c028304. buildbot fails	2021-06-28 12:42:59 -04:00
Melanie Blower	2c02b0c3f4	[clang][PATCH][nfc] Refactor TargetInfo::adjust to pass DiagnosticsEngine to allow diagnostics on target-unsupported options Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104729	2021-06-28 12:26:53 -04:00
Anastasia Stulova	237c6924bd	[OpenCL] Add clang extension for bit-fields. Allow use of bit-fields as a clang extension in OpenCL. The extension can be enabled using pragma directives. This fixes PR45339! Differential Revision: https://reviews.llvm.org/D101843	2021-05-24 12:42:17 +01:00
Anton Zabaznov	826905787a	[OpenCL] Add support of OpenCL C 3.0 __opencl_c_fp64 There already exists cl_khr_fp64 extension. So OpenCL C 3.0 and higher should use the feature, earlier versions still use the extension. OpenCL C 3.0 API spec states that an extension will not be described in the option string if the corresponding optional functionality is not supported (see 4.2. Querying Devices). Due to that fact the usage of features for OpenCL C 3.0 must be as follows:

```
$ clang -Xclang -cl-ext=+cl_khr_fp64,+__opencl_c_fp64 ...
$ clang -Xclang -cl-ext=-cl_khr_fp64,-__opencl_c_fp64 ...
```

i.e. the feature and the equivalent extension (if it exists) must be set to the same value. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D96524	2021-05-21 15:01:19 +03:00
Anastasia Stulova	e994e74bca	[OpenCL] Add clang extension for non-portable kernel parameters. Added __cl_clang_non_portable_kernel_param_types extension that allows using non-portable types as kernel parameters. This allows bypassing the portability guarantees from the restrictions specified in C++ for OpenCL v1.0 s2.4. Currently this only disables the restrictions related to the data layout. The programmer should ensure the compiler generates the same layout for host and device or otherwise the argument should only be accessed on the device side. This extension could be extended to other cases (e.g. permitting size_t) if desired in the future. Patch by olestrohm (Ole Strohm)! https://reviews.llvm.org/D101168	2021-05-05 14:58:23 +01:00

92 Commits