Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
This moves the main builtins and several targets to use nice generated
string tables and info structures rather than X-macros. Even without
obvious prefixes factored out, the resulting tables are significantly
smaller and much cheaper to compile without all the X-macro overhead.
This leaves the X-macros in place for atomic builtins which have a wide
range of uses that don't seem reasonable to fold into TableGen.
As future work, these should move to their own file (whether as X-macros
or just generated patterns) so the AST headers don't have to include all
the data for other builtins.
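For readers unfamiliar with the pattern, here is a small self-contained sketch of the X-macro style that remains in place for the atomic builtins (all names and entries here are hypothetical, not the real definitions):

```
// The list is written once as a macro and every consumer expands it into
// whatever structure it needs (an info table, an enum, a switch, ...).
#define EXAMPLE_BUILTIN_LIST(X)                                                \
  X(__example_atomic_load, "v.", "t")                                          \
  X(__example_atomic_store, "v.", "t")

struct ExampleInfo {
  const char *Name;
  const char *Type;
  const char *Attrs;
};

// Expansion 1: an info table with one entry per builtin.
#define MAKE_INFO(ID, TYPE, ATTRS) {#ID, TYPE, ATTRS},
static constexpr ExampleInfo ExampleInfos[] = {EXAMPLE_BUILTIN_LIST(MAKE_INFO)};
#undef MAKE_INFO

// Expansion 2: a parallel enum of builtin IDs.
#define MAKE_ENUM(ID, TYPE, ATTRS) BI##ID,
enum ExampleBuiltinID { EXAMPLE_BUILTIN_LIST(MAKE_ENUM) NumExampleBuiltins };
#undef MAKE_ENUM
```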
This both reapplies #118734, the initial attempt at this, and updates it
significantly.
First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.
It also restructures the builtins to support sharding across several
independent tables. This accomplishes three improvements over the
original PR:
1) It improves the APIs used significantly.
2) When builtins are defined from different sources (like SVE vs MVE in
AArch64), this allows each of them to build their own string table
independently rather than having to merge the string tables and info
structures.
3) It allows each shard to factor out a common prefix, often cutting the
size of the strings needed for the builtins by a factor of two.
The second point is important both because it allows different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc` files) and because it simply reduces the sizes
of these tables, which is valuable given how large they are in some
cases. The third builds on that size reduction.
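To make the sharding concrete, here is a rough sketch of the shape of one shard with its prefix factored out (hypothetical names and layout, not the actual generated code):

```
#include <cstddef>

// Hypothetical per-shard data: every builtin in this shard shares the
// "__builtin_sve_" prefix, so only the suffixes are stored, and the shard
// owns its own string table and offset array.
struct ExampleShard {
  const char *NamePrefix;      // common prefix, stored once per shard
  const char *StringTable;     // all name suffixes, NUL-separated
  const unsigned *NameOffsets; // per-builtin offset of its suffix
  size_t NumBuiltins;
};

static const char SVEStrings[] = "svabs_f32\0svadd_f32\0";
static const unsigned SVEOffsets[] = {0, 10};
static const ExampleShard SVEShard = {"__builtin_sve_", SVEStrings, SVEOffsets, 2};

// Other sources (MVE, NEON, ...) build their own shards independently; a
// lookup strips the shard's prefix first and then searches only that shard.
```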
Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities, which cannot be easily factored
out here and require deep changes to the targets.
This patch adds intrinsics for the tcgen05 alloc/dealloc
family of PTX instructions. This patch also adds an
addrspace 6 for tensor memory which is used by
these intrinsics.
lit tests are added and verified with a ptxas-12.8 executable.
Documentation for these additions is also added in NVPTXUsage.rst.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This switches them to use the common TableGen layer, extending it to
support the missing features needed by the NVPTX backend.
The biggest thing was to build a TableGen system that computes the
cumulative SM and PTX feature sets the same way the macros did. That's
done with some string concatenation tricks in TableGen, but they worked
out pretty neatly and are very comparable in complexity to the macro
version.
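For reference, the macro version computed those cumulative sets with preprocessor string concatenation roughly along these lines (a simplified reconstruction with a truncated architecture list, not the exact macros):

```
// Each SM level's feature string chains in every newer level, so a builtin
// tagged with SM_80 is accepted on sm_80 and on everything after it.
#define SM_90 "sm_90"
#define SM_89 "sm_89|" SM_90
#define SM_80 "sm_80|" SM_89
#define SM_70 "sm_70|" SM_80

// The PTX versions chain the same way.
#define PTX81 "ptx81"
#define PTX80 "ptx80|" PTX81
#define PTX70 "ptx70|" PTX80

// A builtin's requirement combines a minimum SM and a minimum PTX version,
// which concatenates into a single feature string at preprocessing time:
static const char ExampleFeatures[] = "(" SM_80 "),(" PTX70 ")";
// -> "(sm_80|sm_89|sm_90),(ptx70|ptx80|ptx81)"
```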
Then the actual defines were mapped over using a very hacky Python
script. It was never productionized or intended to work in the future,
but for posterity:
https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66
Last but not least, there was a very odd "bug" in the prototype of one of
the converted builtins in the TableGen model: it didn't handle uses of `Z`
and `U` both as *qualifiers* of a single type, treating `Z` as its own
`int32_t` type. So my hacky Python script converted `ZUi` into two
types, an `int32_t` and an `unsigned int`. This produced a very wrong
prototype. But the tests caught this nicely and I fixed it manually
rather than trying to improve the Python script as it occurred in
exactly one place I could find.
This should provide the direct benefit of allowing future refactorings to
more directly leverage TableGen to express builtins more structurally
rather than textually. It will also make my efforts to move builtins to
string tables significantly more effective for the NVPTX backend where
the X-macro approach resulted in *significantly* less efficient string
tables than other targets due to the long repeated feature strings.
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why, as all the other build bots and
at least some folks' local Windows builds work fine.
This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.
Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.
The largest pattern I identified was the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.
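At its core the change is a data-layout swap along these lines (a minimal sketch; the real structures carry more fields):

```
// Before (sketch): each entry holds string pointers, and under PIE every one
// of those pointers needs a dynamic relocation at program start-up.
struct OldInfo {
  const char *Name; // one relocation per entry
  const char *Type; // and another one here
};

// After (sketch): one shared string blob plus integer offsets into it. The
// offsets are plain data and need no relocations at all.
static const char Strings[] = "__builtin_abs\0ii\0__builtin_fabs\0dd\0";

struct NewInfo {
  unsigned NameOffset; // offset of the name within Strings
  unsigned TypeOffset; // offset of the type string within Strings
};

static const NewInfo Infos[] = {
    {0, 14},  // "__builtin_abs", "ii"
    {17, 32}, // "__builtin_fabs", "dd"
};
```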
This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:
```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki   +1.4%  +653Ki    .rodata
  +0.0%    +960   +0.0%    +960    .text
  +0.0%    +197   +0.0%    +197    .dynstr
  +0.0%    +184   +0.0%    +184    .eh_frame
  +0.0%     +96   +0.0%     +96    .dynsym
  +0.0%     +40   +0.0%     +40    .eh_frame_hdr
  +114%     +32   [ = ]       0    [Unmapped]
  +0.0%     +20   +0.0%     +20    .gnu.hash
  +0.0%      +8   +0.0%      +8    .gnu.version
  +0.9%      +7   +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0  -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki  -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi  -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi   -1.6% -2.66Mi    TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those relocations are stored.
This is also visible in my benchmarking of binary start-up overhead at
least:
```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents an improvement of over 10%. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.
----
This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would require both the difficult migration of the remaining targets and
the solution of some tricky problems with how to move away from any
macro-based expansion.
I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into a set of consistent patterns that avoids a
significant regression in overall boilerplate.
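As a schematic of the `constexpr` approach (hypothetical helper, not the actual macros or types in the patch):

```
#include <cstddef>
#include <string_view>

// Hypothetical sketch: the existing macro expansion supplies the string
// literals, and constexpr code packs them into one blob plus offsets at
// compile time, so no per-string pointers (or relocations) remain.
template <size_t TotalSize, size_t NumStrings> struct ExampleTable {
  char Blob[TotalSize] = {};
  unsigned Offsets[NumStrings] = {};
};

template <size_t TotalSize, typename... Ts>
constexpr ExampleTable<TotalSize, sizeof...(Ts)> makeTable(Ts... Strs) {
  ExampleTable<TotalSize, sizeof...(Ts)> T{};
  size_t Pos = 0, Idx = 0;
  for (std::string_view S : {std::string_view(Strs)...}) {
    T.Offsets[Idx++] = static_cast<unsigned>(Pos);
    for (char C : S)
      T.Blob[Pos++] = C;
    T.Blob[Pos++] = '\0'; // keep the strings NUL-terminated in the blob
  }
  return T;
}

// The X-macro expansion provides the arguments, e.g. via
//   #define BUILTIN(ID, TYPE, ATTRS) #ID, TYPE, ATTRS,
constexpr auto Table = makeTable<64>("__builtin_abs", "ii", "nc",
                                     "__builtin_fabs", "dd", "nc");
```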
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
This patch augments the HIPAMD driver to allow it to target AMDGCN
flavoured SPIR-V compilation. It's mostly straightforward, as we re-use
some of the existing SPIRV infra; however, there are a few notable
additions:
- we introduce an `amdgcnspirv` offload arch, rather than relying on
using `generic` (this is already fairly overloaded) or simply using
`spirv` or `spirv64` (we'll want to use these to denote unflavoured
SPIRV, once we bring up that capability)
- initially it won't be possible to mix SPIR-V and concrete AMDGPU
targets, as it would require some relatively intrusive surgery in the
HIPAMD Toolchain and the Driver to deal with two triples
(`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user-provided compiler flags and have them
available at JIT time, we rely on embedding the command line via
`-fembed-bitcode=marker`, which the bitcode writer had previously not
implemented for SPIRV; we only allow it conditionally for AMDGCN
flavoured SPIRV, and it is handled correctly by the Translator (it ends
up as a string literal)
Once the SPIRV BE is no longer experimental we'll switch to using that
rather than the translator. There's some additional work that'll come
via a separate PR around correctly piping through AMDGCN's
implementation of `printf`; for now we merely handle its flags
correctly.
Summary:
AIX headers define this, so we need to work around it. In the future
this will be removed but for now we should just rename it to avoid these
issues.
Summary:
The PTX target supports the f16 type natively and we already have a few
LLVM backend tests that exercise it in LLVM-IR. We should be able to enable
this for generic use. This is done prior to the f16 math functions being
written in the GPU libc case.
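As a small illustration of what this enables (a sketch, not taken from the patch's tests), plain sources built directly for NVPTX can now use the type:

```
// Compiled with something like: clang++ --target=nvptx64-nvidia-cuda -S example.cpp
// _Float16 values and arithmetic map onto the PTX f16 support directly.
_Float16 twice(_Float16 X) { return X + X; }
```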
Summary:
The NVPTX tools require an architecture to be used; however, if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on the
functions so it can be changed when linked into code. This allows the
standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR
similar to how CUDA's deviceRTL looks when built from C/C++.
When this option is passed to clang, external (and/or weak) symbols
are not assumed to have the minimum ABI alignment normally required.
Symbols defined locally that are not weak are, however, still given the
minimum alignment.
This is implemented by passing a new parameter to getMinGlobalAlign()
named HasNonWeakDef that is used to return the right alignment value.
This is needed because external symbols created from a linker script may
not get the ABI minimum alignment and must therefore be treated as
unaligned by the compiler.
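A small example of the distinction the new parameter captures (hypothetical symbols; the described treatment applies when the option is enabled):

```
// External (and weak) symbols may come from a linker script, so the compiler
// must not assume they carry the ABI minimum alignment:
extern char ExternalSym;                   // treated as potentially unaligned
extern char WeakSym __attribute__((weak)); // likewise treated as unaligned

// A local, non-weak definition is still emitted with (and may rely on) the
// ABI minimum alignment:
char LocalSym;
```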
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
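A representative before/after at a call site (sketch):

```
#include "llvm/ADT/StringRef.h"

bool isBuiltinName(llvm::StringRef Name) {
  // Previously spelled: Name.startswith("__builtin_")
  return Name.starts_with("__builtin_");
}
```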
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
In file 'clang/lib/Basic/Targets.cpp' the function 'AllocateTarget' had a raw pointer as a return type, which was wrapped in 'std::unique_ptr' at every usage.
This commit changes the signature of the function to return a 'std::unique_ptr' directly.
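Schematically, the change looks like this (stand-in type names, abbreviated parameter list):

```
#include <memory>

struct ExampleTarget {}; // stand-in for the concrete TargetInfo subclasses

// Before (sketch): a raw owning pointer that every caller wrapped itself.
ExampleTarget *allocateTargetOld() { return new ExampleTarget(); }

// After (sketch): ownership is expressed directly in the return type.
std::unique_ptr<ExampleTarget> allocateTargetNew() {
  return std::make_unique<ExampleTarget>();
}

void use() {
  // Old call sites: std::unique_ptr<ExampleTarget> T(allocateTargetOld());
  std::unique_ptr<ExampleTarget> T = allocateTargetNew();
  (void)T;
}
```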
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D148574
We can now target the NVPTX architecture directly via
`--target=nvptx64-nvidia-cuda`. This currently does not define the
`__CUDA_ARCH__` macro, which is used to allow code to select different
code paths based on the supported architecture. This patch simply adds this support.
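For instance, code like the following now sees the macro when compiled directly with `--target=nvptx64-nvidia-cuda` (sketch):

```
// __CUDA_ARCH__ expands to the numeric architecture (e.g. 700 for sm_70),
// so sources can pick code paths per architecture or per host/device side.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
int exampleValue() { return 70; } // device path for sm_70 and newer
#else
int exampleValue() { return 0; }  // host path, or older device architectures
#endif
```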
Reviewed By: tra, jdoerfert
Differential Revision: https://reviews.llvm.org/D146975
Since Clang 16.0.0 users can target the `NVPTX` architecture directly
via `--target=nvptx64-nvidia-cuda`. However, this does not set the
atomic inlining size correctly. This leads to spurious warnings and
emission of runtime atomics that are never implemented. This patch
ensures that we set this to the appropriate pointer width. This will
always be 64 in the future as `nvptx64` will only be supported moving
forward.
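A minimal illustration of the effect (sketch):

```
#include <atomic>
#include <cstdint>

// With the maximum inline atomic width matching the 64-bit pointer width,
// a 64-bit atomic is lowered inline instead of to a call into an atomic
// support library that never exists on the device, and no warning is issued.
std::atomic<uint64_t> Counter{0};

void bump() { Counter.fetch_add(1, std::memory_order_relaxed); }
```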
Fixes: https://github.com/llvm/llvm-project/issues/61410
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D146750
Reorganize clang::Builtin::Info to have the fields naturally align on 4-byte
boundaries.
Instead of storing builtin headers as a plain char pointer, enumerate
them and store the enum. This allows referencing them with a small enum
instead of a pointer.
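Schematically, the header field changes along these lines (illustrative field and enum names, not the exact ones):

```
// Before (sketch): the required header is referenced by name via a pointer.
struct OldExampleInfo {
  const char *Name;
  const char *Type;
  const char *Attributes;
  const char *HeaderName; // e.g. "stdio.h"; costs a full pointer
  unsigned LangOpts;
};

// After (sketch): headers are enumerated once, and each entry stores only a
// small enum value, which packs much more tightly.
enum class ExampleHeaderID : unsigned char { NoHeader, Stdio_H, Stdlib_H };

struct NewExampleInfo {
  const char *Name;
  const char *Type;
  const char *Attributes;
  ExampleHeaderID Header; // one byte instead of an eight-byte pointer
  unsigned LangOpts;
};
```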
On a 64-bit machine, this brings sizeof(clang::Builtin::Info) from 56
down to 48 bytes.
On a release build on my 64-bit Linux machine, it shrinks the size of
libclang-cpp.so by 193kB.
The impact on performance is negligible in terms of instruction count,
but the wall time seems better, see
https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock
Differential Revision: https://reviews.llvm.org/D142024
This avoids recomputing string lengths that are already known at compile time.
It has a slight impact on preprocessing / compile time, see
https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u
This is a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.
The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make the BuiltinInfo tables static variables at compilation-unit scope instead of private static member variables.
Differential Revision: https://reviews.llvm.org/D139881
Mixing LLVM and Clang address spaces can result in subtle bugs, and there
is no need for this hook to use the LLVM IR level address spaces.
Most of this change is just replacing zero with LangAS::Default,
but it also allows us to remove a few calls to getTargetAddressSpace().
This also removes a stale comment+workaround in
CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does
return the expected size for ReferenceType (and handles address spaces).
Differential Revision: https://reviews.llvm.org/D138295
Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and
that makes those types visible during GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.
Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.
Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.
Differential Revision: https://reviews.llvm.org/D136311
Currently we define the `__CUDA_ARCH__` macro only in CUDA mode. This
patch allows us to use this macro in OpenMP-offloading mode when
targeting NVPTX.
Reviewed By: tra, tianshilei1992
Differential Revision: https://reviews.llvm.org/D125256
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.
The `--offload` option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override the default device target
(amdgcn-amd-amdhsa) for HIP compilation so that device code is emitted as a
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
A getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select the HIPSPVToolChain when the HIP offload
target is `spirv64`.
The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. The HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over it. The HIPSPV TC is also
responsible for emitting the SPIR-V binary.
A CUDA GPU architecture `generic` is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted into the bundle entry ID as the target ID. The target ID is
expected to always be present so that a component in the target triple
is not mistaken for the target ID.
Tests are added for checking the HIPSPV tool chain.
[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader
Differential Revision: https://reviews.llvm.org/D110622
Remove redundant fields and replace pointer with virtual function
Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.
This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108380