85 Commits

Author SHA1 Message Date
Justin Cai
faf4e8af74
[Clang][SYCL] Add initial set of Intel OffloadArch values (#138158)
Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
2025-05-01 16:29:48 -05:00
modiking
d6a68be7af
[NVPTX] Add support for Shared Cluster Memory address space [1/2] (#135444)
Adds support for new Shared Cluster Memory Address Space
(SHARED_CLUSTER, addrspace 7). See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory
for details.

1. Update address space structures and datalayout to contain the new
space
2. Add new intrinsics that use this new address space
3. Update NVPTX alias analysis

The existing intrinsics are updated in
https://github.com/llvm/llvm-project/pull/136768
2025-04-22 15:14:39 -07:00
Sebastian Jodłowski
0127f169dc
[CUDA] Add support for sm101 and sm120 target architectures (#127187)
Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.

---------

Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>
2025-02-19 14:41:07 -08:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Chandler Carruth
212ecb9d5c [StrTable] Teach main builtin TableGen to use direct enums, strings, and info
This moves the main builtins and several targets to use nice generated
string tables and info structures rather than X-macros. Even without
obvious prefixes factored out, the resulting tables are significantly
smaller and much cheaper to compile with out all the X-macro overhead.

This leaves the X-macros in place for atomic builtins which have a wide
range of uses that don't seem reasonable to fold into TableGen.

As future work, these should move to their own file (whether as X-macros
or just generated patterns) so the AST headers don't have to include all
the data for other builtins.
2025-02-04 18:04:58 +00:00
Chandler Carruth
cd269fee05 [StrTable] Switch Clang builtins to use string tables
This both reapplies #118734, the initial attempt at this, and updates it
significantly.

First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.

It also restructures the builtins to support sharding across several
independent tables. This accomplishes two improvements from the
original PR:

1) It improves the APIs used significantly.

2) When builtins are defined from different sources (like SVE vs MVE in
   AArch64), this allows each of them to build their own string table
   independently rather than having to merge the string tables and info
   structures.

3) It allows each shard to factor out a common prefix, often cutting the
   size of the strings needed for the builtins by a factor two.

The second point is important both to allow different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc files), it also simply reduces the sizes
of these tables which is valuable given how large they are in some
cases. The third builds on that size reduction.

Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
2025-02-04 18:04:57 +00:00
Durgadoss R
91cb8f5d32
[NVPTX] Add tcgen05 alloc/dealloc intrinsics (#124961)
This patch adds intrinsics for the tcgen05 alloc/dealloc
family of PTX instructions. This patch also adds an
addrspace 6 for tensor memory which is used by
these intrinsics.

lit tests are added and verified with a ptxas-12.8 executable.

Documentation for these additions is also added in NVPTXUsage.rst.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-02-04 14:31:40 +05:30
Chandler Carruth
b968fd9502
[StrTable] Mechanically convert NVPTX builtins to use TableGen (#122873)
This switches them to use tho common TableGen layer, extending it to
support the missing features needed by the NVPTX backend.

The biggest thing was to build a TableGen system that computes the
cumulative SM and PTX feature sets the same way the macros did. That's
done with some string concatenation tricks in TableGen, but they worked
out pretty neatly and are very comparable in complexity to the macro
version.

Then the actual defines were mapped over using a very hacky Python
script. It was never productionized or intended to work in the future,
but for posterity:

https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66

Last but not least, there was a very odd "bug" in one of the converted
builtins' prototype in the TableGen model: it didn't handle uses of `Z`
and `U` both as *qualifiers* of a single type, treating `Z` as its own
`int32_t` type. So my hacky Python script converted `ZUi` into two
types, an `int32_t` and an `unsigned int`. This produced a very wrong
prototype. But the tests caught this nicely and I fixed it manually
rather than trying to improve the Python script as it occurred in
exactly one place I could find.

This should provide direct benefits of allowing future refactorings to
more directly leverage TableGen to express builtins more structurally
rather than textually. It will also make my efforts to move builtins to
string tables significantly more effective for the NVPTX backend where
the X-macro approach resulted in *significantly* less efficient string
tables than other targets due to the long repeated feature strings.
2025-01-27 22:45:37 -08:00
Sergey Kozub
616979ebd7
[NVPTX] Add support for PTX 8.6 and CUDA 12.6 (12.8) (#123398)
Add CUDA versions 12.7, 12.8, 12.9 which support PTX8.6+ (enables using Blackwell-specific instructions).
2025-01-21 11:00:24 +01:00
Chandler Carruth
ca79ff07d8
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734

There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.

This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
2024-12-13 23:58:48 -08:00
Chandler Carruth
be2df95e92
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.

Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.

The largest pattern I identified were the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.

This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:

```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki  +1.4%  +653Ki    .rodata
  +0.0%    +960  +0.0%    +960    .text
  +0.0%    +197  +0.0%    +197    .dynstr
  +0.0%    +184  +0.0%    +184    .eh_frame
  +0.0%     +96  +0.0%     +96    .dynsym
  +0.0%     +40  +0.0%     +40    .eh_frame_hdr
  +114%     +32  [ = ]       0    [Unmapped]
  +0.0%     +20  +0.0%     +20    .gnu.hash
  +0.0%      +8  +0.0%      +8    .gnu.version
  +0.9%      +7  +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0 -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi  -1.6% -2.66Mi    TOTAL
```

We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those reloctaions are stored.

This is also visible in my benchmarking of binary start-up overhead at
least:

```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```

We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.

----

This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.

I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into set of consistent patterns that avoids a
significant regression in overall boilerplate.
2024-12-08 19:00:14 -08:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Artem Belevich
30a06e8022
[CUDA] Add support for CUDA-12.6 and sm_100 (#112028)
This is a copy of #97402(with minor updates), which is now ready to land.

---------

Co-authored-by: Sergey Kozub <skozub@nvidia.com>
2024-10-14 11:51:05 -07:00
Jakub Chlanda
ab20086422
[CUDA][NFC] CudaArch to OffloadArch rename (#97028)
Rename `CudaArch` to `OffloadArch` to better reflect its content and the
use.
Apply a similar rename to helpers handling the enum.
2024-06-30 07:56:07 +02:00
Alex Voicu
9acb533c38
[clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (#95061)
This patch augments the HIPAMD driver to allow it to target AMDGCN
flavoured SPIR-V compilation. It's mostly straightforward, as we re-use
some of the existing SPIRV infra, however there are a few notable
additions:

- we introduce an `amdgcnspirv` offload arch, rather than relying on
using `generic` (this is already fairly overloaded) or simply using
`spirv` or `spirv64` (we'll want to use these to denote unflavoured
SPIRV, once we bring up that capability)
- initially it is won't be possible to mix-in SPIR-V and concrete AMDGPU
targets, as it would require some relatively intrusive surgery in the
HIPAMD Toolchain and the Driver to deal with two triples
(`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user provided compiler flags and have them
available at JIT time, we rely on embedding the command line via
`-fembed-bitcode=marker`, which the bitcode writer had previously not
implemented for SPIRV; we only allow it conditionally for AMDGCN
flavoured SPIRV, and it is handled correctly by the Translator (it ends
up as a string literal)

Once the SPIRV BE is no longer experimental we'll switch to using that
rather than the translator. There's some additional work that'll come
via a separate PR around correctly piping through AMDGCN's
implementation of `printf`, for now we merely handle its flags
correctly.
2024-06-25 12:19:28 +01:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Konstantin Zhuravlyov
2bfa26d30f
AMDGPU: Add missing gfx* generic targets handling in clang (NVPTX, OpenMP runtime) (#94483) 2024-06-05 11:57:17 -04:00
Joseph Huber
9e7aab951f
[CUDA] Rename SM_32 to SM_32_ to work around AIX headers (#88779)
Summary:
AIX headers define this, so we need to work around it. In the future
this will be removed but for now we should just rename it to avoid these
issues.
2024-04-16 07:43:13 -05:00
Joseph Huber
53e96984b6
[NVPTX] Enable the _Float16 type for NVPTX compilation (#82436)
Summary:
The PTX target supports the f16 type natively and we alreaqdy have a few
LLVM backend tests that support the LLVM-IR. We should be able to enable
this for generic use. This is done prior the f16 math functions being
written in the GPU libc case.
2024-02-20 18:12:27 -06:00
Joseph Huber
7155c1ef65
[NVPTX] Allow compiling LLVM-IR without -march set (#79873)
Summary:
The NVPTX tools require an architecture to be used, however if we are
creating generic LLVM-IR we should be able to leave it unspecified. This
will result in the `target-cpu` attributes not being set on the
functions so it can be changed when linked into code. This allows the
standalone `--target=nvptx64-nvidia-cuda` toolchain to create LLVM-IR
simmilar to how CUDA's deviceRTL looks from C/C++
2024-01-30 21:44:43 -06:00
Jonas Paulsson
34dd8ec8ae
[clang, SystemZ] Support -munaligned-symbols (#73511)
When this option is passed to clang, external (and/or weak) symbols
are not assumed to have the minimum ABI alignment normally required.
Symbols defined locally that are not weak are however still given the
minimum alignment.

This is implemented by passing a new parameter to getMinGlobalAlign()
named HasNonWeakDef that is used to return the right alignment value.

This is needed when external symbols created from a linker script may
not get the ABI minimum alignment and must therefore be treated as
unaligned by the compiler.
2024-01-27 18:29:37 +01:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Artem Belevich
631c6e834c
[CUDA] Add support for CUDA-12.3 and sm_90a (#74895) 2023-12-11 12:18:28 -08:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Sergio Afonso
63ca93c7d1
[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.

`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.

Differential Revision: https://reviews.llvm.org/D154591
2023-07-10 14:14:16 +01:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Stoorx
830b359d3a [clang] Return std::unique_ptr<TargetInfo> from AllocateTarget
In file 'clang/lib/Basic/Targets.cpp' the function 'AllocateTarget' had a raw pointer as a return type, which have been wrapped in the 'std::unique_ptr' in all usages.
This commit changes the signature of the function to return an instance of 'std::unique_ptr' directly.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D148574
2023-04-18 10:07:26 +00:00
Joseph Huber
bed7005eb4 [NVPTX] Add __CUDA_ARCH__ macro to standalone NVPTX compilations
We can now target the NVPTX architecture directly via
`--target=nvptx64-nvidia-cuda`. This currently does not define the
`__CUDA_ARCH__` macro with is used to allow code to target different
codes based on support. This patch simply adds this support.

Reviewed By: tra, jdoerfert

Differential Revision: https://reviews.llvm.org/D146975
2023-03-27 18:08:15 -05:00
Joseph Huber
af54d1e852 [NVPTX] Set the atomic inling threshold when targeting NVPTX directly
Since Clang 16.0.0 users can target the `NVPTX` architecture directly
via `--target=nvptx64-nvidia-cuda`. However, this does not set the
atomic inlining size correctly. This leads to spurious warnings and
emission of runtime atomics that are never implemented. This patch
ensures that we set this to the appropriate pointer width. This will
always be 64 in the future as `nvptx64` will only be supported moving
forward.

Fixes: https://github.com/llvm/llvm-project/issues/61410

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D146750
2023-03-23 16:30:07 -05:00
serge-sans-paille
5a7f47cc02
[clang] Optimize clang::Builtin::Info density
Reorganize clang::Builtin::Info to have them naturally align on 4 bytes
boundaries.

Instead of storing builtin headers as a straight char pointer, enumerate
them and store the enum. It allows to use a small enum instead of a
pointer to reference them.

On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56
down to 48 bytes.

On a release build on my Linux 64 bit machine, it shrinks the size of
libclang-cpp.so by 193kB.

The impact on performance is negligible in terms of instruction count,
but the wall time seems better, see
https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock

Differential Revision: https://reviews.llvm.org/D142024
2023-01-23 14:27:44 +01:00
serge-sans-paille
a3c248db87
Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141139
2023-01-09 12:15:24 +01:00
serge-sans-paille
d9ab3e82f3
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.

The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-27 09:55:19 +01:00
Alex Richardson
a602f76a24 [clang][TargetInfo] Use LangAS for getPointer{Width,Align}()
Mixing LLVM and Clang address spaces can result in subtle bugs, and there
is no need for this hook to use the LLVM IR level address spaces.
Most of this change is just replacing zero with LangAS::Default,
but it also allows us to remove a few calls to getTargetAddressSpace().

This also removes a stale comment+workaround in
CGDebugInfo::CreatePointerLikeType(): ASTContext::getTypeSize() does
return the expected size for ReferenceType (and handles address spaces).

Differential Revision: https://reviews.llvm.org/D138295
2022-11-30 20:24:01 +00:00
Artem Belevich
0e8a414ab3 [CUDA, NVPTX] Added basic __bf16 support for NVPTX.
Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and
that makes those types visible furing GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.

Differential Revision: https://reviews.llvm.org/D136311
2022-10-25 11:08:06 -07:00
Artem Belevich
9a01cca660 Add support for CUDA-11.8 and sm_{87,89,90} GPUs.
Differential Revision: https://reviews.llvm.org/D135306
2022-10-07 13:59:28 -07:00
Artem Belevich
f3a2cbcf97 Refactored CUDA version housekeeping to use less boilerplate.
Differential Revision: https://reviews.llvm.org/D135328
2022-10-07 13:59:23 -07:00
Joseph Huber
002a63f937 [OpenMP] Add __CUDA_ARCH__ definition when offloading with OpenMP
Currently we define the `__CUDA_ARCH__` macro only in CUDA mode. This
patch allows us to use this macro in OpenMP-offloading mode when
targeting NVPTX.

Reviewed By: tra, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125256
2022-05-13 14:38:35 -04:00
Joe Nash
8bdfc73f63 [AMDGPU][clang] Definition of gfx11 subtarget
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
2022-04-29 13:55:56 -04:00
Aakanksha
840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin
2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Yaxun (Sam) Liu
a6786cdd57 [HIPSPV][3/4] Enable SPIR-V emission for HIP
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.

‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.

The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.

A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.

Tests are added for checking the HIPSPV tool chain.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader

Differential Revision: https://reviews.llvm.org/D110622
2021-12-20 10:45:09 -05:00
Carlos Galvez
7ecec3f0f5 [CUDA] Bump supported CUDA version to 11.5
Differential Revision: https://reviews.llvm.org/D113249
2021-11-09 08:20:53 +00:00
Artem Belevich
49d982d8cb [CUDA] Add support for CUDA-11.4
Differential Revision: https://reviews.llvm.org/D108239
2021-08-23 13:24:46 -07:00
Jon Chesterfield
c2574e63ff [openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function

Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.

This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108380
2021-08-23 16:19:11 +01:00
Jon Chesterfield
b1efeface7 Revert "[openmp][nfc] Refactor GridValues"
Failed a nvptx codegen test
This reverts commit 2a47a84b40115b01e03e4d89c1d47ba74beb7bf3.
2021-08-20 18:17:27 +01:00
Jon Chesterfield
2a47a84b40 [openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function

Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.

This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108380
2021-08-20 16:41:26 +01:00