39 Commits

Author SHA1 Message Date
Arvind Sudarsanam
6cfec29cb9
[Offload][SYCL] Refactor OffloadKind implementation (#135809)
Following are the changes:

1. Make OffloadKind enum values to be powers of two so we can use them
like a bitfield
2. Include OFK_SYCL enum value
3. Modify ActiveOffloadKinds support in clang-linker-wrapper to use
bitfields instead of a vector.

Thanks

---------

Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
2025-04-16 14:13:30 +00:00
Joseph Huber
f1e917d07b
[Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entires into a single section
called `llvm_offload_entires`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that people in the runtimes now need to check if the kind
is what they expect, but the expectation is that you can combine
multiple potential providers into a compile job. Doesn't fully work
yet because of other runtime issues, but some day. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
2025-02-06 08:24:01 -06:00
Joseph Huber
13dcc95dcd
[Offload] Rework offloading entry type to be more generic (#124018)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64-bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibilitry.
2025-01-28 07:26:13 -06:00
Joseph Huber
70a16b90ff
[HIP] Support managed variables using the new driver (#123437)
Summary:
Previously, managed variables didn't work in rdc mode using the new
driver because we just didn't register them. This was previously ignored
because we didn't have enough space in the current struct format. This
patch amends that by just emitting a struct pair for the two variables
and using the single pointer.

In the future, a more extensible entry format would be nice, but that
can be done later.
2025-01-22 09:13:14 -06:00
Nikita Popov
63dc31b68b Reapply [IR] Avoid creating icmp/fcmp constant expressions (#92885)
Reapply after https://github.com/llvm/llvm-project/pull/93548,
which should address the lldb failure on macos.

-----

Do not create icmp/fcmp constant expressions in IRBuilder etc anymore,
i.e. treat them as "undesirable". This is in preparation for removing
them entirely.

Part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
2024-05-31 08:55:59 +02:00
Nikita Popov
d10b76552f
[ConstantFold] Remove notional over-indexing fold (#93697)
The data-layout independent constant folding currently has some rather
gnarly code for canonicalizing GEP indices to reduce "notional
overindexing", and then infers inbounds based on that canonicalization.

Now that we canonicalize to i8 GEPs, this canonicalization is
essentially useless, as we'll discard it as soon as the GEP hits the
data-layout aware constant folder anyway. As such, I'd like to remove
this code entirely.

This shouldn't have any impact on optimization capabilities.
2024-05-30 08:36:44 +02:00
Daniel Thornburgh
8baf96f306
Revert "[IR] Avoid creating icmp/fcmp constant expressions" (#93087)
Reverts llvm/llvm-project#92885 due to LLDB CI breakages.
2024-05-22 11:27:55 -07:00
Nikita Popov
108575f02e
[IR] Avoid creating icmp/fcmp constant expressions (#92885)
Do not create icmp/fcmp constant expressions in IRBuilder etc anymore,
i.e. treat them as "undesirable". This is in preparation for removing
them entirely.

Part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
2024-05-22 07:40:08 +02:00
Joseph Huber
fa9e90f5d2 [Reland][Libomptarget] Statically link all plugin runtimes (#87009)
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_<name>` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868
2024-05-09 09:38:22 -05:00
Joseph Huber
e5e66073c3 Revert "[Libomptarget] Statically link all plugin runtimes (#87009)"
Caused failures on build-bots, reverting to investigate.

This reverts commit 80f9e814ec896fdc57ee84afad8ac4cb1f8e4627.
2024-05-09 07:05:23 -05:00
Joseph Huber
80f9e814ec
[Libomptarget] Statically link all plugin runtimes (#87009)
This patch overhauls the `libomptarget` and plugin interface. Currently,
we define a C API and compile each plugin as a separate shared library.
Then, `libomptarget` loads these API functions and forwards its internal
calls to them. This was originally designed to allow multiple
implementations of a library to be live. However, since then no one has
used this functionality and it prevents us from using much nicer
interfaces. If the old behavior is desired it should instead be
implemented as a separate plugin.

This patch replaces the `PluginAdaptorTy` interface with the
`GenericPluginTy` that is used by the plugins. Each plugin exports a
`createPlugin_<name>` function that is used to get the specific
implementation. This code is now shared with `libomptarget`.

There are some notable improvements to this.
1. Massively improved lifetimes of life runtime objects
2. The plugins can use a C++ interface
3. Global state does not need to be duplicated for each plugin +
   libomptarget
4. Easier to use and add features and improve error handling
5. Less function call overhead / Improved LTO performance.

Additional changes in this plugin are related to contending with the
fact that state is now shared. Initialization and deinitialization is
now handled correctly and in phase with the underlying runtime, allowing
us to actually know when something is getting deallocated.

Depends on https://github.com/llvm/llvm-project/pull/86971
https://github.com/llvm/llvm-project/pull/86875
https://github.com/llvm/llvm-project/pull/86868
2024-05-09 06:35:54 -05:00
Joseph Huber
421085fd74
[Offload] Change unregister library to use atexit instead of destructor (#86830)
Summary:
The 'new driver' sets up the lifetime of a registered liftime using
global constructors and destructors. Currently, this is put at priority
1 which isn't strictly conformant as it will conflict with system
utilities. We now use 101 as this is the loweest suggested for
non-system constructors and will still run before user constructors.

Secondly, there were issues with the CUDA runtime when destructed with a
global destructor. Because the global ones are in any order and
potentially run before other things we were hitting an edge case where
the OpenMP runtime was uninitialized *after* `_dl_fini` was called. This
would result in us erroring when we call into a destroyed `libcuda.so`
instance. using `atexit` is what CUDA / HIP use and it prevents this
from happening. Most everything uses `atexit` except system utilities
and because of the constructor priority it will be unregistered *after*
everything else but not after `_fl_fini`.
2024-03-27 14:09:37 -05:00
Joseph Huber
597be90f8b
[Clang][NFC] Remove '--' separator in the linker wrapper usage (#84253)
Summary:
The very first version of the `clang-linker-wrapper` used `--` as a
separator for the host and device arguments. I moved away from this
towards a commandline parsing implementation years ago but never got
around to officially removing this.
2024-03-07 07:40:38 -06:00
Joseph Huber
5c84054223
[LinkerWrapper] Support relocatable linking for offloading (#80066)
Summary:
The standard GPU compilation process embeds each intermediate object
file into the host file at the `.llvm.offloading` section so it can be
linked later. We also use a special section called something like
`omp_offloading_entries` to store all the globals that need to be
registered by the runtime. The linker-wrapper's job is to link the
embedded device code stored at this section and then emit code to
register the linked image and the kernels and globals in the offloading
entry section.

One downside to RDC linking is that it can become quite big for very
large projects that wish to make use of static linking. This patch
changes the support for relocatable linking via `-r` to support a kind
of "partial" RDC compilation for offloading languages.

This primarily requires manually editing the embedded data in the
output object file for the relocatable link. We need to rename the
output section to make it distinct from the input sections that will be
merged. We then delete the old embedded object code so it won't be
linked further. We then need to rename the old offloading section so
that it is private to the module. A runtime solution could also be done
to defer entries that don't belong to the given GPU executable, but this
is easier. Note that this does not work with COFF linking, only the ELF
method for handling offloading entries, that could be made to work
similarly.

Given this support, the following compilation path should produce two
distinct images for OpenMP offloading.
```
$ clang foo.c -fopenmp --offload-arch=native -c
$ clang foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang main.c merged.o -fopenmp --offload-arch=native
$ ./a.out
```

Or similarly for HIP to effectively perform non-RDC mode compilation for
a subset of files.

```
$ clang -x hip foo.c --offload-arch=native --offload-new-driver -fgpu-rdc -c
$ clang -x hip foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang -x hip main.c merged.o --offload-arch=native --offload-new-driver -fgpu-rdc
$ ./a.out
```

One question is whether or not this should be the default behavior of
`-r` when run through the linker-wrapper or a special option. Standard
`-r` behavior is still possible if used without invoking the
linker-wrapper and it guaranteed to be correct.
2024-02-07 08:20:07 -06:00
Joseph Huber
a551703cb5
[Offload] Fix the offloading wrapper when merged multiple times. (#79231)
Summary:
The offloading wrapper is a object file that contains code necessary to
register offloading entries for the given runtime. Currently, we
expected only one of these to be present when we make the final
executable. However, in the case of redistributable linking with `-r` we
can end up with multiple of these being generated before finally
creating the executable.

This patch simply changes the defintiions of these globals to be
mergable. This allows multiples of these to participate in a single link
job. For ELF, we just make the dummy variable internal and used so it
sets up the section as expected. For COFF we make the entries weak_odr
so they merge to a single symbol
2024-01-24 13:50:35 -06:00
Joseph Huber
cb2f340850 [CUDA] Disable registering surfaces and textures with the new driver
Summary:
These runtime calls don't seem to be supported anymore, disable them for
now.
2024-01-18 10:56:33 -06:00
Fabian Mora
9fa9d9a7e1
[llvm][frontend][offloading] Move clang-linker-wrapper/OffloadWrapper.* to llvm/Frontend/Offloading (#78057)
This patch moves `clang/tools/clang-linker-wrapper/OffloadWrapper.*` to
`llvm/Frontend/Offloading` allowing them to be re-utilized by other
projects.

Additionally, it makes minor modifications to the API to make it more
flexible.
Concretely:
 - The `wrap*` methods now have additional arguments `EntryArray`, 
`Suffix` and `EmitSurfacesAndTextures` to specify some additional options.
- The `EntryArray` is now constructed by the caller. This change is needed to
enable JIT compilation, as ORC doesn't fully support `__start_` and `__stop_` 
symbols. Thus, to JIT the code, the `EntryArray` has to be constructed explicitly in the IR.
- The `Suffix` field is used when emitting the descriptor, registration
methods, etc, to make them more readable. It is empty by default.
- The `EmitSurfacesAndTextures` field controls whether to emit surface
and texture registration code, as those functions were removed from `CUDART`
in CUDA 12. It is true by default.
- The function `getOffloadingEntryInitializer` was added to help create
the `EntryArray`, as it returns the constant initializer and not a global
variable.
2024-01-15 16:30:07 -05:00
Joseph Huber
8f76f1816e [OpenMP][Obvious] Fix test failing on BE architectures
Summary:
This accidentally included a byte past the magic, which was out of order
on big endian architectures.
2024-01-07 08:38:50 -06:00
Joseph Huber
5121e2cffd
[OpenMP] Change __tgt_device_image to point to the image (#77003)
Summary:
We use the OffloadBinary to contain bundled offloading objects used to
support many images / targets at the same time. The `__tgt_device_info`
struct used to contain a pointer to this underlying binary format, which
contains information about the triple and architecture. We used to parse
this in the runtime to do image verification.

Recent changes removed the need for this to be used internally, as we
just parse it out of the ELF directly. This patch sets the pointers up
so they point to the ELF without requiring any further parsing.
2024-01-05 14:29:34 -06:00
Joseph Huber
bfd41c3f8c [LinkerWrapper][Obvious] Fix missing use of texture data type
Summary:
This was accidentally linked to the wrong pointer, causing unused
variable warnings and registering the wrong thing.
2023-12-07 16:55:14 -06:00
Joseph Huber
97f3be2c5a
[CUDA][HIP] Improve variable registration with the new driver (#73177)
Summary:
This patch adds support for registering texture / surface variables from
CUDA / HIP. Additionally, we now properly track the `extern` and `const`
flags that are also used in these runtime functions.

This does not implement the `managed` variables yet as those seem to
require some extra handling I'm not familiar with. The issue is that the
current offload entry isn't large enough to carry size and alignment
information along with an extra global.
2023-12-07 15:44:23 -06:00
Joseph Huber
52204a29ab
[Offload] Initial support for registering offloading entries on COFF targets (#72697)
Summary:
This patch provides the initial support to allow handling the new
driver's offloading entries. Normally, the ELF target can emit varibles
at C-identifier named sections and the linker will provide a pointer to
the section. For COFF target, instead the linker merges sections
containing a `$` in alphabetical order. We thus can emit these variables
at sections and then emit two variables that are guaranteed to be sorted
before and after the others to traverse it. Previous patches
consolidated the handling of offloading entries so that this patch more
easily can handle mapping them to the appropriate section.

Ideally, the only remaining step to allow the new driver to run on
Windows targets is to accurately map the following `ld.lld` arguments to
their `llvm-link` equivalents. These are used inside the linker-wrapper,
so we should simply need to remap the arguments to the same
functionality if possible.
```
-o, -output
-l, --library
-L, --library-path
-v, --version
-rpath
-whole-archive, -no-whole-archive
```

I have not tested this at runtime as I do not have access to a windows
machine.

This patch was adapted from some initial efforts in
https://reviews.llvm.org/D137470.
2023-11-21 06:48:34 -06:00
Joseph Huber
9c0e64999b
[Offloading][NFC] Refactor handling of offloading entries (#72544)
Summary:
This patch is a simple refactoring of code out of the linker wrapper
into a common location. The main motivation behind this change is to
make it easier to change the handling in the future to accept a triple
to be used to emit entries that function on that target.
2023-11-17 08:26:20 -06:00
Joseph Huber
278d6f065a [Clang] Clean up some offloading driver tests
Summary:
These tests used an external `dummy.o` and `dummy.bc` files instead of
just generating them. This should be cleaner and less error prone.
2023-01-12 07:54:55 -06:00
Joseph Huber
b9ee2acc9c [LinkerWrapper] report on missing libraries
The linker wrapper does its own library searching for static archives
that can contain device code. The device linking phases happen before
the host linking phases so that we can generate the necessary
registration code and link it in with the rest of the code. Previously,
If a library containing needed device code was not found the execution
would continue silently until it failed with undefined symbols. This
patch allows the linker wrapper to perform its own check beforehand to
catch these errors.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D137180
2022-11-02 15:28:09 -05:00
Fangrui Song
3a79f1caa9 [Driver][test] Restore %clang -cc1 in test/Driver
This partially reverts commit 1609a5d7715c06ff52c13af8b20ee64811a8ec7b (the
test/Driver part). We want to discourage %clang_cc1 and clang -cc1 in
test/Driver. The clang -cc1 uses in hlsl/offload/etc are not good examples.
2022-09-29 12:21:57 -07:00
Michał Górny
1609a5d771 [clang] [test] Use %clang_cc1 substitution consistently
Use the `%clang_cc1` substitution consistently across the test suite,
replacing inline `%clang -cc1` invocations, except for one Preprocessor
test where this is causing breakage.  This is necessary to ensure that
additional parameters passed via `%clang` do not interfere with `-cc1`
that must always be passed as the first command-line argument.

Remove the additional substitution blocking `%clang_cc1` use in Driver
tests.  It has been added in 2013 and was supposed to prevent tests
calling `clang -cc1` from being added to Driver.  The state of the test
suite proves that it did not succeed at all.

Differential Revision: https://reviews.llvm.org/D134880
2022-09-29 20:59:00 +02:00
Joseph Huber
080022d8ed [LinkerWrapper] Embed OffloadBinaries for OpenMP offloading images
The OpenMP offloading runtine currently uses an array of linked
offloading images. One downside to this is that we cannot know the
architecture or triple associated with the given image. In this patch,
instead of embedding the image itself, we embed an offloading binary
instead. This binary is simply a binary format that wraps around the
original linked image to provide some additional metadata. This will
allow us to support offloading to multiple architecture, or performing
future JIT compilation inside of the runtime, more clearly.
Additionally, these can be placed at a special section such that the
supported architectures can be identified using objdump with the support
from D126904. This needs to be stored in a new section name
`.llvm.offloading.images` because the `.llvm.offloading` section
implicitly uses the `SHF_EXCLUDE` flag and will always be stripped.

This patch does not contain the necessary code to parse these in
libomptarget.

Depends on D127246

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D127304
2022-07-21 13:20:01 -04:00
Joseph Huber
fe6a391357 [Clang] Fix tests failing due to invalid syntax for host triple
Summary:
We use the `--host-triple=` argument to manually set the target triple.
This was changed to include the `=` previously but was not included in
these additional test cases, causing it for fail on some unsupported
systems.
2022-07-11 21:31:56 -04:00
Joseph Huber
ce091eb3b9 [HIP] Add support for handling HIP in the linker wrapper
This patch adds the necessary changes required to bundle and wrap HIP
files. The bundling is done using `clang-offload-bundler` currently to
mimic `fatbinary` and the wrapping is done using very similar runtime
calls to CUDA. This still does not support managed / surface / texture
variables, that would require some additional information in the entry.

One difference in the codegeneration with AMD is that I don't check if
the handle is null before destructing it, I'm not sure if that's
required.

With this we should be able to support HIP with the new driver.

Depends on D128850

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D128914
2022-07-11 15:49:23 -04:00
Joseph Huber
d2ead9e324 [LinkerWrapper][NFC] Rework command line argument handling in the linker wrapper
Summary:
This patch reworks the command line argument handling in the linker
wrapper from using the LLVM `cl` interface to using the `Option`
interface with TableGen. This has several benefits compared to the old
method.

We use arguments from the linker arguments in the linker
wrapper, such as the libraries and input files, this allows us to
properly parse these. Additionally we can now easily set up aliases to
the linker wrapper arguments and pass them in the linker input directly.
That is, pass an option like `cuda-path=` as `--offload-arg=cuda-path=`
in the linker's inputs. This will allow us to handle offloading
compilation in the linker itself some day. Finally, this is also a much
cleaner interface for passing arguments to the individual device linking
jobs.
2022-07-08 11:18:38 -04:00
Joseph Huber
34fc1db9a8 [LinkerWrapper] Change wrapping to include jumps for other variables
Summary:
We don't currently support other variable types, like managed or
surface. This patch simply adds code that checks the flags and does
nothing. This prevents us from registering a surface as a variable as we
do now. In the future, registering these will require adding the flags
to the entry struct.
2022-06-29 14:48:39 -04:00
Nikita Popov
41d5033eb1 [IR] Enable opaque pointers by default
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Joseph Huber
26eb04268f [Clang] Introduce clang-offload-packager tool to bundle device files
In order to do offloading compilation we need to embed files into the
host and create fatbainaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the `-fembed-offload-binary`
option. However this is not very extensibile since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
`clang-offload-packager` that behaves similarly to CUDA's `fatbinary`.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D125165
2022-05-11 09:39:13 -04:00
Joseph Huber
f49d576a88 [CUDA] Add wrapper code generation for registering CUDA images
This patch adds the necessary code generation to create the wrapper code
that registers all the globals in CUDA. We create the necessary
functions and iterate through the list of
`__start_cuda_offloading_entries` to find which globals must be
registered. This is very similar to the code generation done currently
in Clang for non-rdc builds, but here we are registering a fully linked
fatbinary and finding the globals via the above sections.

With this we should be able to fully support basic RDC / LTO building of CUDA
code.

It's also worth noting that this does not include the necessary PTX to JIT the
image, so to use this support the offloading architecture must match the
system's architecture.

Depends on D123810

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123812
2022-05-11 07:30:25 -04:00
Joseph Huber
ee74abaad7 [OpenMP] Add triple to the linker wrapper job
Summary:
I forgot to add the triple to the linker wrapper job, so we were still
generating code for the unintended platforms.
2022-04-20 08:23:43 -04:00
Joseph Huber
1dfe0273fd [OpenMP] Add explicit triple to linker wrapper test
Summary:
Some platforms like Mach-O require different handling of section names.
This is not supported on Mac-OS or Windows yet so we shouldn't be
testing the compilation there. Add an explicit triple to the tests.
2022-04-20 07:24:51 -04:00
Joseph Huber
8c64928887 [OpenMP] Add necessary registered targets for linker wrapper test
Summary:
The linker wrapper needs to use the registered backend to perform LTO.
This was causing problems on the buildbots that didn't support it.
2022-04-19 18:48:58 -04:00
Joseph Huber
260c5df2d5 [OpenMP] Add better testing for the linker wrapper
The linker wrapper is used to perform linking and wrapping of embedded
device object files. Currently its internals are not able to be tested
easily. This patch adds the `--dry-run` and `--print-wrapped-module`
options to investigate the link jobs that will be run along with the
wrapped code that will be created to register the binaries.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D124039
2022-04-19 18:37:09 -04:00