The new driver primarily allows us to support RDC-mode compilations with
proper linking. This is not needed for non-RDC mode compilation, but we
still would like the new driver to be able to handle this mode so we can
transition away from the old driver in the future. This patch adds the
necessary code to support creating a fatbinary for CUDA code generation
as well as removing old assumptions and errors about RDC-mode with the
new driver.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D129655
Previously, when using the new driver we created a fatbinary with the
PTX and Cubin output. This was mainly done in an attempt to create some
backwards compatibility with the existing CUDA support that embeds the
fatbinary in each TU. This will most likely be more work than necessary
to actually implement. The linker wrapper cannot do anything with these
embedded PTX files because we do not know how to link them, and if we
did want to include multiple files it should go through the
`clang-offload-packager` instead. Also this didn't repsect the setting
that disables embedding PTX (although it wasn't used anyway).
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D128441
The new driver uses an augmented linker wrapper to perform the device
linking phase, but to the user looks like a regular linker invocation.
Contrary to the old driver, the new driver contains all the information
necessary to produce a linked device image in the host object itself.
Currently, we infer the usage of the device linker by the user
specifying an offloading toolchain, e.g. (--offload-arch=...) or
(-fopenmp-targets=...), but this shouldn't be strictly necessary.
This patch introduces a new option `--offload-link` to tell
the driver to use the offloading linker instead. So a compilation flow
can now look like this,
```
clang foo.cu --offload-new-driver -fgpu-rdc --offload-arch=sm_70 -c
clang foo.o --offload-link -lcudart
```
I was considering if this could be merged into the `-fuse-ld` option,
but because the device linker wraps over the users linker it would
conflict with that. In the future it's possible to merge this into `lld`
completely or `gold` via a plugin and we would use this option to
enable the device linking feature. Let me know what you think for this.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D126398
Summary:
Normally we parse through the CUDA installation to disover the needed
features. However, we may want to build libraries on targets that do not
currently have CUDA installed but still need to know which features to
make use of when creating the PTX or bitcode. This flag is a simple way
to specify this so we can compile certain codes withotu a valid CUDA
installation.
Ideally this could be done via an -Xarch or simimlar flag but currently
they cannot handle this. We would need to support using an -Xarch flag
that takes multiple arguments that then pass them to the -Xclang
functionality.
In order to do offloading compilation we need to embed files into the
host and create fatbainaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the `-fembed-offload-binary`
option. However this is not very extensibile since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
`clang-offload-packager` that behaves similarly to CUDA's `fatbinary`.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125165
OpenMP recently moved to the new offloading driver, this had the effect
of making it more difficult to inspect intermediate code for the device.
This patch adds `-foffload-host-only` and `-foffload-device-only` to
control which sides get compiled. This will allow users to more easily
inspect output without needing the temp files.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124220
This patch adds the basic support for the clang driver to compile and link CUDA
using the new offloading driver. This requires handling the CUDA offloading kind
and embedding the generated files into the host. This will allow us to link
OpenMP code with CUDA code in the linker wrapper. More support will be required
to create functional CUDA / HIP binaries using this method.
Depends on D120270 D120271 D120934
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D120272