llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	b370be37cc	[CUDA] Allow the new driver to compile CUDA in non-RDC mode The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation, but we still would like the new driver to be able to handle this mode so we can transition away from the old driver in the future. This patch adds the necessary code to support creating a fatbinary for CUDA code generation as well as removing old assumptions and errors about RDC-mode with the new driver. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D129655	2022-07-13 21:49:15 -04:00
Joseph Huber	4d3c010f1d	[CUDA] Do not embed a fatbinary when using the new driver Previously, when using the new driver we created a fatbinary with the PTX and Cubin output. This was mainly done in an attempt to create some backwards compatibility with the existing CUDA support that embeds the fatbinary in each TU. This will most likely be more work than necessary to actually implement. The linker wrapper cannot do anything with these embedded PTX files because we do not know how to link them, and if we did want to include multiple files it should go through the `clang-offload-packager` instead. Also this didn't respect the setting that disables embedding PTX (although it wasn't used anyway). Reviewed By: tra Differential Revision: https://reviews.llvm.org/D128441	2022-06-23 15:40:43 -04:00
Joseph Huber	b7c8c4d8cf	[Clang] Introduce `--offload-link` option to perform offload device linking The new driver uses an augmented linker wrapper to perform the device linking phase, but to the user looks like a regular linker invocation. Contrary to the old driver, the new driver contains all the information necessary to produce a linked device image in the host object itself. Currently, we infer the usage of the device linker by the user specifying an offloading toolchain, e.g. (--offload-arch=...) or (-fopenmp-targets=...), but this shouldn't be strictly necessary. This patch introduces a new option `--offload-link` to tell the driver to use the offloading linker instead. So a compilation flow can now look like this, ``` clang foo.cu --offload-new-driver -fgpu-rdc --offload-arch=sm_70 -c clang foo.o --offload-link -lcudart ``` I was considering if this could be merged into the `-fuse-ld` option, but because the device linker wraps over the user's linker it would conflict with that. In the future it's possible to merge this into `lld` completely or `gold` via a plugin and we would use this option to enable the device linking feature. Let me know what you think for this. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D126398	2022-05-25 16:30:53 -04:00
Joseph Huber	7dc23abbd3	[CUDA] Add a flag to manually specify the target feature to use with CUDA Summary: Normally we parse through the CUDA installation to discover the needed features. However, we may want to build libraries on targets that do not currently have CUDA installed but still need to know which features to make use of when creating the PTX or bitcode. This flag is a simple way to specify this so we can compile certain codes without a valid CUDA installation. Ideally this could be done via an -Xarch or similar flag but currently they cannot handle this. We would need to support using an -Xarch flag that takes multiple arguments that then pass them to the -Xclang functionality.	2022-05-13 16:30:58 -04:00
Joseph Huber	26eb04268f	[Clang] Introduce clang-offload-packager tool to bundle device files In order to do offloading compilation we need to embed files into the host and create fatbinaries. Clang uses a special binary format to bundle several files along with their metadata into a single binary image. This is currently performed using the `-fembed-offload-binary` option. However this is not very extensible since it requires changing the command flag every time we want to add something and makes optional arguments difficult. This patch introduces a new tool called `clang-offload-packager` that behaves similarly to CUDA's `fatbinary`. This tool takes several input files with metadata and embeds it into a single image that can then be embedded in the host. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D125165	2022-05-11 09:39:13 -04:00
Joseph Huber	47d6625570	[OpenMP] Add options to only compile the host or device when offloading OpenMP recently moved to the new offloading driver, which had the effect of making it more difficult to inspect intermediate code for the device. This patch adds `-foffload-host-only` and `-foffload-device-only` to control which sides get compiled. This will allow users to more easily inspect output without needing the temp files. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D124220	2022-04-29 11:22:21 -04:00
Joseph Huber	c5e5b54350	[CUDA] Add driver support for compiling CUDA with the new driver This patch adds the basic support for the clang driver to compile and link CUDA using the new offloading driver. This requires handling the CUDA offloading kind and embedding the generated files into the host. This will allow us to link OpenMP code with CUDA code in the linker wrapper. More support will be required to create functional CUDA / HIP binaries using this method. Depends on D120270 D120271 D120934 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D120272	2022-04-29 09:14:44 -04:00

7 Commits