llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	5966c2ec02	[OpenMP] Fix mismatched device runtime name Summary: The new runtime was deleted. AMD's old runtime used the triple name `amdgcn` while the new runtime used `amdgpu`. This was not updated when the old runtime was removed causing the library to not be found on AMDGPU.	2022-02-04 16:54:31 -05:00
Joseph Huber	034adaf5be	[OpenMP] Completely remove old device runtime This patch completely removes the old OpenMP device runtime. Previously, the old runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The entire project has been deleted, and all references to the `libomptarget-new` runtime has been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934	2022-02-04 15:31:33 -05:00
Joseph Huber	3762111aa9	[OpenMP] Link the bitcode library late for device LTO Summary: This patch adds support for linking the OpenMP device bitcode library late when doing LTO. This simply passes it in as an additional device file when doing the final device linking phase with LTO. This has the advantage that we don't link it multiple times, and the device references do not get inlined and prevent us from doing needed OpenMP optimizations when we have visiblity of the whole module. Fix some failings where the implicit conversion of an Error to an Expected triggered the deleted copy constructor. Depends on D116675 Differential revision: https://reviews.llvm.org/D117048	2022-01-31 23:11:41 -05:00
Saiyedul Islam	32357266fd	[Clang][NFC] Fix multiline comment prefixes in function headers Cleanup of D105191 after latest clang-format changes. Reviewed By: MyDeveloperDay Differential Revision: https://reviews.llvm.org/D111545	2022-01-04 11:51:31 +00:00
Joseph Huber	c99407e31c	[OpenMP] Make the new device runtime the default This patch changes the `-fopenmp-target-new-runtime` option which controls if the new or old device runtime is used to be true by default. Disabling this to use the old runtime now requires using `-fno-openmp-target-new-runtime`. Reviewed By: JonChesterfield, tianshilei1992, gregrodgers, ronlieb Differential Revision: https://reviews.llvm.org/D114890	2021-12-02 11:11:45 -05:00
Carlos Galvez	7ecec3f0f5	[CUDA] Bump supported CUDA version to 11.5 Differential Revision: https://reviews.llvm.org/D113249	2021-11-09 08:20:53 +00:00
Kazu Hirata	7cc8fa2dd2	Use llvm::is_contained (NFC)	2021-10-24 09:32:57 -07:00
Saiyedul Islam	f56548829c	[Clang][clang-nvlink-wrapper] Pass nvlink path to the wrapper Added support of a "--nvlink-path" option in clang-nvlink-wrapper which takes the path of nvlink binary. Static Device Library support for OpenMP (D105191) now searches for nvlink binary and passes its location via this option. In absence of this option, nvlink binary is searched in locations in PATH. Differential Revision: https://reviews.llvm.org/D111488	2021-10-12 16:15:52 +00:00
Saiyedul Islam	35ebe4cc24	[Clang][OpenMP] Add partial support for Static Device Libraries An archive containing device code object files can be passed to clang command line for linking. For each given offload target it creates a device specific archives which is either passed to llvm-link if the target is amdgpu, or to clang-nvlink-wrapper if the target is nvptx. -L/-l flags are used to specify these fat archives on the command line. E.g. clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib It currently doesn't support linking an archive directly, like: clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a Linking with x86 offload also does not work. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D105191	2021-10-08 09:37:51 +00:00
Saiyedul Islam	94e2b0258a	Revert "[Clang][OpenMP] Add partial support for Static Device Libraries" This reverts commit `4c41170895`.	2021-10-07 14:13:24 +00:00
Saiyedul Islam	4c41170895	[Clang][OpenMP] Add partial support for Static Device Libraries An archive containing device code object files can be passed to clang command line for linking. For each given offload target it creates a device specific archives which is either passed to llvm-link if the target is amdgpu, or to clang-nvlink-wrapper if the target is nvptx. -L/-l flags are used to specify these fat archives on the command line. E.g. clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib It currently doesn't support linking an archive directly, like: clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a Linking with x86 offload also does not work. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D105191	2021-10-07 04:45:19 +00:00
Artem Belevich	fd582eeffe	[CUDA] Move CUDA SDK include path further down the include search path. This allows clang to work on Linux distributions like Debian where <CUDA-PATH>/include may be a symlink to /usr/include. We only need `cuda_wrappers` to be present before the standard C++ library headers. The CUDA SDK headers themselves do not need to be found that early. This addresses https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995122 mentioned in post-commit comments on D108247 Differential Revision: https://reviews.llvm.org/D110596	2021-09-28 11:29:28 -07:00
Artem Belevich	3db8e486e5	[CUDA] Improve CUDA version detection and diagnostics. Always use cuda.h to detect CUDA version. It's a more universal approach compared to version.txt which is no longer present in recent CUDA versions. Split the 'unknown CUDA version' warning in two: * when detected CUDA version is partially supported by clang. It's expected to work in general, at the feature parity with the latest supported CUDA version. and may be missing support for the new features/instructions/GPU variants. Clang will issue a warning. * when detected version is new. Recent CUDA versions have been working with clang reasonably well, and will likely to work similarly to the partially supported ones above. Or it may not work at all. Clang will issue a warning and proceed as if the latest known CUDA version was detected. Differential Revision: https://reviews.llvm.org/D108247	2021-08-23 13:24:48 -07:00
Artem Belevich	49d982d8cb	[CUDA] Add support for CUDA-11.4 Differential Revision: https://reviews.llvm.org/D108239	2021-08-23 13:24:46 -07:00
Artem Belevich	6a9cf21f5a	[CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that there are users that do not expect LLVM to materialize `memset` intrinsic. While other passes can do that, too, MemCpyOpt triggers it more frequently and breaks sanitizers and some downstream users. For now introduce a flag to force-enable the flag and opt-in only CUDA compilation with NVPTX back-end. Differential Revision: https://reviews.llvm.org/D106401	2021-08-06 11:13:52 -07:00
Aaron Ballman	530ea28fef	Correct a lot of diagnostic wordings for the driver Clang diagnostics should not start with a capital letter or use trailing punctuation (https://clang.llvm.org/docs/InternalsManual.html#the-format-string), but quite a few driver diagnostics were not following this advice. This corrects the grammar and punctuation to improve consistency, but does not change the circumstances under which the diagnostics are produced.	2021-08-05 07:04:55 -04:00
Jan Svoboda	60426f33b1	[clang][driver] NFC: Move InputInfo.h from lib to include Moving `InputInfo.h` from `lib/Driver/` into `include/Driver` to be able to expose it in an API consumed from outside of `clangDriver`. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D106787	2021-07-27 09:17:39 +02:00
Joseph Huber	d297211692	[OpenMP] Add a driver flag to enable the new device runtime library This patch adds a driver flag `-fopenmp-target-new-runtime` to optionally enable the new device runtime bitcode library. This allows users to enable the new experimental runtime before it becomes the default in the future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106793	2021-07-26 16:35:56 -04:00
Shilei Tian	53d474abc9	[Clang][OpenMP][NVPTX] Fixed failure in openmp-offload-gpu.c if the system has CUDA https://lists.llvm.org/pipermail/openmp-dev/2021-March/003940.html reports test failure in `openmp-offload-gpu.c`. The failure is, when using `-S` in the clang driver, it still reports bitcode library doesn't exist. However, it is not exposed in my local run and Phabiractor test. The reason it escaped from Phabricator test is, the test machine doesn't have CUDA, so `LibDeviceFile` is empty. In this case, the check of `OPT_S` will be hit, and we get "expected" result. However, if the test machine has CUDA, `LibDeviceFile` will not be empty, then the check will not be done, and it just proceeds, trying to add the bitcode library. The reason it escaped from my local run is, I didn't build ALL targets, so this case was marked UNSUPPORTED. Reviewed By: kkwli0 Differential Revision: https://reviews.llvm.org/D98902	2021-04-13 13:22:49 -04:00
Yaxun (Sam) Liu	907af84396	[CUDA][HIP] rename -fcuda-flush-denormals-to-zero Rename it to -fgpu-flush-denormals-to-zero. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D99688	2021-04-05 00:13:51 -04:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select right functions based on CUDA version. we don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA version. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Shilei Tian	76151acf89	[Clang][OpenMP] Require CUDA 9.2+ for OpenMP offloading on NVPTX target In current implementation of `deviceRTLs`, we're using some functions that are CUDA version dependent (if CUDA_VERSION < 9, it is one; otheriwse, it is another one). As a result, we have to compile one bitcode library for each CUDA version supported. A worse problem is forward compatibility. If a new CUDA version is released, we have to update CMake file as well. CUDA 9.2 has been released for three years. Instead of using various weird tricks to make `deviceRTLs` work with different CUDA versions and still have forward compatibility, we can simply drop support for CUDA 9.1 or lower version. It has at least two benifits: - We don't need to generate bitcode libraries for each CUDA version; - Clang driver doesn't need to search for the bitcode lib based on CUDA version. We can claim that starting from LLVM 12, OpenMP offloading on NVPTX target requires CUDA 9.2+. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D97003	2021-02-22 11:00:33 -05:00
Pushpinder Singh	79401b43ce	[OpenMP][AMDGPU] Add support for linking libomptarget bitcode This patch uses the existing logic of CUDA for searching libomptarget and extracts it to a common method. Reviewed By: JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96248	2021-02-12 00:42:41 -05:00
Artem Belevich	2aa01ccec3	[CUDA, NVPTX] Allow targeting sm_86 GPUs. The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o adding any new functionality. Differential Revision: https://reviews.llvm.org/D95974	2021-02-09 11:01:10 -08:00
Yaxun (Sam) Liu	1dab94f9ed	[CUDA][HIP] Pass -fgpu-rdc to host clang -cc1 Currently -fgpu-rdc is not passed to host clang -cc1. This causes issue because -fgpu-rdc affects shadow variable linkage in host compilation. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D96105	2021-02-08 19:08:20 -05:00
Shilei Tian	7c03f7d7d0	[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics. Here're a list of changes in this patch. 1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros. 2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly. 3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation. 4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`. With this change, there are also multiple features to be expected in the near future: 1. CUDA will be completely dropped when compiling OpenMP. By the time, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA version. 2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong. 3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore. 4. (Maybe more...) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94745	2021-01-26 12:28:47 -05:00
Shilei Tian	5ad038aafa	[Clang][OpenMP][NVPTX] Replace `libomptarget-nvptx-path` with `libomptarget-nvptx-bc-path` D94700 removed the static library so we no longer need to pass `-llibomptarget-nvptx` to `nvlink`. Since the bitcode library is the only device runtime for now, instead of emitting a warning when it is not found, an error should be raised. We also set a new option `libomptarget-nvptx-bc-path` to let user choose which bitcode library is being used. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95161	2021-01-23 14:42:38 -05:00
Joseph Huber	1ca5e68aa0	[NVPTX] Fix debugging information being added to NVPTX target if remarks are enabled Summary: Optimized debugging is not supported by ptxas. Debugging information is degraded to line information only if optimizations are enabled, but debugging information would be added back in by the driver if remarks were enabled. This solves https://bugs.llvm.org/show_bug.cgi?id=48153. Reviewers: jdoerfert tra jholewinski serge-sans-paille Differential Revision: https://reviews.llvm.org/D94123	2021-01-06 13:43:22 -05:00
Artem Belevich	e7fe125b77	[CUDA] Extract CUDA version from cuda.h if version.txt is not found If CUDA version can not be determined based on version.txt file, attempt to find CUDA_VERSION macro in cuda.h. This is a follow-up to D89752, Differntial Revision: https://reviews.llvm.org/D89832	2020-10-23 10:03:30 -07:00
Artem Belevich	65d206484c	[CUDA] Improve clang's ability to detect recent CUDA versions. CUDA-11.1 does not carry version.txt which causes clang to assume that it's CUDA-7.0, which used to be the only CUDA version w/o version.txt. In order to tell CUDA-7.0 apart from the new versions, clang now probes for the presence of libdevice.10.bc which is not present in the old CUDA versions. This should keep Clang working for CUDA-11.1. PR47332: https://bugs.llvm.org/show_bug.cgi?id=47332 Differential Revision: https://reviews.llvm.org/D89752	2020-10-23 10:03:29 -07:00
Serge Pavlov	70bf35070a	[Driver] Add output file to properties of Command Object of class `Command` contains various properties of a command to execute, but output file was missed from them. This change adds this property. It is required for reporting consumed time and memory implemented in D78903 and may be used in other cases too. Differential Revision: https://reviews.llvm.org/D78902	2020-10-08 18:23:39 +07:00
Reid Kleckner	3453b6928d	Revert "Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions"" This reverts commit `e39da8ab6a`. This depends on a change that needs additional design review and needs to be reverted.	2020-09-24 11:16:54 -07:00
Yaxun (Sam) Liu	e39da8ab6a	Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions" This recommits `7f1f89ec8d` and `40df06cdaf` after fixing memory sanitizer failure.	2020-09-24 08:44:37 -04:00
Yaxun (Sam) Liu	772bd8a7d9	Revert "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions" This reverts commit `7f1f89ec8d`. This reverts commit `40df06cdaf`.	2020-09-17 13:55:31 -04:00
Yaxun (Sam) Liu	40df06cdaf	[CUDA][HIP] Defer overloading resolution diagnostics for host device functions In CUDA/HIP a function may become implicit host device function by pragma or constexpr. A host device function is checked in both host and device compilation. However it may be emitted only on host or device side, therefore the diagnostics should be deferred until it is known to be emitted. Currently clang is only able to defer certain diagnostics. This causes false alarms and limits the usefulness of host device functions. This patch lets clang defer all overloading resolution diagnostics for host device functions. An option -fgpu-defer-diag is added to control this behavior. By default it is off. It is NFC for other languages. Differential Revision: https://reviews.llvm.org/D84364	2020-09-17 11:30:42 -04:00
Benjamin Kramer	f3f1ce4fa9	[Driver] Promote SmallSet of enum to a bitset. NFCI.	2020-07-20 16:54:30 +02:00
James Y Knight	4772b99dff	Clang Driver: refactor support for writing response files to be specified at Command creation, rather than as part of the Tool. This resolves the hack I just added to allow Darwin toolchain to vary its level of support based on `-mlinker-version=`. The change preserves the _current_ settings for response-file support. Some tools look likely to be declaring that they don't support response files in error, however I kept them as-is in order for this change to be a simple refactoring. Differential Revision: https://reviews.llvm.org/D82782	2020-06-29 18:27:02 -04:00
Artem Belevich	d700237f1a	[CUDA,HIP] Use VFS for SDK detection. It's useful for using clang from tools that may need need to provide SDK files from non-standard locations. Clang CLI only provides a way to specify VFS for include files, so there's no good way to test this yet. Differential Revision: https://reviews.llvm.org/D81771	2020-06-15 12:54:44 -07:00
Yaxun (Sam) Liu	8422bc9efc	recommit "[HIP] Add default header and include path" recommit `11d06b9511` with fix for lit tests.	2020-06-06 14:21:22 -04:00
Nico Weber	2920348063	Revert "recommit "[HIP] Add default header and include path"" This reverts commit `1fa43e0b34`. Still breaks tests on several bots, see https://reviews.llvm.org/D81176	2020-06-05 21:50:04 -04:00
Yaxun (Sam) Liu	1fa43e0b34	recommit "[HIP] Add default header and include path" recommit `11d06b9511` with fix for lit tests.	2020-06-05 20:41:15 -04:00
Yaxun (Sam) Liu	8a8c6913a9	Revert "[HIP] Add default header and include path" This reverts commit `11d06b9511`.	2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu	11d06b9511	[HIP] Add default header and include path To support std::complex and some other standard C/C++ functions in HIP device code, they need to be forced to be __host__ __device__ functions by pragmas. This is done by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang. For these standard C++ wapper headers to work properly, specific include path order has to be enforced: clang C++ wrapper include path standard C++ include path clang include path Also, these C++ wrapper headers require device version of some standard C/C++ functions must be declared before including them. This needs to be done by including a default header which declares or defines these device functions. The default header is always included before any other headers are included by users. This patch adds the the default header and include path for HIP. Differential Revision: https://reviews.llvm.org/D81176	2020-06-05 12:44:57 -04:00
Matt Arsenault	dc89a3efb4	HIP: Fix handling of denormal mode I didn't realize HIP was a distinct offloading kind, so the subtarget was looking for -march, which isn't correct for HIP. We also have the possibility of different denormal defaults in the case of multiple offload targets, so we need to thread the JobAction through the target hook.	2020-04-13 11:48:45 -07:00
Artem Belevich	a9627b7ea7	[CUDA] Add partial support for recent CUDA versions. Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11. None of the new features of CUDA-10.2+ have been implemented yet, so using these versions will still produce a warning. Differential Revision: https://reviews.llvm.org/D77670	2020-04-08 11:19:44 -07:00
Artem Belevich	33386b20aa	[CUDA] Simplify GPU variant handling. NFC. Instead of hardcoding individual GPU mappings in multiple functions, keep them all in one table and use it to look up the mappings. We also don't care about 'virtual' architecture much, so the API is trimmed down down to a simpler GPU->Virtual arch name lookup. Differential Revision: https://reviews.llvm.org/D77665	2020-04-08 11:19:43 -07:00
Yaxun (Sam) Liu	2ae25647d1	[CUDA][HIP] Add -Xarch_device and -Xarch_host options The argument after -Xarch_device will be added to the arguments for CUDA/HIP device compilation and will be removed for host compilation. The argument after -Xarch_host will be added to the arguments for CUDA/HIP host compilation and will be removed for device compilation. Differential Revision: https://reviews.llvm.org/D76520	2020-03-24 10:13:05 -04:00
Yaxun (Sam) Liu	78957bab55	[NFC] Refactor handling of Xarch option Extract common code to a function. To prepare for adding an option for CUDA/HIP host and device only option. Differential Revision: https://reviews.llvm.org/D76455	2020-03-22 14:42:09 -04:00
Artem Belevich	eb2ba2ea95	[CUDA] Warn about unsupported CUDA SDK version only if it's used. This fixes an issue with clang issuing a warning about unknown CUDA SDK if it's detected during non-CUDA compilation. Differential Revision: https://reviews.llvm.org/D76030	2020-03-12 10:04:10 -07:00
Reid Kleckner	213aea4c58	Remove unused Endian.h includes, NFC Mainly avoids including Host.h everywhere: $ diff -u <(sort thedeps-before.txt) <(sort thedeps-after.txt) \ \| grep '^[-+] ' \| sort \| uniq -c \| sort -nr 3141 - /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/Support/Host.h	2020-03-11 15:45:34 -07:00

1 2 3

115 Commits