llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	a3c814d234	Separately track input and output denormal mode AMDGPU and x86 at least both have separate controls for whether denormal results are flushed on output, and for whether denormals are implicitly treated as 0 as an input. The current DAGCombiner use only really cares about the input treatment of denormals.	2020-02-04 12:59:21 -05:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Artem Belevich	12fefeef20	[CUDA] Assume the latest known CUDA version if we've found an unknown one. This makes clang somewhat forward-compatible with new CUDA releases without having to patch it for every minor release without adding any new function. If an unknown version is found, clang issues a warning (can be disabled with -Wno-cuda-unknown-version) and assumes that it has detected the latest known version. CUDA releases are usually supersets of older ones feature-wise, so it should be sufficient to keep released clang versions working with minor CUDA updates without having to upgrade clang, too. Differential Revision: https://reviews.llvm.org/D73231	2020-01-28 10:11:42 -08:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attribute are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Alexandre Ganea	1abd4c94d7	[Clang] Bypass distro detection on non-Linux hosts Skip distro detection when we're not running on Linux, or when the target triple is not Linux. This saves a few OS calls for each invocation of clang.exe. Differential Revision: https://reviews.llvm.org/D70467	2019-11-28 17:02:06 -05:00
Sergey Dmitriev	a0d83768f1	[Clang][OpenMP Offload] Add new tool for wrapping offload device binaries This patch removes the remaining part of the OpenMP offload linker scripts which was used for inserting device binaries into the output linked binary. Device binaries are now inserted into the host binary with a help of the wrapper bit-code file which contains device binaries as data. Wrapper bit-code file is dynamically created by the clang driver with a help of new tool clang-offload-wrapper which takes device binaries as input and produces bit-code file with required contents. Wrapper bit-code is then compiled to an object and resulting object is appended to the host linking by the clang driver. This is the second part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943). Differential Revision: https://reviews.llvm.org/D68166 llvm-svn: 374219	2019-10-09 20:42:58 +00:00
Yaxun Liu	99d0d3ae90	[HIP] Use option -nogpulib to disable linking device lib Differential Revision: https://reviews.llvm.org/D68300 llvm-svn: 373649	2019-10-03 18:59:56 +00:00
Jonas Devlieghere	2b3d49b610	[Clang] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. Differential revision: https://reviews.llvm.org/D66259 llvm-svn: 368942	2019-08-14 23:04:18 +00:00
Gheorghe-Teodor Bercea	db900e389a	[CUDA][Clang][Bugfix] Add missing CUDA 9.2 case Summary: The bug was reported on the OpenMP-dev list: .../obj-release/lib/clang/9.0.0/include/__clang_cuda_intrinsics.h:173:35: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60\|ptx61\|ptx63\|ptx64 __MAKE_SYNC_SHUFFLES(__shfl_sync, __nvvm_shfl_sync_idx_i32, This problem occurs when trying to compile a .cu file that requires a newer ptx version (>ptx60 in this case) than ptx42. Reviewers: tra, ABataev, caomhin Reviewed By: tra Subscribers: jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61474 llvm-svn: 359910	2019-05-03 17:59:18 +00:00
Artem Belevich	4cbb235026	[CUDA] Do not pass deprecated option fo fatbinary CUDA 10.1 tools deprecated some command line options. fatbinary no longer needs --cuda. Differential Revision: https://reviews.llvm.org/D61470 llvm-svn: 359838	2019-05-02 22:37:19 +00:00
Artem Belevich	5fe85a003f	[CUDA] Implemented _[bi]mma* builtins. These builtins provide access to the new integer and sub-integer variants of MMA (matrix multiply-accumulate) instructions provided by CUDA-10.x on sm_75 (AKA Turing) GPUs. Also added a feature for PTX 6.4. While Clang/LLVM does not generate any PTX instructions that need it, we still need to pass it through to ptxas in order to be able to compile code that uses the new 'mma' instruction as inline assembly (e.g used by NVIDIA's CUTLASS library https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101) Differential Revision: https://reviews.llvm.org/D60279 llvm-svn: 359248	2019-04-25 22:28:09 +00:00
Artem Belevich	4071763bb8	Basic CUDA-10 support. Differential Revision: https://reviews.llvm.org/D57771 llvm-svn: 353232	2019-02-05 22:38:58 +00:00
Artem Belevich	8fa28a0db0	[CUDA] Propagate detected version of CUDA to cc1 ..and use it to control that parts of CUDA compilation that depend on the specific version of CUDA SDK. This patch has a placeholder for a 'new launch API' support which is in a separate patch. The list will be further extended in the upcoming patch to support CUDA-10.1. Differential Revision: https://reviews.llvm.org/D57487 llvm-svn: 352798	2019-01-31 21:32:24 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Alexey Bataev	c92fc3c8bc	[CUDA][OPENMP][NVPTX]Improve logic of the debug info support. Summary: Added support for the -gline-directives-only option + fixed logic of the debug info for CUDA devices. If optimization level is O0, then options --[no-]cuda-noopt-device-debug do not affect the debug info level. If the optimization level is >O0, debug info options are used + --no-cuda-noopt-device-debug is used or no --cuda-noopt-device-debug is used, the optimization level for the device code is kept and the emission of the debug directives is used. If the opt level is > O0, debug info is requested + --cuda-noopt-device-debug option is used, the optimization is disabled for the device code + required debug info is emitted. Reviewers: tra, echristo Subscribers: aprantl, guansong, JDevlieghere, cfe-commits Differential Revision: https://reviews.llvm.org/D51554 llvm-svn: 348930	2018-12-12 14:52:27 +00:00
Joel E. Denny	6dd34dc3dd	[CUDA] Fix nvidia-cuda-toolkit detection on Ubuntu This just extends D40453 (r319317) to Ubuntu. Reviewed By: Hahnfeld, tra Differential Revision: https://reviews.llvm.org/D55269 llvm-svn: 348504	2018-12-06 17:46:17 +00:00
Jonas Devlieghere	fc51490baf	Lift VFS from clang to llvm (NFC) This patch moves the virtual file system form clang to llvm so it can be used by more projects. Concretely the patch: - Moves VirtualFileSystem.{h\|cpp} from clang/Basic to llvm/Support. - Moves the corresponding unit test from clang to llvm. - Moves the vfs namespace from clang::vfs to llvm::vfs. - Formats the lines affected by this change, mostly this is the result of the added llvm namespace. RFC on the mailing list: http://lists.llvm.org/pipermail/llvm-dev/2018-October/126657.html Differential revision: https://reviews.llvm.org/D52783 llvm-svn: 344140	2018-10-10 13:27:25 +00:00
Yaxun Liu	9767089d00	[HIP] Support early finalization of device code for -fno-gpu-rdc This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original options as aliases. When -fgpu-rdc is off, clang will assume the device code in each translation unit does not call external functions except those in the device library, therefore it is possible to compile the device code in each translation unit to self-contained kernels and embed them in the host object, so that the host object behaves like usual host object which can be linked by lld. The benefits of this feature is: 1. allow users to create static libraries which can be linked by host linker; 2. amortized device code linking time. This patch modifies HIP action builder to insert actions for linking device code and generating HIP fatbin, and pass HIP fatbin to host backend action. It extracts code for constructing command for generating HIP fatbin as a function so that it can be reused by early finalization. It also modifies codegen of HIP host constructor functions to embed the device fatbin when it is available. Differential Revision: https://reviews.llvm.org/D52377 llvm-svn: 343611	2018-10-02 17:48:54 +00:00
Jonas Hahnfeld	a981f67bcd	[OpenMP] Improve search for libomptarget-nvptx When looking for the bclib Clang considered the default library path first while it preferred directories in LIBRARY_PATH when constructing the invocation of nvlink. The latter actually makes more sense because during development it allows using a non-default runtime library. So change the search for the bclib to start looking in directories given by LIBRARY_PATH. Additionally add a new option --libomptarget-nvptx-path= which will be searched first. This will be handy for testing purposes. Differential Revision: https://reviews.llvm.org/D51686 llvm-svn: 343230	2018-09-27 16:12:32 +00:00
Artem Belevich	44ecb0e3c2	[CUDA] Added basic support for compiling with CUDA-10.0 llvm-svn: 342924	2018-09-24 23:10:44 +00:00
Matt Arsenault	a13746b7eb	Rename -mlink-cuda-bitcode to -mlink-builtin-bitcode The same semantics work for OpenCL, and probably any offload language. Keep the old name around as an alias. llvm-svn: 340193	2018-08-20 18:16:48 +00:00
Alexey Bataev	b83b4e40fe	[DEBUGINFO] Disable unsupported debug info options for NVPTX target. Summary: Some targets support only default set of the debug options and do not support additional debug options, like NVPTX target. Patch introduced virtual function supportsDebugInfoOptions() that can be overloaded by the toolchain, checks if the target supports some debug options and emits warning when an unsupported debug option is found. Reviewers: echristo Subscribers: aprantl, JDevlieghere, cfe-commits Differential Revision: https://reviews.llvm.org/D49148 llvm-svn: 338155	2018-07-27 19:45:14 +00:00
Artem Belevich	578653a8fc	[CUDA] Fixed the list of GPUs supported by CUDA-9. Differential Revision: https://reviews.llvm.org/D47268 llvm-svn: 333098	2018-05-23 16:45:23 +00:00
Artem Belevich	679dafe69e	[CUDA] Added -f[no-]cuda-short-ptr option The option enables use of 32-bit pointers for accessing const/local/shared memory. The feature is disabled by default. Differential Revision: https://reviews.llvm.org/D46148 llvm-svn: 331938	2018-05-09 23:10:09 +00:00
Artem Belevich	3cce307799	[CUDA] Enable CUDA compilation with CUDA-9.2 Differential Revision: https://reviews.llvm.org/D45827 llvm-svn: 330753	2018-04-24 18:23:19 +00:00
Artem Belevich	0ae8590354	[NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions. The new instructions were added added for sm_70+ GPUs in CUDA-9.1. Differential Revision: https://reviews.llvm.org/D45068 llvm-svn: 330296	2018-04-18 21:51:48 +00:00
Alexey Bataev	e36c67b35c	[NVPTX] Emit debug info in DWARF-2 by default for Cuda devices. Summary: NVPTX target supports debug info in DWARF-2 format. Patch adds emission of debug info in DWARF-2 by default. Reviewers: tra, jlebar Subscribers: aprantl, JDevlieghere, cfe-commits Differential Revision: https://reviews.llvm.org/D42581 llvm-svn: 330272	2018-04-18 16:31:09 +00:00
Artem Belevich	dde3dc27ee	[CUDA] Added --[no-]cuda-include-ptx=sm_XX\|all option. Currently we always include PTX into the fatbin along with the GPU code.It about doubles the size of the GPU binary we need to carry in the executable. These options allow control inclusion of PTX into GPU binary. This patch does not change the defaults, though we may consider making no-PTX the default in the future. Differential Revision: https://reviews.llvm.org/D45495 llvm-svn: 329737	2018-04-10 18:38:22 +00:00
Alexander Kornienko	2a8c18d991	Fix typos in clang Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399	2018-04-06 15:14:32 +00:00
Gheorghe-Teodor Bercea	0d5aa84ad9	[OpenMP] Add flag for linking runtime bitcode library Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel Reviewed By: ABataev, grokos Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43197 llvm-svn: 327460	2018-03-13 23:19:52 +00:00
Gheorghe-Teodor Bercea	0805b80a73	Revert revision 327438. llvm-svn: 327447	2018-03-13 20:50:12 +00:00
Gheorghe-Teodor Bercea	148046c11b	[OpenMP] Add flag for linking runtime bitcode library Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel Reviewed By: ABataev, grokos Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43197 llvm-svn: 327438	2018-03-13 19:39:19 +00:00
Jonas Hahnfeld	5379c6d6fd	[CUDA] Add option to generate relocatable device code As a first step, pass '-c/--compile-only' to ptxas so that it doesn't complain about references to external function. This will successfully generate object files, but they won't work at runtime because the registration routines need to adapted. Differential Revision: https://reviews.llvm.org/D42921 llvm-svn: 324878	2018-02-12 10:46:45 +00:00
Jonas Hahnfeld	7f9c518423	[CUDA] Detect installation in PATH If the CUDA toolkit is not installed to its default locations in /usr/local/cuda, the user is forced to specify --cuda-path. This is tedious and the driver can be smarter if well-known tools (like ptxas) can already be found in the PATH environment variable. Add option --cuda-path-ignore-env if the user wants to ignore set environment variables. Also use it in the tests to make sure the driver always finds the same CUDA installation, regardless of the user's environment. Differential Revision: https://reviews.llvm.org/D42642 llvm-svn: 323848	2018-01-31 08:26:51 +00:00
Artem Belevich	fbc56a904f	[CUDA] Added partial support for CUDA-9.1 Clang can use CUDA-9.1 now, though new APIs (are not implemented yet. The major change is that headers in CUDA-9.1 went through substantial changes that started in CUDA-9.0 which required substantial changes in the cuda compatibility headers provided by clang. There are two major issues: * CUDA SDK no longer provides declarations for libdevice functions. * A lot of device-side functions have become nvcc's builtins and CUDA headers no longer contain their implementations. This patch changes the way CUDA headers are handled if we compile with CUDA 9.x. Both 9.0 and 9.1 are affected. * Clang provides its own declarations of libdevice functions. * For CUDA-9.x clang now provides implementation of device-side 'standard library' functions using libdevice. This patch should not affect compilation with CUDA-8. There may be some observable differences for CUDA-9.0, though they are not expected to affect functionality. Tested: CUDA test-suite tests for all supported combinations of: CUDA: 7.0,7.5,8.0,9.0,9.1 GPU: sm_20, sm_35, sm_60, sm_70 Differential Revision: https://reviews.llvm.org/D42513 llvm-svn: 323713	2018-01-30 00:00:12 +00:00
Ismail Donmez	64f99dfa08	Fix function call to fix build ../tools/clang/lib/Driver/ToolChains/Cuda.cpp:80:18: error: reference to non-static member function must be called; did you mean to call it with no arguments? if (Distro(D.getVFS).IsDebian()) ~~^~~~~~ () llvm-svn: 319322	2017-11-29 15:18:02 +00:00
Sylvestre Ledru	02082c39f3	Follow up of r319317, add the missing header file llvm-svn: 319319	2017-11-29 15:11:53 +00:00
Sylvestre Ledru	0cfcdc3ffd	Add the nvidia-cuda-toolkit Debian package path to search path Summary: Reported here: http://bugs.debian.org/882505 Patch by Andreas Beckmann Reviewers: Hahnfeld, tra Reviewed By: tra Subscribers: jlebar, cfe-commits Differential Revision: https://reviews.llvm.org/D40453 llvm-svn: 319317	2017-11-29 15:03:28 +00:00
Jonas Hahnfeld	7c78cc5273	[OpenMP] Consistently use cubin extension for nvlink This was previously done in some places, but for example not for bundling so that single object compilation with -c failed. In addition cubin was used for all file types during unbundling which is incorrect for assembly files that are passed to ptxas. Tighten up the tests so that we can't regress in that area. Differential Revision: https://reviews.llvm.org/D40250 llvm-svn: 318763	2017-11-21 14:44:45 +00:00
Justin Lebar	066494d8c1	[CUDA] Print an error if you try to compile with < sm_30 on CUDA 9. Summary: CUDA 9's minimum sm is sm_30. Ideally we should also make sm_30 the default when compiling with CUDA 9, but that seems harder than it should be. Subscribers: sanjoy Differential Revision: https://reviews.llvm.org/D39109 llvm-svn: 316611	2017-10-25 21:32:06 +00:00
Jonas Hahnfeld	30b4418e5a	[CMake][OpenMP] Customize default offloading arch For the shuffle instructions in reductions we need at least sm_30 but the user may want to customize the default architecture. Differential Revision: https://reviews.llvm.org/D38883 llvm-svn: 315996	2017-10-17 13:37:36 +00:00
Jonas Hahnfeld	e2c342fc65	[CUDA] Require libdevice only if needed If the user passes -nocudalib, we can live without it being present. Simplify the code by just checking whether LibDeviceMap is empty. Differential Revision: https://reviews.llvm.org/D38901 llvm-svn: 315902	2017-10-16 13:31:30 +00:00
Gheorghe-Teodor Bercea	5a3608ccfa	[OpenMP] Don't throw cudalib not found error if only front-end is required. Summary: If we only use the compiler front-end, do not throw an error about the cuda device library not being found. This allows the front-end to be run on systems where no Cuda installation is found. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra Reviewed By: tra Subscribers: hfinkel, tra, cfe-commits Differential Revision: https://reviews.llvm.org/D37914 llvm-svn: 314217	2017-09-26 15:36:20 +00:00
Gheorghe-Teodor Bercea	20789a5f09	[OpenMP] Enable the existing nocudalib flag for OpenMP offloading toolchain. Summary: Enable the -nocudalib flag for the OpenMP device offloading toolchain as well. Currently it can only be used for the CUDA toolchain. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, hfinkel, tra Reviewed By: tra Subscribers: hfinkel, cfe-commits Differential Revision: https://reviews.llvm.org/D37913 llvm-svn: 314164	2017-09-25 21:56:32 +00:00
Gheorghe-Teodor Bercea	5636f4b33a	[OpenMP] Bugfix: output file name drops the absolute path where full path is needed. Summary: When composing the output file name, the path to the file is being dropped. The full path is required. Reviewers: Hahnfeld, ABataev, caomhin, carlo.bertolli, hfinkel, tra Reviewed By: tra Subscribers: hfinkel, tra, cfe-commits Differential Revision: https://reviews.llvm.org/D37912 llvm-svn: 314156	2017-09-25 21:25:38 +00:00
Gheorghe-Teodor Bercea	d45720b55a	Revert commit with wrong message. llvm-svn: 314154	2017-09-25 21:22:49 +00:00
Gheorghe-Teodor Bercea	8cf757ceda	[OpenMP] Don't throw cudalib not found error if only front-end is required. Summary: If we only use the compiler front-end, do not throw an error about the cuda device library not being found. This allows the front-end to be run on systems where no Cuda installation is found. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra Reviewed By: tra Subscribers: hfinkel, tra, cfe-commits Differential Revision: https://reviews.llvm.org/D37914 llvm-svn: 314150	2017-09-25 21:07:16 +00:00
Artem Belevich	4654dc89be	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820	2017-09-20 21:23:07 +00:00
Artem Belevich	8af4e23d1e	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734	2017-09-07 18:14:32 +00:00
Gheorghe-Teodor Bercea	9c52574886	[OpenMP] Enable previously successful offloading tests. Create a separate test file to contain all tests for OpenMP offloading to GPUs. Make libdevice checking more robust by accounting for the case in which no libdevice is found. This changes are in connrection with diff: D29660 llvm-svn: 310718	2017-08-11 15:46:22 +00:00

1 2

65 Commits