Commit Graph

124 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu 8a8c6913a9 Revert "[HIP] Add default header and include path"
This reverts commit 11d06b9511.
2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu 11d06b9511 [HIP] Add default header and include path
To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.

For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:

  clang C++ wrapper include path
  standard C++ include path
  clang include path

Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.

This patch adds the the default header and include path for HIP.

Differential Revision: https://reviews.llvm.org/D81176
2020-06-05 12:44:57 -04:00
Matt Arsenault dc89a3efb4 HIP: Fix handling of denormal mode
I didn't realize HIP was a distinct offloading kind, so the subtarget
was looking for -march, which isn't correct for HIP. We also have the
possibility of different denormal defaults in the case of multiple
offload targets, so we need to thread the JobAction through the target
hook.
2020-04-13 11:48:45 -07:00
Artem Belevich a9627b7ea7 [CUDA] Add partial support for recent CUDA versions.
Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.

Differential Revision: https://reviews.llvm.org/D77670
2020-04-08 11:19:44 -07:00
Artem Belevich 33386b20aa [CUDA] Simplify GPU variant handling. NFC.
Instead of hardcoding individual GPU mappings in multiple functions, keep them
all in one table and use it to look up the mappings.

We also don't care about 'virtual' architecture much, so the API is trimmed down
down to a simpler GPU->Virtual arch name lookup.

Differential Revision: https://reviews.llvm.org/D77665
2020-04-08 11:19:43 -07:00
Yaxun (Sam) Liu 2ae25647d1 [CUDA][HIP] Add -Xarch_device and -Xarch_host options
The argument after -Xarch_device will be added to the arguments for CUDA/HIP
device compilation and will be removed for host compilation.

The argument after -Xarch_host will be added to the arguments for CUDA/HIP
host compilation and will be removed for device compilation.

Differential Revision: https://reviews.llvm.org/D76520
2020-03-24 10:13:05 -04:00
Yaxun (Sam) Liu 78957bab55 [NFC] Refactor handling of Xarch option
Extract common code to a function. To prepare for
adding an option for CUDA/HIP host and device only
option.

Differential Revision: https://reviews.llvm.org/D76455
2020-03-22 14:42:09 -04:00
Artem Belevich eb2ba2ea95 [CUDA] Warn about unsupported CUDA SDK version only if it's used.
This fixes an issue with clang issuing a warning about unknown CUDA SDK if it's
detected during non-CUDA compilation.

Differential Revision: https://reviews.llvm.org/D76030
2020-03-12 10:04:10 -07:00
Reid Kleckner 213aea4c58 Remove unused Endian.h includes, NFC
Mainly avoids including Host.h everywhere:

$ diff -u <(sort thedeps-before.txt) <(sort thedeps-after.txt) \
    | grep '^[-+] ' | sort | uniq -c | sort -nr
   3141 - /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/Support/Host.h
2020-03-11 15:45:34 -07:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Artem Belevich 12fefeef20 [CUDA] Assume the latest known CUDA version if we've found an unknown one.
This makes clang somewhat forward-compatible with new CUDA releases
without having to patch it for every minor release without adding
any new function.

If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.

Differential Revision: https://reviews.llvm.org/D73231
2020-01-28 10:11:42 -08:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Alexandre Ganea 1abd4c94d7 [Clang] Bypass distro detection on non-Linux hosts
Skip distro detection when we're not running on Linux, or when the target triple is not Linux. This saves a few OS calls for each invocation of clang.exe.

Differential Revision: https://reviews.llvm.org/D70467
2019-11-28 17:02:06 -05:00
Sergey Dmitriev a0d83768f1 [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries
This patch removes the remaining part of the OpenMP offload linker scripts which was used for inserting device binaries into the output linked binary. Device binaries are now inserted into the host binary with a help of the wrapper bit-code file which contains device binaries as data. Wrapper bit-code file is dynamically created by the clang driver with a help of new tool clang-offload-wrapper which takes device binaries as input and produces bit-code file with required contents. Wrapper bit-code is then compiled to an object and resulting object is appended to the host linking by the clang driver.

This is the second part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943).

Differential Revision: https://reviews.llvm.org/D68166

llvm-svn: 374219
2019-10-09 20:42:58 +00:00
Yaxun Liu 99d0d3ae90 [HIP] Use option -nogpulib to disable linking device lib
Differential Revision: https://reviews.llvm.org/D68300

llvm-svn: 373649
2019-10-03 18:59:56 +00:00
Jonas Devlieghere 2b3d49b610 [Clang] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

Differential revision: https://reviews.llvm.org/D66259

llvm-svn: 368942
2019-08-14 23:04:18 +00:00
Gheorghe-Teodor Bercea db900e389a [CUDA][Clang][Bugfix] Add missing CUDA 9.2 case
Summary:
The bug was reported on the OpenMP-dev list:

.../obj-release/lib/clang/9.0.0/include/__clang_cuda_intrinsics.h:173:35: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64
__MAKE_SYNC_SHUFFLES(__shfl_sync, __nvvm_shfl_sync_idx_i32,

This problem occurs when trying to compile a .cu file that requires a newer ptx version (>ptx60 in this case) than ptx42.



Reviewers: tra, ABataev, caomhin

Reviewed By: tra

Subscribers: jdoerfert, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61474

llvm-svn: 359910
2019-05-03 17:59:18 +00:00
Artem Belevich 4cbb235026 [CUDA] Do not pass deprecated option fo fatbinary
CUDA 10.1 tools deprecated some command line options.
fatbinary no longer needs --cuda.

Differential Revision: https://reviews.llvm.org/D61470

llvm-svn: 359838
2019-05-02 22:37:19 +00:00
Artem Belevich 5fe85a003f [CUDA] Implemented _[bi]mma* builtins.
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
2019-04-25 22:28:09 +00:00
Artem Belevich 4071763bb8 Basic CUDA-10 support.
Differential Revision: https://reviews.llvm.org/D57771

llvm-svn: 353232
2019-02-05 22:38:58 +00:00
Artem Belevich 8fa28a0db0 [CUDA] Propagate detected version of CUDA to cc1
..and use it to control that parts of CUDA compilation
that depend on the specific version of CUDA SDK.

This patch has a placeholder for a 'new launch API' support
which is in a separate patch. The list will be further
extended in the upcoming patch to support CUDA-10.1.

Differential Revision: https://reviews.llvm.org/D57487

llvm-svn: 352798
2019-01-31 21:32:24 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Alexey Bataev c92fc3c8bc [CUDA][OPENMP][NVPTX]Improve logic of the debug info support.
Summary:
Added support for the -gline-directives-only option + fixed logic of the
debug info for CUDA devices. If optimization level is O0, then options
--[no-]cuda-noopt-device-debug do not affect the debug info level. If
the optimization level is >O0, debug info options are used +
--no-cuda-noopt-device-debug is used or no --cuda-noopt-device-debug is
used, the optimization level for the device code is kept and the
emission of the debug directives is used.
If the opt level is > O0, debug info is requested +
--cuda-noopt-device-debug option is used, the optimization is disabled
for the device code + required debug info is emitted.

Reviewers: tra, echristo

Subscribers: aprantl, guansong, JDevlieghere, cfe-commits

Differential Revision: https://reviews.llvm.org/D51554

llvm-svn: 348930
2018-12-12 14:52:27 +00:00
Joel E. Denny 6dd34dc3dd [CUDA] Fix nvidia-cuda-toolkit detection on Ubuntu
This just extends D40453 (r319317) to Ubuntu.

Reviewed By: Hahnfeld, tra

Differential Revision: https://reviews.llvm.org/D55269

llvm-svn: 348504
2018-12-06 17:46:17 +00:00
Jonas Devlieghere fc51490baf Lift VFS from clang to llvm (NFC)
This patch moves the virtual file system form clang to llvm so it can be
used by more projects.

Concretely the patch:
 - Moves VirtualFileSystem.{h|cpp} from clang/Basic to llvm/Support.
 - Moves the corresponding unit test from clang to llvm.
 - Moves the vfs namespace from clang::vfs to llvm::vfs.
 - Formats the lines affected by this change, mostly this is the result of
   the added llvm namespace.

RFC on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/126657.html

Differential revision: https://reviews.llvm.org/D52783

llvm-svn: 344140
2018-10-10 13:27:25 +00:00
Yaxun Liu 9767089d00 [HIP] Support early finalization of device code for -fno-gpu-rdc
This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off,
clang will assume the device code in each translation unit does not call
external functions except those in the device library, therefore it is possible
to compile the device code in each translation unit to self-contained kernels
and embed them in the host object, so that the host object behaves like
usual host object which can be linked by lld.

The benefits of this feature is: 1. allow users to create static libraries which
can be linked by host linker; 2. amortized device code linking time.

This patch modifies HIP action builder to insert actions for linking device
code and generating HIP fatbin, and pass HIP fatbin to host backend action.
It extracts code for constructing command for generating HIP fatbin as
a function so that it can be reused by early finalization. It also modifies
codegen of HIP host constructor functions to embed the device fatbin
when it is available.

Differential Revision: https://reviews.llvm.org/D52377

llvm-svn: 343611
2018-10-02 17:48:54 +00:00
Jonas Hahnfeld a981f67bcd [OpenMP] Improve search for libomptarget-nvptx
When looking for the bclib Clang considered the default library
path first while it preferred directories in LIBRARY_PATH when
constructing the invocation of nvlink. The latter actually makes
more sense because during development it allows using a non-default
runtime library. So change the search for the bclib to start
looking in directories given by LIBRARY_PATH.
Additionally add a new option --libomptarget-nvptx-path= which
will be searched first. This will be handy for testing purposes.

Differential Revision: https://reviews.llvm.org/D51686

llvm-svn: 343230
2018-09-27 16:12:32 +00:00
Artem Belevich 44ecb0e3c2 [CUDA] Added basic support for compiling with CUDA-10.0
llvm-svn: 342924
2018-09-24 23:10:44 +00:00
Matt Arsenault a13746b7eb Rename -mlink-cuda-bitcode to -mlink-builtin-bitcode
The same semantics work for OpenCL, and probably any offload
language. Keep the old name around as an alias.

llvm-svn: 340193
2018-08-20 18:16:48 +00:00
Alexey Bataev b83b4e40fe [DEBUGINFO] Disable unsupported debug info options for NVPTX target.
Summary:
Some targets support only default set of the debug options and do not
support additional debug options, like NVPTX target. Patch introduced
virtual function supportsDebugInfoOptions() that can be overloaded
by the toolchain, checks if the target supports some debug
options and emits warning when an unsupported debug option is
found.

Reviewers: echristo

Subscribers: aprantl, JDevlieghere, cfe-commits

Differential Revision: https://reviews.llvm.org/D49148

llvm-svn: 338155
2018-07-27 19:45:14 +00:00
Artem Belevich 578653a8fc [CUDA] Fixed the list of GPUs supported by CUDA-9.
Differential Revision: https://reviews.llvm.org/D47268

llvm-svn: 333098
2018-05-23 16:45:23 +00:00
Artem Belevich 679dafe69e [CUDA] Added -f[no-]cuda-short-ptr option
The option enables use of 32-bit pointers for accessing
const/local/shared memory. The feature is disabled by default.

Differential Revision: https://reviews.llvm.org/D46148

llvm-svn: 331938
2018-05-09 23:10:09 +00:00
Artem Belevich 3cce307799 [CUDA] Enable CUDA compilation with CUDA-9.2
Differential Revision: https://reviews.llvm.org/D45827

llvm-svn: 330753
2018-04-24 18:23:19 +00:00
Artem Belevich 0ae8590354 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

llvm-svn: 330296
2018-04-18 21:51:48 +00:00
Alexey Bataev e36c67b35c [NVPTX] Emit debug info in DWARF-2 by default for Cuda devices.
Summary:
NVPTX target supports debug info in DWARF-2 format. Patch adds emission
of debug info in DWARF-2 by default.

Reviewers: tra, jlebar

Subscribers: aprantl, JDevlieghere, cfe-commits

Differential Revision: https://reviews.llvm.org/D42581

llvm-svn: 330272
2018-04-18 16:31:09 +00:00
Artem Belevich dde3dc27ee [CUDA] Added --[no-]cuda-include-ptx=sm_XX|all option.
Currently we always include PTX into the fatbin along
with the GPU code.It about doubles the size of the GPU binary
we need to carry in the executable. These options allow control
inclusion of PTX into GPU binary.

This patch does not change the defaults, though we may consider
making no-PTX the default in the future.

Differential Revision: https://reviews.llvm.org/D45495

llvm-svn: 329737
2018-04-10 18:38:22 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Gheorghe-Teodor Bercea 0d5aa84ad9 [OpenMP] Add flag for linking runtime bitcode library
Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel

Reviewed By: ABataev, grokos

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D43197

llvm-svn: 327460
2018-03-13 23:19:52 +00:00
Gheorghe-Teodor Bercea 0805b80a73 Revert revision 327438.
llvm-svn: 327447
2018-03-13 20:50:12 +00:00
Gheorghe-Teodor Bercea 148046c11b [OpenMP] Add flag for linking runtime bitcode library
Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel

Reviewed By: ABataev, grokos

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D43197

llvm-svn: 327438
2018-03-13 19:39:19 +00:00
Jonas Hahnfeld 5379c6d6fd [CUDA] Add option to generate relocatable device code
As a first step, pass '-c/--compile-only' to ptxas so that it
doesn't complain about references to external function. This
will successfully generate object files, but they won't work
at runtime because the registration routines need to adapted.

Differential Revision: https://reviews.llvm.org/D42921

llvm-svn: 324878
2018-02-12 10:46:45 +00:00
Jonas Hahnfeld 7f9c518423 [CUDA] Detect installation in PATH
If the CUDA toolkit is not installed to its default locations
in /usr/local/cuda, the user is forced to specify --cuda-path.
This is tedious and the driver can be smarter if well-known tools
(like ptxas) can already be found in the PATH environment variable.

Add option --cuda-path-ignore-env if the user wants to ignore
set environment variables. Also use it in the tests to make sure
the driver always finds the same CUDA installation, regardless
of the user's environment.

Differential Revision: https://reviews.llvm.org/D42642

llvm-svn: 323848
2018-01-31 08:26:51 +00:00
Artem Belevich fbc56a904f [CUDA] Added partial support for CUDA-9.1
Clang can use CUDA-9.1 now, though new APIs (are not implemented yet.

The major change is that headers in CUDA-9.1 went through substantial
changes that started in CUDA-9.0 which required substantial changes
in the cuda compatibility headers provided by clang.

There are two major issues:
* CUDA SDK no longer provides declarations for libdevice functions.
* A lot of device-side functions have become nvcc's builtins and
  CUDA headers no longer contain their implementations.

This patch changes the way CUDA headers are handled if we compile
with CUDA 9.x. Both 9.0 and 9.1 are affected.

* Clang provides its own declarations of libdevice functions.
* For CUDA-9.x clang now provides implementation of device-side
  'standard library' functions using libdevice.

This patch should not affect compilation with CUDA-8. There may be
some observable differences for CUDA-9.0, though they are not expected
to affect functionality.

Tested: CUDA test-suite tests for all supported combinations of:
        CUDA: 7.0,7.5,8.0,9.0,9.1
        GPU: sm_20, sm_35, sm_60, sm_70

Differential Revision: https://reviews.llvm.org/D42513

llvm-svn: 323713
2018-01-30 00:00:12 +00:00
Ismail Donmez 64f99dfa08 Fix function call to fix build
../tools/clang/lib/Driver/ToolChains/Cuda.cpp:80:18: error: reference to non-static member function must be called; did you mean to call it with no arguments?
    if (Distro(D.getVFS).IsDebian())
               ~~^~~~~~
                       ()

llvm-svn: 319322
2017-11-29 15:18:02 +00:00
Sylvestre Ledru 02082c39f3 Follow up of r319317, add the missing header file
llvm-svn: 319319
2017-11-29 15:11:53 +00:00
Sylvestre Ledru 0cfcdc3ffd Add the nvidia-cuda-toolkit Debian package path to search path
Summary:
Reported here:
http://bugs.debian.org/882505

Patch by Andreas Beckmann


Reviewers: Hahnfeld, tra

Reviewed By: tra

Subscribers: jlebar, cfe-commits

Differential Revision: https://reviews.llvm.org/D40453

llvm-svn: 319317
2017-11-29 15:03:28 +00:00
Jonas Hahnfeld 7c78cc5273 [OpenMP] Consistently use cubin extension for nvlink
This was previously done in some places, but for example not for
bundling so that single object compilation with -c failed. In
addition cubin was used for all file types during unbundling which
is incorrect for assembly files that are passed to ptxas.
Tighten up the tests so that we can't regress in that area.

Differential Revision: https://reviews.llvm.org/D40250

llvm-svn: 318763
2017-11-21 14:44:45 +00:00
Justin Lebar 066494d8c1 [CUDA] Print an error if you try to compile with < sm_30 on CUDA 9.
Summary:
CUDA 9's minimum sm is sm_30.

Ideally we should also make sm_30 the default when compiling with CUDA
9, but that seems harder than it should be.

Subscribers: sanjoy

Differential Revision: https://reviews.llvm.org/D39109

llvm-svn: 316611
2017-10-25 21:32:06 +00:00
Jonas Hahnfeld 30b4418e5a [CMake][OpenMP] Customize default offloading arch
For the shuffle instructions in reductions we need at least sm_30
but the user may want to customize the default architecture.

Differential Revision: https://reviews.llvm.org/D38883

llvm-svn: 315996
2017-10-17 13:37:36 +00:00
Jonas Hahnfeld e2c342fc65 [CUDA] Require libdevice only if needed
If the user passes -nocudalib, we can live without it being present.
Simplify the code by just checking whether LibDeviceMap is empty.

Differential Revision: https://reviews.llvm.org/D38901

llvm-svn: 315902
2017-10-16 13:31:30 +00:00
Gheorghe-Teodor Bercea 5a3608ccfa [OpenMP] Don't throw cudalib not found error if only front-end is required.
Summary: If we only use the compiler front-end, do not throw an error about the cuda device library not being found. This allows the front-end to be run on systems where no Cuda installation is found.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra

Reviewed By: tra

Subscribers: hfinkel, tra, cfe-commits

Differential Revision: https://reviews.llvm.org/D37914

llvm-svn: 314217
2017-09-26 15:36:20 +00:00
Gheorghe-Teodor Bercea 20789a5f09 [OpenMP] Enable the existing nocudalib flag for OpenMP offloading toolchain.
Summary: Enable the -nocudalib flag for the OpenMP device offloading toolchain as well. Currently it can only be used for the CUDA toolchain.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, hfinkel, tra

Reviewed By: tra

Subscribers: hfinkel, cfe-commits

Differential Revision: https://reviews.llvm.org/D37913

llvm-svn: 314164
2017-09-25 21:56:32 +00:00
Gheorghe-Teodor Bercea 5636f4b33a [OpenMP] Bugfix: output file name drops the absolute path where full path is needed.
Summary: When composing the output file name, the path to the file is being dropped. The full path is required.

Reviewers: Hahnfeld, ABataev, caomhin, carlo.bertolli, hfinkel, tra

Reviewed By: tra

Subscribers: hfinkel, tra, cfe-commits

Differential Revision: https://reviews.llvm.org/D37912

llvm-svn: 314156
2017-09-25 21:25:38 +00:00
Gheorghe-Teodor Bercea d45720b55a Revert commit with wrong message.
llvm-svn: 314154
2017-09-25 21:22:49 +00:00
Gheorghe-Teodor Bercea 8cf757ceda [OpenMP] Don't throw cudalib not found error if only front-end is required.
Summary: If we only use the compiler front-end, do not throw an error about the cuda device library not being found. This allows the front-end to be run on systems where no Cuda installation is found.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra

Reviewed By: tra

Subscribers: hfinkel, tra, cfe-commits

Differential Revision: https://reviews.llvm.org/D37914

llvm-svn: 314150
2017-09-25 21:07:16 +00:00
Artem Belevich 4654dc89be [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38090

llvm-svn: 313820
2017-09-20 21:23:07 +00:00
Artem Belevich 8af4e23d1e [CUDA] Added rudimentary support for CUDA-9 and sm_70.
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.

On LLVM side NVPTX added sm_70 GPU type which bumps required
PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment.

Differential Revision: https://reviews.llvm.org/D37576

llvm-svn: 312734
2017-09-07 18:14:32 +00:00
Gheorghe-Teodor Bercea 9c52574886 [OpenMP] Enable previously successful offloading tests.
Create a separate test file to contain all tests for OpenMP
offloading to GPUs.

Make libdevice checking more robust by accounting for
the case in which no libdevice is found.

This changes are in connrection with diff: D29660

llvm-svn: 310718
2017-08-11 15:46:22 +00:00
Gheorghe-Teodor Bercea 14528c60ba [OpenMP] Delete tests in openmp-offload.c which cuase failures
until a better way to perform these tests is figured out.

Change connected to diff: D29654

llvm-svn: 310625
2017-08-10 16:56:59 +00:00
Alex Lorenz 994f231792 Revert r310489 and follow-up commits r310505, r310519, r310537 and r310549
Commit r310489 caused 'openmp-offload.c' test failures on Darwin and other
platforms:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/39230/testReport/junit/Clang/Driver/openmp_offload_c/

The follow-up commits tried to fix the test, but the test is still failing.

llvm-svn: 310580
2017-08-10 10:34:46 +00:00
Gheorghe-Teodor Bercea a659943306 [OpenMP] Provide a default GPU arch that is supported by
the underlying hardware.

This fixes a bug triggered by diff: D29660

llvm-svn: 310549
2017-08-10 05:01:42 +00:00
Gheorghe-Teodor Bercea 690f6f9b9f [OpenMP] Enable executable lookup into driver directory.
Summary: Invoking the compiler inside a script causes the clang-offload-bundler executable to not be found. This patch enables the lookup for executables in the driver directory where the clang-offload-bundler resides.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, ABataev, caomhin

Reviewed By: hfinkel

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D36537

llvm-svn: 310513
2017-08-09 19:52:28 +00:00
Gheorghe-Teodor Bercea 6b26dcb6d6 [OpenMP] Add flag for overwriting default PTX version for OpenMP targets
Summary:
This flag "--fopenmp-ptx=" enables the overwriting of the default PTX version used for GPU offloaded OpenMP target regions: "+ptx42".



Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar

Reviewed By: ABataev

Subscribers: rengolin, cfe-commits

Differential Revision: https://reviews.llvm.org/D29660

llvm-svn: 310489
2017-08-09 15:56:54 +00:00
Gheorghe-Teodor Bercea 0846582878 [OpenMP] Add flag for disabling the default generation of relocatable OpenMP target code for NVIDIA GPUs.
Summary: Previously we have added the "-c" flag which gets passed to PTXAS by default to generate relocatable OpenMP target code by default. This set of flags exposes control over this behaviour.

Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar

Reviewed By: ABataev

Subscribers: Hahnfeld, rengolin, cfe-commits

Differential Revision: https://reviews.llvm.org/D29659

llvm-svn: 310484
2017-08-09 15:27:39 +00:00
Gheorghe-Teodor Bercea b9d117233f [OpenMP] Make OpenMP generated code for the NVIDIA device relocatable by default
Original Diff: D29642

This patch was previously reverted due to an error with patch D29654
that this depends on.

llvm-svn: 310479
2017-08-09 14:59:35 +00:00
Gheorghe-Teodor Bercea 2c92693280 [OpenMP] OpenMP device offloading code generation produces a cubin file which is then integrated in the host binary using the host linker.
Diff: D29654

llvm-svn: 310362
2017-08-08 14:33:05 +00:00
Alex Lorenz 7e9c478cda Revert r310291, r310300 and r310332 because of test failure on Darwin
The commit r310291 introduced the failure. r310332 was a test fix commit and
r310300 was a followup commit. I reverted these two to avoid merge conflicts
when reverting.

The 'openmp-offload.c' test is failing on Darwin because the following
run lines:
// RUN:   touch %t1.o
// RUN:   touch %t2.o
// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s

trigger the following assertion:

Driver.cpp:3418:
    assert(CachedResults.find(ActionTC) != CachedResults.end() &&
           "Result does not exist??");

llvm-svn: 310345
2017-08-08 11:20:17 +00:00
Gheorghe-Teodor Bercea ceb422236a [OpenMP] Make OpenMP generated code for the NVIDIA device relocatable by default
Summary: When device offloading is enabled and the device is an NVIDIA GPU, OpenMP target regions must be compiled with relocation enabled by passing the "-c" flag to the PTXAS invocation.

Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar

Reviewed By: Hahnfeld

Subscribers: Hahnfeld, rengolin, mkuron, cfe-commits

Differential Revision: https://reviews.llvm.org/D29642

llvm-svn: 310300
2017-08-07 20:31:51 +00:00
Gheorghe-Teodor Bercea 53431bc046 [OpenMP] Pass -v to PTXAS if it was passed to the driver.
Summary: When compiling code being offloaded by OpenMP to an NVIDIA GPU, pass the -v to PTXAS if it was passed to the CLANG driver.

Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, jlebar, hfinkel, tstellar

Reviewed By: jlebar

Subscribers: Hahnfeld, rengolin, cfe-commits

Differential Revision: https://reviews.llvm.org/D29644

llvm-svn: 310295
2017-08-07 20:19:23 +00:00
Gheorghe-Teodor Bercea 4cdba82ee0 [OpenMP] Integrate OpenMP target region cubin into host binary
Summary: OpenMP device offloading code generation produces a cubin file which is then integrated in the host binary using the host linker.

Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, rnk, hfinkel, tstellar

Reviewed By: hfinkel

Subscribers: sfantao, rnk, rengolin, cfe-commits

Differential Revision: https://reviews.llvm.org/D29654

llvm-svn: 310291
2017-08-07 20:01:48 +00:00
Gheorghe-Teodor Bercea 47e0cf378c [OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading
Summary:
OpenMP has the ability to offload target regions to devices which may have different architectures.

A new -fopenmp-target-arch flag is introduced to specify the device architecture.

In this patch I use the new flag to specify the compute capability of the underlying NVIDIA architecture for the OpenMP offloading CUDA tool chain.

Only a host-offloading test is provided since full device offloading capability will only be available when [[ https://reviews.llvm.org/D29654 | D29654 ]] lands.

Reviewers: hfinkel, Hahnfeld, carlo.bertolli, caomhin, ABataev

Reviewed By: hfinkel

Subscribers: guansong, cfe-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D34784

llvm-svn: 310263
2017-08-07 15:39:11 +00:00
Gheorghe-Teodor Bercea f0f29608d0 [OpenMP] Extend CLANG target options with device offloading kind.
Summary: Pass the type of the device offloading when building the tool chain for a particular target architecture. This is required when supporting multiple tool chains that target a single device type. In our particular use case, the OpenMP and CUDA tool chains will use the same ```addClangTargetOptions ``` method. This enables the reuse of common options and ensures control over options only supported by a particular tool chain.

Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, jlebar, hfinkel, tstellar, Hahnfeld

Reviewed By: hfinkel

Subscribers: jgravelle-google, aheejin, rengolin, jfb, dschuff, sbc100, cfe-commits

Differential Revision: https://reviews.llvm.org/D29647

llvm-svn: 307272
2017-07-06 16:22:21 +00:00
David L. Jones f561abab56 [Driver] Consolidate tools and toolchains by target platform. (NFC)
Summary:
(This is a move-only refactoring patch. There are no functionality changes.)

This patch splits apart the Clang driver's tool and toolchain implementation
files. Each target platform toolchain is moved to its own file, along with the
closest-related tools. Each target platform toolchain has separate headers and
implementation files, so the hierarchy of classes is unchanged.

There are some remaining shared free functions, mostly from Tools.cpp. Several
of these move to their own architecture-specific files, similar to r296056. Some
of them are only used by a single target platform; since the tools and
toolchains are now together, some helpers now live in a platform-specific file.
The balance are helpers related to manipulating argument lists, so they are now
in a new file pair, CommonArgs.h and .cpp.

I've tried to cluster the code logically, which is fairly straightforward for
most of the target platforms and shared architectures. I think I've made
reasonable choices for these, as well as the various shared helpers; but of
course, I'm happy to hear feedback in the review.

There are some particular things I don't like about this patch, but haven't been
able to find a better overall solution. The first is the proliferation of files:
there are several files that are tiny because the toolchain is not very
different from its base (usually the Gnu tools/toolchain). I think this is
mostly a reflection of the true complexity, though, so it may not be "fixable"
in any reasonable sense. The second thing I don't like are the includes like
"../Something.h". I've avoided this largely by clustering into the current file
structure. However, a few of these includes remain, and in those cases it
doesn't make sense to me to sink an existing file any deeper.

Reviewers: rsmith, mehdi_amini, compnerd, rnk, javed.absar

Subscribers: emaste, jfb, danalbert, srhines, dschuff, jyknight, nemanjai, nhaehnle, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D30372

llvm-svn: 297250
2017-03-08 01:02:16 +00:00