Commit Graph

73 Commits

Author SHA1 Message Date
Joseph Huber b647f13226 [CUDA][HIP] Fix new driver crashing when using -save-temps in RDC-mode
Previously when using the `clang-offload-packager` we did not pass the
active offloading kinds. Then in Clang when we attempted to detect when
there was host-offloading action that needed to be embedded in the host
we did not find it. This patch adds the active offloading kinds so we
know when there is input to be embedded.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D134189
2022-09-19 14:38:44 -05:00
Joseph Huber 2d26ecb1fb [OpenMP] Remove simplified device runtime handling
The old device runtime had a "simplified" version that prevented many of
the runtime features from being initialized. The old device runtime was
deleted in LLVM 14 and is no longer in use. Selectively deactivating
features is now done using specific flags rather than the old technique.
This patch simply removes the extra logic required for handling the old
simple runtime scheme.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133802
2022-09-14 09:41:50 -05:00
Joseph Huber 47166968db [OpenMP] Deprecate the old driver for OpenMP offloading
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.

Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.

When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.

Reviewed By: jdoerfert, MaskRay, ye-luo

Differential Revision: https://reviews.llvm.org/D130020
2022-08-26 13:47:09 -05:00
Fangrui Song 5c29ffda90 Revert "[Driver][test] Replace ^//$ with empty string"
This reverts commit 4817b7729a.

It caused some `^/\n` and had some objection about its readability improvement.
2022-06-24 13:52:27 -07:00
Fangrui Song 4817b7729a [Driver][test] Replace ^//$ with empty string
The convention does not add //\n. Having all RUN/CHECK lines separated by //\n
makes editor movement difficult (e.g. { } in Vim).
2022-06-24 11:25:03 -07:00
David Blaikie 4821508d4d Revert "DebugInfo: Fully integrate ctor type homing into 'limited' debug info"
Reverting to simplify some Google-internal rollout issues. Will recommit
in a week or two.

This reverts commit 517bbc64db.
2022-06-24 17:07:47 +00:00
David Blaikie 517bbc64db DebugInfo: Fully integrate ctor type homing into 'limited' debug info
Simplify debug info back to just "limited" or "full" by rolling the ctor
type homing fully into the "limited" debug info.

Also fix a bug I found along the way that was causing ctor type homing
to kick in even when something could be vtable homed (where vtable
homing is stronger/more effective than ctor homing) - fixing at the same
time as it keeps the tests (that were testing only "limited non ctor"
homing and now test ctor homing) passing.
2022-06-23 20:15:00 +00:00
Fangrui Song 980679981f [Driver][test] Remove clang{{.*}} when testing -cc1 command lines
The majority of tests omit testing "clang" for -cc1 command lines. In addition,
some distributions symlink %clang to an executable with a content hash based
filename so clang{{.*}} check won't work.

With this change, we can remove many -no-canonical-prefixes whose purpose was to
make the tests pass on such distributions.
2022-05-02 11:02:19 -07:00
Joseph Huber ae23be84cb [OpenMP] Make the new offloading driver the default
Previously an opt-in flag `-fopenmp-new-driver` was used to enable the
new offloading driver. After passing tests for a few months it should be
sufficiently mature to flip the switch and make it the default. The new
offloading driver is now enabled if there is OpenMP and OpenMP
offloading present and the new `-fno-openmp-new-driver` is not present.

The new offloading driver has three main benefits over the old method:
- Static library support
- Device-side LTO
- Unified clang driver stages

Depends on D122683

Differential Revision: https://reviews.llvm.org/D122831
2022-04-18 15:05:09 -04:00
Joseph Huber 984a0dc386 [OpenMP] Use new offloading binary when embedding offloading images
The previous patch introduced the offloading binary format so we can
store some metada along with the binary image. This patch introduces
using this inside the linker wrapper and Clang instead of the previous
method that embedded the metadata in the section name.

Differential Revision: https://reviews.llvm.org/D122683
2022-04-15 20:35:26 -04:00
Fangrui Song 63fbc77121 [Driver][test] Remove unused/obsoleted REQUIRES: clang-driver
It (introduced by 556d713c70) appears to be
related to the removed dragonegg project. In addition, the feature was a bit
misnamed and may lur users to unnecessarily use it.
2022-04-12 13:29:46 -07:00
Yaxun (Sam) Liu 09a5eae0d1 [clang-offload-bundler] add -input/-output options
Currently, clang-offload-bundler has -inputs and -outputs options that accept
values with comma as the delimiter. This causes issues with file paths
containing commas, which are valid file paths on Linux.

This add two new options -input and -output, which accept one single file,
and allow multiple instances. This allows arbitrary file paths. The old
-inputs and -outputs options will be kept for backward compatibility, but
are not allowed to be used with -input and -output options for simplicity.
In the future, -inputs and -outputs options will be phasing out.

RFC: https://discourse.llvm.org/t/rfc-adding-input-and-output-options-to-clang-offload-bundler/60049

Patch by: Siu Chi Chan

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D120662
2022-04-05 11:13:01 -04:00
Joseph Huber 9f89769cd7 [Clang] Add offload kind to embedded offload object
This patch adds the offload kind to the embedded section name in
preparation for offloading to different kinda like CUDA or HIP.

Depends on D120288

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120271
2022-03-14 20:08:27 -04:00
Joseph Huber 034adaf5be [OpenMP] Completely remove old device runtime
This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D118934
2022-02-04 15:31:33 -05:00
Joseph Huber da20df2115 Revert "[OpenMP] Don't use bound architecture when checking cache on the host"
This reverts commit 9138d96f8b.
2022-02-03 17:43:10 -05:00
Joseph Huber 9138d96f8b [OpenMP] Don't use bound architecture when checking cache on the host
When we are creating jobs for the new driver we first check the cache to
see if the job was already created as a part of the offloading
toolchain. This would sometimes fail if the bound architecture was set
for the host during offloading. We want to ingore this because it is not
relevant for looking up host actions. Previously it was set on some
machines and would cause the cache lookup to fail.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118858
2022-02-03 17:17:38 -05:00
Joseph Huber 28c1534136 [OpenMP] Temporarily remove checks to fix failing test on MACOS
Summary:
This patch removes some of the check lines that are problematic on
MACOS. The output on the MAC systems works but should be slightly
different. Because this is simply the output being slightly different
rather than broken functionality the test is being changed.
2022-02-01 09:56:09 -05:00
Joseph Huber b79e2a1ccd [OpenMP] Remove hard-coded triple in new driver test
Summary:
Previously this test used a hard-coded triple value in the check lines
wihch failed on other architectures. This patch changes that to accept
any host triple.
2022-01-31 16:46:51 -05:00
Joseph Huber 12ae095bbb [OpenMP] Embed device files into the host IR
This patch adds support for embedding the device object files into the
host IR to create a fat binary. Each offloading file will be inserted
into a section with the following naming format
`.llvm.offloading.<triple>.<arch>.<filename>`.

Depends on D116542

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D116543
2022-01-31 15:56:02 -05:00
Joseph Huber 2f9ace9e9a [OpenMP] Introduce new flag to change offloading driver pipeline
This patch introduces the `-fopenmp-new-driver` option which instructs
the compiler to use a new driver scheme for producing offloading code.
In this scheme we create a complete offloading object file and then pass
it as input to the host compilation phase. This will allow us to embed
the object code in the backend phase.

This is the start of a series of commits to rework the OpenMP offloading driver
pipeline. The goal of this is to simplify the steps required for creating an
offloading program. This patch changes the driver's configuration to simply pass
the device file back to the host as an input so it can be embedded as an LLVM IR
global during the backend, then simply passes that object file to the linker.

This driver implementation will currently create the following phases,
```
$ clang input.c -fopenmp -fopenmp-targets=nvptx64 -fopenmp-new-driver -ccc-print-phases
               +- 0: input, "input.c", c, (host-openmp)
            +- 1: preprocessor, {0}, cpp-output, (host-openmp)
         +- 2: compiler, {1}, ir, (host-openmp)
         |        |     +- 3: input, "input.c", c, (device-openmp)
         |        |  +- 4: preprocessor, {3}, cpp-output, (device-openmp)
         |        |- 5: compiler, {4}, ir, (device-openmp)
         |     +- 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (nvptx64)" {5}, ir
         |  +- 7: backend, {6}, assembler, (device-openmp)
         |- 8: assembler, {7}, object, (device-openmp)
      +- 9: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (nvptx64)" {8}, ir
   +- 10: backend, {9}, assembler, (host-openmp)
+- 11: assembler, {10}, object, (host-openmp)
12: clang-linker-wrapper, {11}, image, (host-openmp)
```

Which will map to the following bindings

```
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["input.c"], output: "/tmp/input-bae62e.bc"
# "nvptx64" - "clang", inputs: ["input.c", "/tmp/input-bae62e.bc"], output: "/tmp/input-76784e.s"
# "nvptx64" - "NVPTX::Assembler", inputs: ["/tmp/input-76784e.s"], output: "/tmp/input-8f29db.o"
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/input-bae62e.bc", "/tmp/input-8f29db.o"], output: "/tmp/input-545450.o"
# "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["/tmp/input-545450.o"], output: "a.out"
```

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D116541
2022-01-31 15:55:58 -05:00
Joseph Huber 24f88f57de [OpenMP] Accept shortened triples for -Xopenmp-target=
This patch builds on the change in D117634 that expanded the short
triples when passed in by the user. This patch adds the same
functionality for the `-Xopenmp-target=` flag. Previously it was
unintuitive that passing `-fopenmp-targets=nvptx64
-Xopenmp-target=nvptx64 <arg>` would not forward the arg because the
triples did not match on account of `nvptx64` being expanded to
`nvptx64-nvidia-cuda`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118495
2022-01-28 18:22:17 -05:00
Joseph Huber c99407e31c [OpenMP] Make the new device runtime the default
This patch changes the `-fopenmp-target-new-runtime` option which controls if
the new or old device runtime is used to be true by default.  Disabling this to
use the old runtime now requires using `-fno-openmp-target-new-runtime`.

Reviewed By: JonChesterfield, tianshilei1992, gregrodgers, ronlieb

Differential Revision: https://reviews.llvm.org/D114890
2021-12-02 11:11:45 -05:00
Jake Egan 0796869e4e [AIX] Disable unsupported offloading gpu tests
GPUs are not supported on AIX, so this patch sets these tests as unsupported.

Reviewed By: stevewan

Differential Revision: https://reviews.llvm.org/D114381
2021-11-25 15:10:51 -05:00
Jon Chesterfield 2a581710c1 [openmp] No longer use LIBRARY_PATH to find devicertl
Given D109057, change test runner to use the libomptarget-x-bc-path
argument instead of the LIBRARY_PATH environment variable to find the device
library.

Also drop the use of LIBRARY_PATH environment variable as it is far
too easy to pull in the device library from an unrelated toolchain by accident
with the current setup. No loss in flexibility to developers as the clang
commandline used here is still available.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D109061
2021-09-09 17:16:41 +01:00
Jon Chesterfield e62f4f172e [openmp] 41c73671d0, this time with staged patch applied 2021-09-08 22:07:47 +01:00
Jon Chesterfield 41c73671d0 [openmp] Re-enable test from D109057, now with windows path aware regex 2021-09-08 21:57:38 +01:00
Jon Chesterfield 06cdf48a0d [openmp] Drop test from D109057, disproportionately difficult to run on windows 2021-09-01 21:51:51 +01:00
Jon Chesterfield c7cbf1a03e [openmp] Accept directory for libomptarget-bc-path
The commandline flag to specify a particular openmp devicertl library
currently errors like:
```
fatal error: cannot open file
      './runtimes/runtimes-bins/openmp/libomptarget':
      Is a directory
```
CommonArgs successfully appends the directory to the commandline args then
mlink-builtin-bitcode rejects it.

This patch is a point fix to that. If --libomptarget-amdgcn-bc-path=directory
then append the expected name for the current architecture and go on as before.
This is useful for test runners that don't hardcode the architecture.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109057
2021-09-01 21:22:35 +01:00
Jon Chesterfield 6b0636ce53 Revert "[openmp] Accept directory for libomptarget-bc-path"
Windows separator problem. Fixing that broke another regex.
This reverts commit 0173e024fd.
2021-09-01 20:45:41 +01:00
Jon Chesterfield 88511f6bc5 [libomptarget] Drop path separator from test to fix windows build 2021-09-01 20:34:58 +01:00
Jon Chesterfield 0173e024fd [openmp] Accept directory for libomptarget-bc-path
The commandline flag to specify a particular openmp devicertl library
currently errors like:
```
fatal error: cannot open file
      './runtimes/runtimes-bins/openmp/libomptarget':
      Is a directory
```
CommonArgs successfully appends the directory to the commandline args then
mlink-builtin-bitcode rejects it.

This patch is a point fix to that. If --libomptarget-amdgcn-bc-path=directory
then append the expected name for the current architecture and go on as before.
This is useful for test runners that don't hardcode the architecture.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109057
2021-09-01 19:46:21 +01:00
Aaron Ballman 530ea28fef Correct a lot of diagnostic wordings for the driver
Clang diagnostics should not start with a capital letter or use
trailing punctuation (https://clang.llvm.org/docs/InternalsManual.html#the-format-string),
but quite a few driver diagnostics were not following this advice. This
corrects the grammar and punctuation to improve consistency, but does
not change the circumstances under which the diagnostics are produced.
2021-08-05 07:04:55 -04:00
Amy Huang 1a3bf2953a [DebugInfo] Switch to using constructor homing (-debug-info-kind=constructor) by default when debug info is enabled
Constructor homing reduces the amount of class type info that is emitted
by emitting conmplete type info for a class only when a constructor for
that class is emitted.

This will mainly reduce the amount of duplicate debug info in object
files. In Chrome enabling ctor homing decreased total build directory sizes
by about 30%.

It's also expected that some class types (such as unused classes)
will no longer be emitted in the debug info. This is fine, since we wouldn't
expect to need these types when debugging.

In some cases (e.g. libc++, https://reviews.llvm.org/D98750), classes
are used without calling the constructor. Since this is technically
undefined behavior, enabling constructor homing should be fine.
However Clang now has an attribute
`__attribute__((standalone_debug))` that can be used on classes to
ignore ctor homing.

Bug: https://bugs.llvm.org/show_bug.cgi?id=46537

Differential Revision: https://reviews.llvm.org/D106084
2021-07-26 17:24:42 -07:00
Joseph Huber d297211692 [OpenMP] Add a driver flag to enable the new device runtime library
This patch adds a driver flag `-fopenmp-target-new-runtime` to optionally enable the new device runtime
bitcode library. This allows users to enable the new experimental runtime
before it becomes the default in the future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106793
2021-07-26 16:35:56 -04:00
Alex Lorenz 6b938d2ead Recommit "[clang][driver] Use the provided arch name for a Darwin target triple
This ensures that the Darwin driver uses a consistent target triple
representation when the triple is printed out to the user.

This reverts the revert commit ab0df6c034.

Differential Revision: https://reviews.llvm.org/D100807
2021-04-29 15:00:40 -07:00
Shilei Tian c41ae246ac [OpenMP][Clang][NVPTX] Only build one bitcode library for each SM
In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on
NVPTX target. We don't need to have macros in source code to select right functions
based on CUDA version. we don't need to compile multiple bitcode libraries of
different CUDA versions for each SM. We don't need to worry about future
compatibility with newer CUDA version.

`-target-feature +ptx61` is used in this patch, which corresponds to the highest
PTX version that CUDA 9.2 can support.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97198
2021-03-08 12:03:04 -05:00
Pushpinder Singh 99951aa68d OpenMP: Fix object clobbering issue when using save-temps
There are two preconditions to reproduce the issue,
 1. Use -save-temps option
 2. Provide the -o option with name equal to the input file name
    without the file extension. For e.g. clang a.c -o a

With the -o specified, the AssembleJobAction after OffloadWrapperJobAction
will produce the object file with same name as host code object file.
Due to this clash, the OffloadWrapperAction overwrites the initial host
object file, which results in lld error. This also fixes the `multiple definition of __dummy.omp_offloading.entry'` issue in D96769 .

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97273
2021-02-25 00:50:51 -05:00
Shilei Tian 76151acf89 [Clang][OpenMP] Require CUDA 9.2+ for OpenMP offloading on NVPTX target
In current implementation of `deviceRTLs`, we're using some functions
that are CUDA version dependent (if CUDA_VERSION < 9, it is one; otheriwse, it
is another one). As a result, we have to compile one bitcode library for each
CUDA version supported. A worse problem is forward compatibility. If a new CUDA
version is released, we have to update CMake file as well.

CUDA 9.2 has been released for three years. Instead of using various weird tricks
to make `deviceRTLs` work with different CUDA versions and still have forward
compatibility, we can simply drop support for CUDA 9.1 or lower version. It has at
least two benifits:
- We don't need to generate bitcode libraries for each CUDA version;
- Clang driver doesn't need to search for the bitcode lib based on CUDA version.

We can claim that starting from LLVM 12, OpenMP offloading on NVPTX target requires
CUDA 9.2+.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D97003
2021-02-22 11:00:33 -05:00
Shilei Tian 33d660939d [Clang][OpenMP] Update driver test case for OpenMP offload to use sm_35
`sm_35` is the minimum requirement for OpenMP offloading on NVPTX device.
Current driver test case is using `sm_20`. D97003 is going to switch the minimum
CUDA version to 9.2, which only supports `sm_30+`. This patch makes step for the
change.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D97120
2021-02-20 15:14:13 -05:00
Shilei Tian 7c03f7d7d0 [OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language
From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics.
Here're a list of changes in this patch.
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.

With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. By the time, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA version.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94745
2021-01-26 12:28:47 -05:00
Shilei Tian 5ad038aafa [Clang][OpenMP][NVPTX] Replace `libomptarget-nvptx-path` with `libomptarget-nvptx-bc-path`
D94700 removed the static library so we no longer need to pass
`-llibomptarget-nvptx` to `nvlink`. Since the bitcode library is the only device
runtime for now, instead of emitting a warning when it is not found, an error
should be raised. We also set a new option `libomptarget-nvptx-bc-path` to let
user choose which bitcode library is being used.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95161
2021-01-23 14:42:38 -05:00
Nico Weber 956034c6c8 [mac/arm] XFAIL two more tests on arm64-apple
Part of PR46644
2020-12-12 15:20:50 -05:00
Amy Huang 394db22595 Revert "Switch to using -debug-info-kind=constructor as default (from =limited)"
This reverts commit 227db86a1b.

Causing debug info errors in google3 LTO builds; also causes a
debuginfo-test failure.
2020-07-28 11:23:59 -07:00
Amy Huang 227db86a1b Switch to using -debug-info-kind=constructor as default (from =limited)
Summary:
-debug-info-kind=constructor reduces the amount of class debug info that
is emitted; this patch switches to using this as the default.

Constructor homing emits the complete type info for a class only when the
constructor is emitted, so it is expected that there will be some classes that
are not defined in the debug info anymore because they are never constructed,
and we shouldn't need debug info for these classes.

I compared the PDB files for clang, and there are 273 class types that are defined with `=limited`
but not with `=constructor` (out of ~60,000 total class types).
We've looked at a number of the types that are no longer defined with =constructor. The vast
majority of cases are something like class A is used as a parameter in a member function of
some other class B, which is emitted. But the function that uses class A is never called, and class A
is never constructed, and therefore isn't emitted in the debug info.

Bug: https://bugs.llvm.org/show_bug.cgi?id=46537

Subscribers: aprantl, cfe-commits, lldb-commits

Tags: #clang, #lldb

Differential Revision: https://reviews.llvm.org/D79147
2020-07-09 15:26:46 -07:00
Saiyedul Islam 602d9b0afc [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1
Summary:
Allow AMDGCN as a GPU offloading target for OpenMP during compiler
invocation and allow setting CUDAMode for it.

Originally authored by Greg Rodgers (@gregrodgers).

Reviewers: ronlieb, yaxunl, b-sumner, scchan, JonChesterfield, jdoerfert, sameerds, msearles, hliao, arsenm

Reviewed By: sameerds

Subscribers: sstefan1, jvesely, wdng, arsenm, guansong, dexonsmith, cfe-commits, llvm-commits, gregrodgers

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79754
2020-05-27 07:51:27 +00:00
Sergey Dmitriev a0d83768f1 [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries
This patch removes the remaining part of the OpenMP offload linker scripts which was used for inserting device binaries into the output linked binary. Device binaries are now inserted into the host binary with a help of the wrapper bit-code file which contains device binaries as data. Wrapper bit-code file is dynamically created by the clang driver with a help of new tool clang-offload-wrapper which takes device binaries as input and produces bit-code file with required contents. Wrapper bit-code is then compiled to an object and resulting object is appended to the host linking by the clang driver.

This is the second part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943).

Differential Revision: https://reviews.llvm.org/D68166

llvm-svn: 374219
2019-10-09 20:42:58 +00:00
Gheorghe-Teodor Bercea e62c693c8e [OpenMP][Clang] Support for target math functions
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.

We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.

Authors:
@gtbercea
@jdoerfert

Reviewers: hfinkel, caomhin, ABataev, tra

Reviewed By: hfinkel, ABataev, tra

Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61399

llvm-svn: 360265
2019-05-08 15:52:33 +00:00
Jonas Devlieghere fe608c938c Revert "[OpenMP][Clang] Support for target math functions"
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.

CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
  LLVM_ENABLE_MODULES is not supported by this compiler

llvm-svn: 360192
2019-05-07 21:08:15 +00:00
Gheorghe-Teodor Bercea 1e28a668bc [OpenMP][Clang] Support for target math functions
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.

We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.

Authors:
@gtbercea
@jdoerfert

Reviewers: hfinkel, caomhin, ABataev, tra

Reviewed By: hfinkel, ABataev, tra

Subscribers: mgorny, guansong, cfe-commits, jdoerfert

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61399

llvm-svn: 360063
2019-05-06 18:19:15 +00:00
Alexey Bataev 8061acd501 [OPENMP][NVPTX]Use faster teams reduction algorithm.
A faster way to reduce the values in teams reductions was found, the
codegen is updated to use this faster algorithm and new runtime functions.

llvm-svn: 354479
2019-02-20 16:36:22 +00:00