Commit Graph

82 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu 4fd05e0ad7 [HIP] Change to code object v4
Change to code object v4 by default to match ROCm 4.1.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99235
2021-04-06 20:22:58 -04:00
Yaxun (Sam) Liu 907af84396 [CUDA][HIP] rename -fcuda-flush-denormals-to-zero
Rename it to -fgpu-flush-denormals-to-zero.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99688
2021-04-05 00:13:51 -04:00
Yaxun (Sam) Liu 51ade31e67 [HIP] Support device sanitizer
Add option -fgpu-sanitize to enable sanitizer for AMDGPU target.

Since it is experimental, it is off by default.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D96835
2021-02-18 23:30:25 -05:00
Yaxun (Sam) Liu 1dab94f9ed [CUDA][HIP] Pass -fgpu-rdc to host clang -cc1
Currently -fgpu-rdc is not passed to host clang -cc1.
This causes issue because -fgpu-rdc affects shadow
variable linkage in host compilation.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D96105
2021-02-08 19:08:20 -05:00
Yaxun (Sam) Liu 0b81d9a992 [AMDGPU] add -mcode-object-version=n
Add option -mcode-object-version=n to control code object version for
AMDGPU.

Differential Revision: https://reviews.llvm.org/D91310
2020-12-07 18:08:37 -05:00
Yaxun (Sam) Liu 4bed1d9b32 [HIP] fix bundle entry ID for --
Canonicalize triple used in fat binary. Change from
amdgcn-amd-amdhsa to amdgcn-amd-amdhsa-.

This is part of https://reviews.llvm.org/D60620
2020-12-07 18:08:37 -05:00
Michael Liao d8949a8ad3 [hip] Fix host object creation from fatbin
- `__hip_fatbin` should a symbol in `.hip_fatbin` section.

Differential Revision: https://reviews.llvm.org/D92418
2020-12-02 10:36:01 -05:00
Serge Pavlov 70bf35070a [Driver] Add output file to properties of Command
Object of class `Command` contains various properties of a command to
execute, but output file was missed from them. This change adds this
property. It is required for reporting consumed time and memory implemented
in D78903 and may be used in other cases too.

Differential Revision: https://reviews.llvm.org/D78902
2020-10-08 18:23:39 +07:00
Yaxun (Sam) Liu e372c1d762 [HIP] Fix -fgpu-allow-device-init option
The option needs to be passed to both host and device compilation.

Differential Revision: https://reviews.llvm.org/D88550
2020-10-04 22:13:05 -04:00
Yaxun (Sam) Liu 9756a402f2 Recommit "[HIP] Add option --gpu-instrument-lib="
recommit 64f7790e7d after
fixing hip-device-libs.hip.
2020-10-04 21:41:43 -04:00
Yaxun (Sam) Liu fef0ebbc0b Revert "[HIP] Add option --gpu-instrument-lib="
This reverts commit 64f7790e7d due
to regression in hip-device-libs.hip.
2020-10-04 21:27:29 -04:00
Yaxun (Sam) Liu 64f7790e7d [HIP] Add option --gpu-instrument-lib=
Add an option --gpu-instrument-lib= to allow users to specify
an instrument device library. This is for supporting -finstrument
in device code for debugging/profiling tools.

Differential Revision: https://reviews.llvm.org/D88557
2020-10-04 21:16:36 -04:00
Yaxun (Sam) Liu 2cd75f738e Diagnose invalid target ID for AMDGPU toolchain for assembler
AMDGPU toolchain currently only diagnose invalid target ID for OpenCL
source compilation. Invalid target ID is not diagnosed for assembler.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D88377
2020-10-02 19:38:02 -04:00
Yaxun (Sam) Liu dc6a0b0ec7 [HIP] Align device binary
To facilitate faster loading of device binaries and share them among processes,
HIP runtime favors their alignment being 4096 bytes. HIP runtime can load
unaligned device binaries, however, aligning them at 4096 bytes results in
faster loading and less shared memory usage.

This patch adds an option -bundle-align to clang-offload-bundler which allows
bundles to be aligned at specified alignment. By default it is 1, which is NFC
compared to existing format.

This patch then aligns embedded fat binary and device binary inside fat binary
at 4096 bytes.

It has been verified this change does not cause significant overall file size increase
for typical HIP applications (less than 1%).

Differential Revision: https://reviews.llvm.org/D88734
2020-10-02 18:10:44 -04:00
Yaxun (Sam) Liu 10eb3bf2d4 Skip -fPIE for AMDGPU and HIP toolchain
AMDGPU toolchain does not support -fPIE, therefore skip it if specified by driver.

Differential Revision: https://reviews.llvm.org/D88425
2020-09-28 22:03:18 -04:00
Reid Kleckner 3453b6928d Revert "Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions""
This reverts commit e39da8ab6a.

This depends on a change that needs additional design review and needs
to be reverted.
2020-09-24 11:16:54 -07:00
Yaxun (Sam) Liu e39da8ab6a Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions"
This recommits 7f1f89ec8d and
40df06cdaf after fixing memory
sanitizer failure.
2020-09-24 08:44:37 -04:00
Yaxun (Sam) Liu 772bd8a7d9 Revert "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions"
This reverts commit 7f1f89ec8d.

This reverts commit 40df06cdaf.
2020-09-17 13:55:31 -04:00
Yaxun (Sam) Liu 40df06cdaf [CUDA][HIP] Defer overloading resolution diagnostics for host device functions
In CUDA/HIP a function may become implicit host device function by
pragma or constexpr. A host device function is checked in both
host and device compilation. However it may be emitted only
on host or device side, therefore the diagnostics should be
deferred until it is known to be emitted.

Currently clang is only able to defer certain diagnostics. This causes
false alarms and limits the usefulness of host device functions.

This patch lets clang defer all overloading resolution diagnostics for host device functions.

An option -fgpu-defer-diag is added to control this behavior. By default
it is off.

It is NFC for other languages.

Differential Revision: https://reviews.llvm.org/D84364
2020-09-17 11:30:42 -04:00
Yaxun (Sam) Liu ccb4124a41 Fix -gz=zlib options for linker
gcc translates -gz=zlib to --compress-debug-options=zlib for both assembler and linker
but clang only does this for assembler.

The linker needs --compress-debug-options=zlib option to compress the debug sections
in the generated executable or shared library.

Due to this bug, -gz=zlib has no effect on the generated executable or shared library.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D87321
2020-09-11 17:12:58 -04:00
Yaxun (Sam) Liu 7546b29e76 [HIP] Support target id by --offload-arch
This patch introduces support of target id by
-offload-arch.

Differential Revision: https://reviews.llvm.org/D60620
2020-08-18 23:43:53 -04:00
Yaxun (Sam) Liu 5d2c3e031a Fix regression due to test hip-version.hip
Added RocmInstallationDetector to Darwin and MinGW.

Fixed duplicate ROCm detector in ROCm toolchain.
2020-07-11 12:45:29 -04:00
Yaxun (Sam) Liu 849d4405f5 [HIP] Fix rocm detection
Do not detect device library by default in rocm detector.
Only detect device library in Rocm and HIP toolchain.

Separate detection of HIP runtime and Rocm device library.

Detect rocm path by version file in host toolchains.

Also added detecting rocm version and printing rocm
installation path and version with -v.

Fixed include path and device library detection for
ROCm 3.5.

Added --hip-version option. Renamed --hip-device-lib-path
to --rocm-device-lib-path.

Fixed default value for -fhip-new-launch-api.

Added default -std option for HIP.

Differential Revision: https://reviews.llvm.org/D82930
2020-07-10 23:20:15 -04:00
Aaron En Ye Shi c64bb3f736 [HIP] Use default triple in llvm-mc for system ld
The Ubuntu system ld does not recognize the amdgcn-amd-amdhsa target.
Instead the host object with embedded device fat binary should not be
assembled by that triple. It should use default triple, so that the
object is compatible with system ld.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D83145
2020-07-07 16:44:51 +00:00
James Y Knight 4772b99dff Clang Driver: refactor support for writing response files to be
specified at Command creation, rather than as part of the Tool.

This resolves the hack I just added to allow Darwin toolchain to vary
its level of support based on `-mlinker-version=`.

The change preserves the _current_ settings for response-file support.
Some tools look likely to be declaring that they don't support
response files in error, however I kept them as-is in order for this
change to be a simple refactoring.

Differential Revision: https://reviews.llvm.org/D82782
2020-06-29 18:27:02 -04:00
Yaxun (Sam) Liu 8013ce4490 [HIP] Add missing options for lto
Add -mcpu, -mattr, -mllvm, and -save-temps options for lto when necessary.

Differential Revision: https://reviews.llvm.org/D82506
2020-06-26 00:26:05 -04:00
Aaron En Ye Shi 77df5a8283 [HIP] Move HIP Linking Logic into HIP ToolChain
This patch is a follow up on https://reviews.llvm.org/D78759.

Extract the HIP Linker script from generic GNU linker,
and move it into HIP ToolChain. Update OffloadActionBuilder
Link actions feature to apply device linking and host linking
actions separately. Using MC Directives, embed the device images
and define symbols.

Reviewers: JonChesterfield, yaxunl

Subscribers: tra, echristo, jdoerfert, msearles, scchan

Differential Revision: https://reviews.llvm.org/D81963
2020-06-22 19:48:48 +00:00
Yaxun (Sam) Liu c830d517b4 [HIP] Enable -amdgpu-internalize-symbols
Enable -amdgpu-internalize-symbols to eliminate unused functions and global variables
for whole program to speed up compilation and improve performance.

For -fno-gpu-rdc, -amdgpu-internalize-symbols is passed to clang -cc1.

For -fgpu-rdc, -amdgpu-internalize-symbols is passed to lld.

Differential Revision: https://reviews.llvm.org/D81959
2020-06-18 16:34:37 -04:00
Yaxun (Sam) Liu 6752786d65 [HIP] Do not use llvm-link/opt/llc for -fgpu-rdc
This patch is a follow up on https://reviews.llvm.org/D81627.

In addition to default -fno-gpu-rdc case, this patches let
HIP toolchain not use llvm-link/opt/llc to link device code
for -fgpu-rdc case. Instead, uses standard lto.

This will eliminate some redundant optimizations and speed
up the compilation/linking.

Differential Revision: https://reviews.llvm.org/D81861
2020-06-15 21:09:18 -04:00
Yaxun (Sam) Liu e8090d83fd [HIP] Do not call opt/llc for -fno-gpu-rdc
Currently HIP toolchain calls clang to emit bitcode then calls opt/llc for device compilation for the default -fno-gpu-rdc
case, which is unnecessary since clang is able to compile a single source file to ISA.

This patch fixes the HIP action builder and toolchain so that the default -fno-gpu-rdc can be done like a canonical
toolchain, i.e. one clang -cc1 invocation to compile source code to ISA.

This can avoid unnecessary processes to speed up the compilation, and avoid redundant LLVM passes which are
performed in clang -cc1 and opt.

Differential Revision: https://reviews.llvm.org/D81627
2020-06-15 18:55:01 -04:00
Yaxun (Sam) Liu 8422bc9efc recommit "[HIP] Add default header and include path"
recommit 11d06b9511 with
fix for lit tests.
2020-06-06 14:21:22 -04:00
Nico Weber 2920348063 Revert "recommit "[HIP] Add default header and include path""
This reverts commit 1fa43e0b34.
Still breaks tests on several bots, see https://reviews.llvm.org/D81176
2020-06-05 21:50:04 -04:00
Yaxun (Sam) Liu 1fa43e0b34 recommit "[HIP] Add default header and include path"
recommit 11d06b9511 with
fix for lit tests.
2020-06-05 20:41:15 -04:00
Yaxun (Sam) Liu 8a8c6913a9 Revert "[HIP] Add default header and include path"
This reverts commit 11d06b9511.
2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu 11d06b9511 [HIP] Add default header and include path
To support std::complex and some other standard C/C++ functions in HIP device code,
they need to be forced to be __host__ __device__ functions by pragmas. This is done
by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang.

For these standard C++ wapper headers to work properly, specific include path order
has to be enforced:

  clang C++ wrapper include path
  standard C++ include path
  clang include path

Also, these C++ wrapper headers require device version of some standard C/C++ functions
must be declared before including them. This needs to be done by including a default
header which declares or defines these device functions. The default header is always
included before any other headers are included by users.

This patch adds the the default header and include path for HIP.

Differential Revision: https://reviews.llvm.org/D81176
2020-06-05 12:44:57 -04:00
Matt Arsenault 14e1845711 HIP: Merge builtin library handling
Merge with the new --rocm-path handling used for OpenCL. This looks
for a usable set of device libraries upfront, rather than giving a
generic "no such file or directory error". If any of the required
bitcode libraries are missing, this will now produce a "cannot find
ROCm installation." error. This differs from the existing hip specific
flags by pointing to a rocm root install instead of a single directory
with bitcode files.

This tries to maintain compatibility with the existing the
--hip-device-lib and --hip-device-lib-path flags, as well as the
HIP_DEVICE_LIB_PATH environment variable, or at least the range of
uses with testcases. The existing range of uses and behavior doesn't
entirely make sense to me, so some of the untested edge cases change
behavior. Currently the two path forms seem to have the double purpose
of a search path for an arbitrary --hip-device-lib, and for finding
the stock set of libraries. Since the stock set of libraries This also
changes the behavior when multiple paths are specified, and only takes
the last one (and the environment variable only handles a single
path).

If --hip-device-lib is used, it now only treats --hip-device-lib-path
as the search path for it, and does not attempt to find the rocm
installation. If not, --hip-device-lib-path and the environment
variable are used as the directory to search instead of the rocm root
based path.

This should also automatically fix handling of the options to use
wave64.
2020-05-12 09:50:22 -04:00
Matt Arsenault 4593e4131a AMDGPU: Teach toolchain to link rocm device libs
Currently the library is separately linked, but this isn't correct to
implement fast math flags correctly. Each module should get the
version of the library appropriate for its combination of fast math
and related flags, with the attributes propagated into its functions
and internalized.

HIP already maintains the list of libraries, but this is not used for
OpenCL. Unfortunately, HIP uses a separate --hip-device-lib argument,
despite both languages using the same bitcode library. Eventually
these two searches need to be merged.

An additional problem is there are 3 different locations the libraries
are installed, depending on which build is used. This also needs to be
consolidated (or at least the search logic needs to deal with this
unnecessary complexity).
2020-04-10 13:37:32 -04:00
Michael Liao c97be2c377 [hip] Remove `hip_pinned_shadow`.
Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
  surface/texture reference support. So far, both the host and device
  use the same reference type, which could be revised later when
  interface/implementation is stablized.

Reviewers: yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77583
2020-04-07 09:51:49 -04:00
Matt Arsenault 4ea3650c21 HIP: Link correct denormal mode library
This wasn't respecting the flush mode based on the default, and also
wasn't correctly handling the explicit
-fno-cuda-flush-denormals-to-zero overriding the mode.
2020-04-01 12:36:22 -04:00
Matt Arsenault 175e42303b AMDGPU: Make HIPToolChain a subclass of AMDGPUToolChain
This fixes some code duplication. This is also a step towards
consolidating builtin library handling.
2020-03-31 18:22:46 -04:00
Matt Arsenault c9d65a48af HIP: Ensure new denormal mode attributes are set
Apparently HIPToolChain does not subclass from AMDGPUToolChain, so
this was not applying the new denormal attributes. I'm not sure why
this doesn't subclass. Just copy the implementation for now.
2020-03-31 18:00:37 -04:00
Yaxun (Sam) Liu 2ae25647d1 [CUDA][HIP] Add -Xarch_device and -Xarch_host options
The argument after -Xarch_device will be added to the arguments for CUDA/HIP
device compilation and will be removed for host compilation.

The argument after -Xarch_host will be added to the arguments for CUDA/HIP
host compilation and will be removed for device compilation.

Differential Revision: https://reviews.llvm.org/D76520
2020-03-24 10:13:05 -04:00
Yaxun (Sam) Liu 78957bab55 [NFC] Refactor handling of Xarch option
Extract common code to a function. To prepare for
adding an option for CUDA/HIP host and device only
option.

Differential Revision: https://reviews.llvm.org/D76455
2020-03-22 14:42:09 -04:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Yaxun (Sam) Liu b7e415f37f [HIP] Fix environment variable HIP_DEVICE_LIB_PATH
Currently device lib path set by environment variable HIP_DEVICE_LIB_PATH
does not work due to extra "-L" added to each entry.

This patch fixes that by allowing argument name to be empty in addDirectoryList.

Differential Revision: https://reviews.llvm.org/D73299
2020-01-28 11:27:01 -05:00
Michael Liao 49f7bc9e1e [hip] Remove `-Werror=format-nonliteral`
Summary:
- It won't distinguish host and device code and trigger compilation
  failure on irrelevant code.

Reviewers: sameerds, yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D73224
2020-01-23 11:02:11 -05:00
Holger Wünsche 24d7a0935b [HIP] use GetProgramPath for executable discovery
This change replaces the manual building of executable paths
using llvm::sys::path::append with GetProgramPath.
This enables adding other paths in case executables reside
in different directories and makes the code easier to read.

Differential Revision: https://reviews.llvm.org/D72903
2020-01-21 09:41:30 -08:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Sameer Sahasrabuddhe ed181efa17 [HIP][AMDGPU] expand printf when compiling HIP to AMDGPU
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
  program for the AMDGPU target.

The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D71365
2020-01-16 15:15:38 +05:30
Yaxun (Sam) Liu 9f2d8b5c0c [HIP] Add option --gpu-max-threads-per-block=n
Add this option to change the default launch bounds.

Differential Revision: https://reviews.llvm.org/D71221
2020-01-07 11:18:00 -05:00