llvm-project

Commit Graph

Author	SHA1	Message	Date
Yaxun (Sam) Liu	4fd05e0ad7	[HIP] Change to code object v4 Change to code object v4 by default to match ROCm 4.1. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D99235	2021-04-06 20:22:58 -04:00
Yaxun (Sam) Liu	907af84396	[CUDA][HIP] rename -fcuda-flush-denormals-to-zero Rename it to -fgpu-flush-denormals-to-zero. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D99688	2021-04-05 00:13:51 -04:00
Yaxun (Sam) Liu	51ade31e67	[HIP] Support device sanitizer Add option -fgpu-sanitize to enable sanitizer for AMDGPU target. Since it is experimental, it is off by default. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D96835	2021-02-18 23:30:25 -05:00
Yaxun (Sam) Liu	1dab94f9ed	[CUDA][HIP] Pass -fgpu-rdc to host clang -cc1 Currently -fgpu-rdc is not passed to host clang -cc1. This causes issue because -fgpu-rdc affects shadow variable linkage in host compilation. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D96105	2021-02-08 19:08:20 -05:00
Yaxun (Sam) Liu	0b81d9a992	[AMDGPU] add -mcode-object-version=n Add option -mcode-object-version=n to control code object version for AMDGPU. Differential Revision: https://reviews.llvm.org/D91310	2020-12-07 18:08:37 -05:00
Yaxun (Sam) Liu	4bed1d9b32	[HIP] fix bundle entry ID for -- Canonicalize triple used in fat binary. Change from amdgcn-amd-amdhsa to amdgcn-amd-amdhsa-. This is part of https://reviews.llvm.org/D60620	2020-12-07 18:08:37 -05:00
Michael Liao	d8949a8ad3	[hip] Fix host object creation from fatbin - `__hip_fatbin` should be a symbol in `.hip_fatbin` section. Differential Revision: https://reviews.llvm.org/D92418	2020-12-02 10:36:01 -05:00
Serge Pavlov	70bf35070a	[Driver] Add output file to properties of Command Object of class `Command` contains various properties of a command to execute, but output file was missed from them. This change adds this property. It is required for reporting consumed time and memory implemented in D78903 and may be used in other cases too. Differential Revision: https://reviews.llvm.org/D78902	2020-10-08 18:23:39 +07:00
Yaxun (Sam) Liu	e372c1d762	[HIP] Fix -fgpu-allow-device-init option The option needs to be passed to both host and device compilation. Differential Revision: https://reviews.llvm.org/D88550	2020-10-04 22:13:05 -04:00
Yaxun (Sam) Liu	9756a402f2	Recommit "[HIP] Add option --gpu-instrument-lib=" recommit `64f7790e7d` after fixing hip-device-libs.hip.	2020-10-04 21:41:43 -04:00
Yaxun (Sam) Liu	fef0ebbc0b	Revert "[HIP] Add option --gpu-instrument-lib=" This reverts commit `64f7790e7d` due to regression in hip-device-libs.hip.	2020-10-04 21:27:29 -04:00
Yaxun (Sam) Liu	64f7790e7d	[HIP] Add option --gpu-instrument-lib= Add an option --gpu-instrument-lib= to allow users to specify an instrument device library. This is for supporting -finstrument in device code for debugging/profiling tools. Differential Revision: https://reviews.llvm.org/D88557	2020-10-04 21:16:36 -04:00
Yaxun (Sam) Liu	2cd75f738e	Diagnose invalid target ID for AMDGPU toolchain for assembler AMDGPU toolchain currently only diagnoses invalid target ID for OpenCL source compilation. Invalid target ID is not diagnosed for assembler. This patch fixes that. Differential Revision: https://reviews.llvm.org/D88377	2020-10-02 19:38:02 -04:00
Yaxun (Sam) Liu	dc6a0b0ec7	[HIP] Align device binary To facilitate faster loading of device binaries and share them among processes, HIP runtime favors their alignment being 4096 bytes. HIP runtime can load unaligned device binaries, however, aligning them at 4096 bytes results in faster loading and less shared memory usage. This patch adds an option -bundle-align to clang-offload-bundler which allows bundles to be aligned at specified alignment. By default it is 1, which is NFC compared to existing format. This patch then aligns embedded fat binary and device binary inside fat binary at 4096 bytes. It has been verified this change does not cause significant overall file size increase for typical HIP applications (less than 1%). Differential Revision: https://reviews.llvm.org/D88734	2020-10-02 18:10:44 -04:00
Yaxun (Sam) Liu	10eb3bf2d4	Skip -fPIE for AMDGPU and HIP toolchain AMDGPU toolchain does not support -fPIE, therefore skip it if specified by driver. Differential Revision: https://reviews.llvm.org/D88425	2020-09-28 22:03:18 -04:00
Reid Kleckner	3453b6928d	Revert "Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions"" This reverts commit `e39da8ab6a`. This depends on a change that needs additional design review and needs to be reverted.	2020-09-24 11:16:54 -07:00
Yaxun (Sam) Liu	e39da8ab6a	Recommit "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions" This recommits `7f1f89ec8d` and `40df06cdaf` after fixing memory sanitizer failure.	2020-09-24 08:44:37 -04:00
Yaxun (Sam) Liu	772bd8a7d9	Revert "[CUDA][HIP] Defer overloading resolution diagnostics for host device functions" This reverts commit `7f1f89ec8d`. This reverts commit `40df06cdaf`.	2020-09-17 13:55:31 -04:00
Yaxun (Sam) Liu	40df06cdaf	[CUDA][HIP] Defer overloading resolution diagnostics for host device functions In CUDA/HIP a function may become implicit host device function by pragma or constexpr. A host device function is checked in both host and device compilation. However it may be emitted only on host or device side, therefore the diagnostics should be deferred until it is known to be emitted. Currently clang is only able to defer certain diagnostics. This causes false alarms and limits the usefulness of host device functions. This patch lets clang defer all overloading resolution diagnostics for host device functions. An option -fgpu-defer-diag is added to control this behavior. By default it is off. It is NFC for other languages. Differential Revision: https://reviews.llvm.org/D84364	2020-09-17 11:30:42 -04:00
Yaxun (Sam) Liu	ccb4124a41	Fix -gz=zlib options for linker gcc translates -gz=zlib to --compress-debug-options=zlib for both assembler and linker but clang only does this for assembler. The linker needs --compress-debug-options=zlib option to compress the debug sections in the generated executable or shared library. Due to this bug, -gz=zlib has no effect on the generated executable or shared library. This patch fixes that. Differential Revision: https://reviews.llvm.org/D87321	2020-09-11 17:12:58 -04:00
Yaxun (Sam) Liu	7546b29e76	[HIP] Support target id by --offload-arch This patch introduces support of target id by -offload-arch. Differential Revision: https://reviews.llvm.org/D60620	2020-08-18 23:43:53 -04:00
Yaxun (Sam) Liu	5d2c3e031a	Fix regression due to test hip-version.hip Added RocmInstallationDetector to Darwin and MinGW. Fixed duplicate ROCm detector in ROCm toolchain.	2020-07-11 12:45:29 -04:00
Yaxun (Sam) Liu	849d4405f5	[HIP] Fix rocm detection Do not detect device library by default in rocm detector. Only detect device library in Rocm and HIP toolchain. Separate detection of HIP runtime and Rocm device library. Detect rocm path by version file in host toolchains. Also added detecting rocm version and printing rocm installation path and version with -v. Fixed include path and device library detection for ROCm 3.5. Added --hip-version option. Renamed --hip-device-lib-path to --rocm-device-lib-path. Fixed default value for -fhip-new-launch-api. Added default -std option for HIP. Differential Revision: https://reviews.llvm.org/D82930	2020-07-10 23:20:15 -04:00
Aaron En Ye Shi	c64bb3f736	[HIP] Use default triple in llvm-mc for system ld The Ubuntu system ld does not recognize the amdgcn-amd-amdhsa target. Instead the host object with embedded device fat binary should not be assembled by that triple. It should use default triple, so that the object is compatible with system ld. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D83145	2020-07-07 16:44:51 +00:00
James Y Knight	4772b99dff	Clang Driver: refactor support for writing response files to be specified at Command creation, rather than as part of the Tool. This resolves the hack I just added to allow Darwin toolchain to vary its level of support based on `-mlinker-version=`. The change preserves the _current_ settings for response-file support. Some tools look likely to be declaring that they don't support response files in error, however I kept them as-is in order for this change to be a simple refactoring. Differential Revision: https://reviews.llvm.org/D82782	2020-06-29 18:27:02 -04:00
Yaxun (Sam) Liu	8013ce4490	[HIP] Add missing options for lto Add -mcpu, -mattr, -mllvm, and -save-temps options for lto when necessary. Differential Revision: https://reviews.llvm.org/D82506	2020-06-26 00:26:05 -04:00
Aaron En Ye Shi	77df5a8283	[HIP] Move HIP Linking Logic into HIP ToolChain This patch is a follow up on https://reviews.llvm.org/D78759. Extract the HIP Linker script from generic GNU linker, and move it into HIP ToolChain. Update OffloadActionBuilder Link actions feature to apply device linking and host linking actions separately. Using MC Directives, embed the device images and define symbols. Reviewers: JonChesterfield, yaxunl Subscribers: tra, echristo, jdoerfert, msearles, scchan Differential Revision: https://reviews.llvm.org/D81963	2020-06-22 19:48:48 +00:00
Yaxun (Sam) Liu	c830d517b4	[HIP] Enable -amdgpu-internalize-symbols Enable -amdgpu-internalize-symbols to eliminate unused functions and global variables for whole program to speed up compilation and improve performance. For -fno-gpu-rdc, -amdgpu-internalize-symbols is passed to clang -cc1. For -fgpu-rdc, -amdgpu-internalize-symbols is passed to lld. Differential Revision: https://reviews.llvm.org/D81959	2020-06-18 16:34:37 -04:00
Yaxun (Sam) Liu	6752786d65	[HIP] Do not use llvm-link/opt/llc for -fgpu-rdc This patch is a follow up on https://reviews.llvm.org/D81627. In addition to the default -fno-gpu-rdc case, this patch lets the HIP toolchain not use llvm-link/opt/llc to link device code for the -fgpu-rdc case. Instead, it uses standard lto. This will eliminate some redundant optimizations and speed up the compilation/linking. Differential Revision: https://reviews.llvm.org/D81861	2020-06-15 21:09:18 -04:00
Yaxun (Sam) Liu	e8090d83fd	[HIP] Do not call opt/llc for -fno-gpu-rdc Currently HIP toolchain calls clang to emit bitcode then calls opt/llc for device compilation for the default -fno-gpu-rdc case, which is unnecessary since clang is able to compile a single source file to ISA. This patch fixes the HIP action builder and toolchain so that the default -fno-gpu-rdc can be done like a canonical toolchain, i.e. one clang -cc1 invocation to compile source code to ISA. This can avoid unnecessary processes to speed up the compilation, and avoid redundant LLVM passes which are performed in clang -cc1 and opt. Differential Revision: https://reviews.llvm.org/D81627	2020-06-15 18:55:01 -04:00
Yaxun (Sam) Liu	8422bc9efc	recommit "[HIP] Add default header and include path" recommit `11d06b9511` with fix for lit tests.	2020-06-06 14:21:22 -04:00
Nico Weber	2920348063	Revert "recommit "[HIP] Add default header and include path"" This reverts commit `1fa43e0b34`. Still breaks tests on several bots, see https://reviews.llvm.org/D81176	2020-06-05 21:50:04 -04:00
Yaxun (Sam) Liu	1fa43e0b34	recommit "[HIP] Add default header and include path" recommit `11d06b9511` with fix for lit tests.	2020-06-05 20:41:15 -04:00
Yaxun (Sam) Liu	8a8c6913a9	Revert "[HIP] Add default header and include path" This reverts commit `11d06b9511`.	2020-06-05 15:42:57 -04:00
Yaxun (Sam) Liu	11d06b9511	[HIP] Add default header and include path To support std::complex and some other standard C/C++ functions in HIP device code, they need to be forced to be __host__ __device__ functions by pragmas. This is done by some clang standard C++ wrapper headers which are shared between cuda-clang and hip-Clang. For these standard C++ wrapper headers to work properly, a specific include path order has to be enforced: clang C++ wrapper include path, standard C++ include path, clang include path. Also, these C++ wrapper headers require that device versions of some standard C/C++ functions be declared before including them. This needs to be done by including a default header which declares or defines these device functions. The default header is always included before any other headers are included by users. This patch adds the default header and include path for HIP. Differential Revision: https://reviews.llvm.org/D81176	2020-06-05 12:44:57 -04:00
Matt Arsenault	14e1845711	HIP: Merge builtin library handling Merge with the new --rocm-path handling used for OpenCL. This looks for a usable set of device libraries upfront, rather than giving a generic "no such file or directory error". If any of the required bitcode libraries are missing, this will now produce a "cannot find ROCm installation." error. This differs from the existing hip specific flags by pointing to a rocm root install instead of a single directory with bitcode files. This tries to maintain compatibility with the existing --hip-device-lib and --hip-device-lib-path flags, as well as the HIP_DEVICE_LIB_PATH environment variable, or at least the range of uses with testcases. The existing range of uses and behavior doesn't entirely make sense to me, so some of the untested edge cases change behavior. Currently the two path forms seem to have the double purpose of a search path for an arbitrary --hip-device-lib, and for finding the stock set of libraries. Since the stock set of libraries This also changes the behavior when multiple paths are specified, and only takes the last one (and the environment variable only handles a single path). If --hip-device-lib is used, it now only treats --hip-device-lib-path as the search path for it, and does not attempt to find the rocm installation. If not, --hip-device-lib-path and the environment variable are used as the directory to search instead of the rocm root based path. This should also automatically fix handling of the options to use wave64.	2020-05-12 09:50:22 -04:00
Matt Arsenault	4593e4131a	AMDGPU: Teach toolchain to link rocm device libs Currently the library is separately linked, but this isn't correct to implement fast math flags correctly. Each module should get the version of the library appropriate for its combination of fast math and related flags, with the attributes propagated into its functions and internalized. HIP already maintains the list of libraries, but this is not used for OpenCL. Unfortunately, HIP uses a separate --hip-device-lib argument, despite both languages using the same bitcode library. Eventually these two searches need to be merged. An additional problem is there are 3 different locations the libraries are installed, depending on which build is used. This also needs to be consolidated (or at least the search logic needs to deal with this unnecessary complexity).	2020-04-10 13:37:32 -04:00
Michael Liao	c97be2c377	[hip] Remove `hip_pinned_shadow`. Summary: - Use `device_builtin_surface` and `device_builtin_texture` for surface/texture reference support. So far, both the host and device use the same reference type, which could be revised later when interface/implementation is stabilized. Reviewers: yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77583	2020-04-07 09:51:49 -04:00
Matt Arsenault	4ea3650c21	HIP: Link correct denormal mode library This wasn't respecting the flush mode based on the default, and also wasn't correctly handling the explicit -fno-cuda-flush-denormals-to-zero overriding the mode.	2020-04-01 12:36:22 -04:00
Matt Arsenault	175e42303b	AMDGPU: Make HIPToolChain a subclass of AMDGPUToolChain This fixes some code duplication. This is also a step towards consolidating builtin library handling.	2020-03-31 18:22:46 -04:00
Matt Arsenault	c9d65a48af	HIP: Ensure new denormal mode attributes are set Apparently HIPToolChain does not subclass from AMDGPUToolChain, so this was not applying the new denormal attributes. I'm not sure why this doesn't subclass. Just copy the implementation for now.	2020-03-31 18:00:37 -04:00
Yaxun (Sam) Liu	2ae25647d1	[CUDA][HIP] Add -Xarch_device and -Xarch_host options The argument after -Xarch_device will be added to the arguments for CUDA/HIP device compilation and will be removed for host compilation. The argument after -Xarch_host will be added to the arguments for CUDA/HIP host compilation and will be removed for device compilation. Differential Revision: https://reviews.llvm.org/D76520	2020-03-24 10:13:05 -04:00
Yaxun (Sam) Liu	78957bab55	[NFC] Refactor handling of Xarch option Extract common code to a function. To prepare for adding an option for CUDA/HIP host and device only option. Differential Revision: https://reviews.llvm.org/D76455	2020-03-22 14:42:09 -04:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Yaxun (Sam) Liu	b7e415f37f	[HIP] Fix environment variable HIP_DEVICE_LIB_PATH Currently device lib path set by environment variable HIP_DEVICE_LIB_PATH does not work due to extra "-L" added to each entry. This patch fixes that by allowing argument name to be empty in addDirectoryList. Differential Revision: https://reviews.llvm.org/D73299	2020-01-28 11:27:01 -05:00
Michael Liao	49f7bc9e1e	[hip] Remove `-Werror=format-nonliteral` Summary: - It won't distinguish host and device code and trigger compilation failure on irrelevant code. Reviewers: sameerds, yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D73224	2020-01-23 11:02:11 -05:00
Holger Wünsche	24d7a0935b	[HIP] use GetProgramPath for executable discovery This change replaces the manual building of executable paths using llvm::sys::path::append with GetProgramPath. This enables adding other paths in case executables reside in different directories and makes the code easier to read. Differential Revision: https://reviews.llvm.org/D72903	2020-01-21 09:41:30 -08:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attributes are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Sameer Sahasrabuddhe	ed181efa17	[HIP][AMDGPU] expand printf when compiling HIP to AMDGPU Summary: This change implements the expansion in two parts: - Add a utility function emitAMDGPUPrintfCall() in LLVM. - Invoke the above function from Clang CodeGen, when processing a HIP program for the AMDGPU target. The printf expansion has undefined behaviour if the format string is not a compile-time constant. As a sufficient condition, the HIP ToolChain now emits -Werror=format-nonliteral. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D71365	2020-01-16 15:15:38 +05:30
Yaxun (Sam) Liu	9f2d8b5c0c	[HIP] Add option --gpu-max-threads-per-block=n Add this option to change the default launch bounds. Differential Revision: https://reviews.llvm.org/D71221	2020-01-07 11:18:00 -05:00


82 Commits