llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	1bae02b773	[Cuda] Use fallback method to mangle externalized decls if no CUID given CUDA requires that static variables be visible to the host when offloading. However, The standard semantics of a stiatc variable dictate that it should not be visible outside of the current file. In order to access it from the host we need to perform "externalization" on the static variable on the device. This requires generating a semi-unique name that can be affixed to the variable as to not cause linker errors. This is currently done using the CUID functionality, an MD5 hash value set up by the clang driver. This allows us to achieve is mostly unique ID that is unique even between multiple compilations of the same file. However, this is not always availible. Instead, this patch uses the unique ID from the file to generate a unique symbol name. This will create a unique name that is consistent between the host and device side compilations without requiring the CUID to be entered by the driver. The one downside to this is that we are no longer stable under multiple compilations of the same file. However, this is a very niche use-case and is not supported by Nvidia's CUDA compiler so it likely to be good enough. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D125904	2022-05-26 09:18:22 -04:00
Daniil Kovalev	828b63c309	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values hold in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are casted to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment calculated as maximum from default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D120129	2022-03-24 12:36:52 +03:00
Daniil Kovalev	a034878564	Revert "[NVPTX] Enhance vectorization of ld.param & st.param" This reverts commit `f854434f0f`. Placed URL to wrong differential revision in commit message.	2022-03-24 12:32:06 +03:00
Daniil Kovalev	f854434f0f	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values hold in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are casted to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment calculated as maximum from default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:25:36 +03:00
Fangrui Song	fd739804e0	[test] Add {{.}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences For a default visibility external linkage definition, dso_local is set for ELF -fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences. To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition, which sets dso_local for default visibility external linkage definitions. To make this flip smooth and enable future (dso_local as definition default), this patch replaces (function) `define ` with `define{{.}} `, (variable/constant/alias) `= ` with `={{.}} `, or inserts appropriate `{{.}} `.	2020-12-31 00:27:11 -08:00
Yaxun (Sam) Liu	abd8cd9199	[CUDA][HIP] Fix linkage for -fgpu-rdc Currently for explicit template function instantiation in CUDA/HIP device compilation clang emits instantiated kernel with external linkage and instantiated device function with internal linkage. This is fine for -fno-gpu-rdc since there is only one TU. However this causes duplicate symbols for kernels for -fgpu-rdc if the same instantiation happen in multiple TU. Or missing symbols if a device function calls an explicitly instantiated template function in a different TU. To make explicit template function instantiation work for -fgpu-rdc we need to follow the C++ linkage paradigm, i.e. use weak_odr linkage. Differential Revision: https://reviews.llvm.org/D90311	2020-11-03 08:07:19 -05:00

6 Commits