Normally, math functions are forwarded to the __nv_* counterparts provided
by CUDA's libdevice bitcode. However, the __nv_rint*()/__nv_nearbyint*()
functions there have a bug -- they use round(), which rounds halfway cases
away from zero instead of to the nearest even integer, so we end up with
rint(2.5f) producing 3.0 instead of the expected 2.0. The broken bitcode is
not actually used by NVCC itself, which has both a work-around in its CUDA
headers and, in recent versions, correct implementations in its built-ins.
This patch implements an equivalent workaround and directs rint*/nearbyint*
to the __builtin_* variants, which produce correct results.
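For illustration, a minimal host-side sketch of the rounding difference in
question (not the actual header change; it assumes a compiler that provides
__builtin_rintf, as GCC and Clang do):

    #include <cmath>
    #include <cstdio>

    int main() {
      // round() rounds halfway cases away from zero ...
      std::printf("round(2.5f)           = %.1f\n", std::round(2.5f));      // 3.0
      // ... while rint()/nearbyint() round them to even under the default
      // rounding mode, which is the behavior the wrapper should forward to.
      std::printf("rint(2.5f)            = %.1f\n", std::rint(2.5f));       // 2.0
      std::printf("__builtin_rintf(2.5f) = %.1f\n", __builtin_rintf(2.5f)); // 2.0
      return 0;
    }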
Differential Revision: https://reviews.llvm.org/D85236
The old way worked to some degree in C++ mode, but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably, we get rid of those extra variants and use the NVIDIA
intrinsics directly in the complex implementation. While this has to be
revisited as we add other GPU targets that want to reuse the code, it
should be fine for now.
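A minimal sketch of what "use the NVIDIA intrinsics directly" can look like
(the __complex_* helper names are made up; the __nv_isinfd/__nv_isnand
declarations are assumed to match libdevice):

    // Instead of an isinf/isnan macro or overload that differs between C
    // and C++ mode, call the libdevice intrinsics directly.
    extern "C" int __nv_isinfd(double);
    extern "C" int __nv_isnand(double);

    static inline int __complex_isinf(double __x) { return __nv_isinfd(__x); }
    static inline int __complex_isnan(double __x) { return __nv_isnand(__x); }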
Reviewed By: tra, JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D83591
Due to recent changes we cannot use OpenMP in CUDA files anymore
(PR45533), as the CUDA math handling is different when _OPENMP is
defined. We actually want this different behavior only if we are
offloading with OpenMP to NVIDIA GPUs, thus generating NVPTX. With this
patch we do not interfere with the CUDA math handling unless we are in
NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.
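Roughly, the idea is the following (the macro name below is made up; the
real wrapper headers are more involved):

    /* Only adjust the CUDA math handling when we are offloading with OpenMP
       to NVPTX, not whenever _OPENMP happens to be defined. */
    #if defined(__CUDA__) && defined(__OPENMP_NVPTX__)
    #  define __ADJUST_CUDA_MATH_FOR_OPENMP 1 /* hypothetical macro */
    #else
    #  define __ADJUST_CUDA_MATH_FOR_OPENMP 0
    #endif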
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D78155
This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., `__muldc3 not found`, when std::complex
operations are used on a device.
This will not allow complex math function calls, e.g., sin, to work
properly, but that is more complex (pun intended) anyway.
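For context, a small sketch of where the symbol comes from; whether the
call is emitted depends on flags such as -ffast-math:

    #include <complex>

    // The multiplication below may be lowered to a call to the compiler-rt
    // helper __muldc3 (roughly: double _Complex __muldc3(double, double,
    // double, double)), the symbol that previously failed to resolve for
    // device code.
    std::complex<double> square(std::complex<double> z) { return z * z; }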
Reviewed By: tra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D80897
For OpenMP target regions to piggyback on the CUDA/AMDGPU/...
implementations of math functions, we include the appropriate definitions
inside an `omp begin/end declare variant match(device={arch(nvptx)})`
scope. This way, the vendor-specific math functions become specialized
versions of the system math functions. When a system math function is
called and a specialized version is available, the selection logic
introduced in D75779 calls the specialized version instead. In contrast
to the code path we used so far, the system header is actually included.
This means functions without specialized versions are available, and so
are macro definitions.
This should address PR42061, PR42798, and PR42799.
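A minimal sketch of the scheme described above (not the actual wrapper
header; __nv_sinf is assumed to be the libdevice declaration):

    #include <math.h>

    #pragma omp begin declare variant match(device = {arch(nvptx)})
    // Inside this scope, definitions become specialized variants of the
    // system math functions with the same name.
    extern "C" float __nv_sinf(float);
    static inline float sinf(float __x) { return __nv_sinf(__x); }
    #pragma omp end declare variant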
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D75788
In d1705c1196 (D77238) we accidentally included subsequent changes rather
than only moving the code into a new file (which was the intention). We
undo those changes now and will re-introduce them with the appropriate
test changes later.
This is not supposed to change anything but to allow us to reuse the math
functions separately from the device functions, e.g., include them at
different times. This will be used by the OpenMP overlay.
This also adds two `return` keywords that were missing.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D77238