Commit Graph

8 Commits

Author SHA1 Message Date
Joachim Meyer 5d689cf2a6 [NFC][CUDA] Fix order of round(f) definition in __clang_cuda_math.h for non-LP64.
This broke ARM builds e.g.: https://lab.llvm.org/buildbot/#/builders/187/builds/212
2021-07-02 21:55:48 +02:00
Artem Belevich 7d057efddc [CUDA] Work around a bug in rint/nearbyint caused by a broken implementation provided by CUDA.
Normally math functions are forwarded to __nv_* counterparts provided by CUDA's
libdevice bitcode. However, __nv_rint*()/__nv_nearbyint*() functions there have
a bug -- they use round() which rounds *up* instead of rounding towards the
nearest integer, so we end up with rint(2.5f) producing 3.0 instead of expected
2.0. The broken bitcode is not actually used by NVCC itself, which has both a
work-around in CUDA headers and, in recent versions, uses correct
implementations in NVCC's built-ins.

This patch implements equivalent workaround and directs rint*/nearbyint* to
__builtin_* variants that produce correct results.

Differential Revision: https://reviews.llvm.org/D85236
2020-08-05 13:13:48 -07:00
Johannes Doerfert b5667d00e0 [OpenMP][CUDA] Fix std::complex in GPU regions
The old way worked to some degree for C++-mode but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this has to be revisited
as we add other GPU targets which want to reuse the code, it should be
fine for now.

Reviewed By: tra, JonChesterfield, yaxunl

Differential Revision: https://reviews.llvm.org/D83591
2020-07-11 00:40:05 -05:00
Johannes Doerfert 7f1e6fcff9 [OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers
Due to recent changes we cannot use OpenMP in CUDA files anymore
(PR45533) as the math handling of CUDA is different when _OPENMP is
defined. We actually want this different behavior only if we are
offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch
we do not interfere with the CUDA math handling except if we are in
NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D78155
2020-07-10 18:53:34 -05:00
Johannes Doerfert d999cbc988 [OpenMP] Initial support for std::complex in target regions
This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., `__muldc3 not found`, when std::complex
operations are used on a device.

This will not allow complex make math function calls to work properly,
e.g., sin, but that is more complex (pan intended) anyway.

Reviewed By: tra, JonChesterfield

Differential Revision: https://reviews.llvm.org/D80897
2020-07-08 17:33:59 -05:00
Johannes Doerfert f85ae058f5 [OpenMP] Provide math functions in OpenMP device code via OpenMP variants
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions,
we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope.
This way, the vendor specific math functions will become specialized versions of the system math functions.
When a system math function is called and specialized version is available the selection logic introduced in D75779
instead call the specialized version. In contrast to the code path we used so far, the system header is actually included.
This means functions without specialized versions are available and so are macro definitions.

This should address PR42061, PR42798, and PR42799.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D75788
2020-04-07 23:33:24 -05:00
Johannes Doerfert b0b5f0416b [OpenMP][FIX] Undo changes accidentally already introduced in NFC commit
In d1705c1196 (D77238) we accidentally included subsequent changes and
did not only move the code into a new file (which was the intention).
We undo the changes now and re-introduce them with the appropriate test
changes later.
2020-04-02 01:33:39 -05:00
Johannes Doerfert d1705c1196 [CUDA][NFC] Split math.h functions out of __clang_cuda_device_functions.h
This is not supported to change anything but allow us to reuse the math
functions separately from the device functions, e.g., source them at
different times. This will be used by the OpenMP overlay.

This also adds two `return` keywords that were missing.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D77238
2020-04-01 23:46:27 -05:00