llvm-project

Commit Graph

Author	SHA1	Message	Date
Tue Ly	82d6e77048	[libc] Implement tanf function correctly rounded for all rounding modes. Implement tanf function correctly rounded for all rounding modes. We use the range reduction that is shared with `sinf`, `cosf`, and `sincosf`: ``` k = round(x * 32/pi) and y = x * (32/pi) - k. ``` Then we use the tangent of sum formula: ``` tan(x) = tan((k + y)* pi/32) = tan((k mod 32) * pi / 32 + y * pi/32) = (tan((k mod 32) * pi/32) + tan(y * pi/32)) / (1 - tan((k mod 32) * pi/32) * tan(y * pi/32)) ``` We need to make a further reduction when `k mod 32 >= 16` due to the pole at `pi/2` of the `tan(x)` function: ``` if (k mod 32 >= 16): k = k - 31, y = y - 1.0 ``` And to compute the final result, we store `tan(k * pi/32)` for `k = -15..15` in a table of 32 double values, and evaluate `tan(y * pi/32)` with a degree-11 minimax odd polynomial generated by Sollya with: ``` > P = fpminimax(tan(y * pi/32)/y, [|0, 2, 4, 6, 8, 10|], [|D...|], [0, 1.5]); ``` Performance benchmark using the `perf` tool from the CORE-MATH project on Ryzen 1700: ``` $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf CORE-MATH reciprocal throughput : 18.586 System LIBC reciprocal throughput : 50.068 LIBC reciprocal throughput : 33.823 LIBC reciprocal throughput : 25.161 (with `-msse4.2` flag) LIBC reciprocal throughput : 19.157 (with `-mfma` flag) $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf --latency GNU libc version: 2.31 GNU libc release: stable CORE-MATH latency : 55.630 System LIBC latency : 106.264 LIBC latency : 96.060 LIBC latency : 90.727 (with `-msse4.2` flag) LIBC latency : 82.361 (with `-mfma` flag) ``` Reviewed By: orex Differential Revision: https://reviews.llvm.org/D131715	2022-08-12 09:21:05 -04:00
Kirill Okhotnikov	5ef987c985	[libc][math] Added tanhf function. Correctly rounded function. Performance is ~2x faster than the glibc analog. Performance (llvm 12 intel): ``` CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='' ./perf.sh tanhf GNU libc version: 2.31 GNU libc release: stable 13.279 37.492 18.145 CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='--latency' ./perf.sh tanhf GNU libc version: 2.31 GNU libc release: stable 40.658 109.582 66.568 ``` Differential Revision: https://reviews.llvm.org/D130780	2022-08-01 22:43:00 +02:00
Kirill Okhotnikov	a7f55f0805	[libc][math] Added sinhf function. Differential Revision: https://reviews.llvm.org/D129278	2022-07-29 17:20:53 +02:00
Kirill Okhotnikov	fcb9d7e2cf	[libc][math] Added coshf function. Differential Revision: https://reviews.llvm.org/D129275	2022-07-29 16:57:28 +02:00
Alex Brachet	c179bcc151	[libc] Add imaxabs Differential Revision: https://reviews.llvm.org/D129517	2022-07-11 21:28:21 +00:00
Kirill Okhotnikov	b8e8012aa2	[libc][math] fmod/fmodf implementation. This is an implementation of the fmod remainder function from the standard libm. The underlying algorithm was developed by myself, though it has probably been invented before. Some features of the implementation: 1. The code is written in more-or-less modern C++. 2. One general implementation for both float and double precision numbers. 3. Platform/architecture-dependent and independent code and tests are split. 4. Tests cover 100% of the code for both float and double numbers. Test cases with NaN/Inf etc. are copied from glibc. 5. The new implementation is in general 2-4 times faster for “regular” x, y values. It can be 20 times faster for huge x/y values, but can also be 2 times slower in the double denormalized range (according to the perf tests provided). 6. Two different implementations of the division loop are provided. On some platforms division can be a very time-consuming operation. Depending on the platform it can be 3-10 times slower than multiplication. Performance tests: The test is based on the core-math project (https://gitlab.inria.fr/core-math/core-math). At Tue Ly's suggestion I took the hypot function and used it as a template for fmod, preserving all test cases. `./check.sh <--special|--worst> fmodf` passed. `CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf` results are ``` GNU libc version: 2.35 GNU libc release: stable 21.166 <-- FPU 51.031 <-- current glibc 37.659 <-- this fmod version. ```	2022-06-24 23:09:14 +02:00
Alex Brachet	b1183305f8	[libc] Add strlcat Differential Revision: https://reviews.llvm.org/D125978	2022-05-19 21:48:39 +00:00
Alex Brachet	fc2c8b2371	[libc] Add strlcpy Differential Revision: https://reviews.llvm.org/D125806	2022-05-18 17:45:05 +00:00
Tue Ly	0f031daea8	[libc] Initial support for darwin-aarch64. Add initial support for darwin-aarch64 (macOS M1). Some differences compared to linux-aarch64: - `math.h` defines `math_errhandling` via the compiler builtin `__math_errhandling()`, but Apple Clang 13.0.0 on M1 does not support the `__math_errhandling()` builtin as a macro function or a constexpr function. - `math.h` defines `UNDERFLOW` and `OVERFLOW` macros. - Besides the 5 usual floating point exceptions: `FE_INEXACT`, `FE_UNDERFLOW`, `FE_OVERFLOW`, `FE_DIVBYZERO`, and `FE_INVALID`, `fenv.h` also has another floating point exception: `FE_FLUSHTOZERO`. The corresponding trap for `FE_FLUSHTOZERO` in the control register is at a different location compared to the status register. - The `FE_FLUSHTOZERO` exception flag cannot be raised with the default CPU floating point operation mode. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D120914	2022-03-10 09:26:09 -05:00

9 Commits
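
The range reduction described in the tanf commit (82d6e77048) can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the commit message, not the libc code: the names `TAN_K_PI_OVER_32` and `tan_sketch` are hypothetical, the table is built with `math.tan` at import time, and `math.tan` stands in for the degree-11 Sollya polynomial. It works in plain double precision and makes no attempt at correct rounding.

```python
import math

# Hypothetical table: tan(k * pi/32) for k = -15..15 (31 entries here;
# the commit message mentions a table of 32 doubles).
TAN_K_PI_OVER_32 = [math.tan(k * math.pi / 32) for k in range(-15, 16)]


def tan_sketch(x: float) -> float:
    """Illustrative double-precision sketch of the reduction; not correctly rounded."""
    t = x * (32 / math.pi)
    k = round(t)      # k = round(x * 32/pi)
    y = t - k         # y = x * (32/pi) - k, so |y| <= 0.5
    k_mod = k % 32    # Python's % yields 0..31 for a positive modulus

    # Further reduction to stay away from the pole at pi/2:
    # shift by a full period (32 units = pi), split as k - 31 and y - 1.0.
    if k_mod >= 16:
        k_mod -= 31   # now in -15..0
        y -= 1.0      # now in [-1.5, -0.5]

    tan_k = TAN_K_PI_OVER_32[k_mod + 15]
    # Assumption: math.tan replaces the minimax polynomial for tan(y * pi/32).
    tan_y = math.tan(y * math.pi / 32)

    # Tangent of a sum: tan(a + b) = (tan a + tan b) / (1 - tan a * tan b)
    return (tan_k + tan_y) / (1.0 - tan_k * tan_y)
```

For moderate inputs such as `tan_sketch(1.0)`, the result should agree with `math.tan(1.0)` to within ordinary double-precision error. For very large `x`, the product `x * (32 / math.pi)` loses accuracy in this sketch, which is one of the issues a real implementation has to handle separately.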