Implement acosf function correctly rounded for all rounding modes.
We perform range reduction as follows:
- When `|x| < 2^(-10)`, we use cubic Taylor polynomial:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
```
- When `2^(-10) <= |x| <= 0.5`, we use the same approximation that is used for `asinf(x)` when `|x| <= 0.5`:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
```
- When `0.5 < x <= 1`, we use the double angle formula: `cos(2y) = 1 - 2 * sin^2 (y)` to reduce to:
```
acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
```
- When `-1 <= x < -0.5`, we reduce to the positive case above using the formula:
```
acos(x) = pi - acos(-x)
```
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput : 24.271
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 55.554
System LIBC latency : 76.879
LIBC latency : 62.118
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D133550
This adds division and power implementations to UInt. Modulo and
division are handled by the same function. These are necessary for some
higher order mathematics, often involving large floating point numbers.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D132184
The libc.src.__support.FPUtil.fputil target encompassed many unrelated
files, and provided a lot of hidden dependencies. This patch splits out
all of these files into component parts and cleans up the cmake files
that used them. It does not touch any source files for simplicity, but
there may be changes made to them in future patches.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D132980
Performance by core-math (core-math/glibc 2.31/current llvm-14):
10.845/43.174/13.467
The review is done on top of D132809.
Differential Revision: https://reviews.llvm.org/D132811
1) `double log2_eval(double)` function added with better than float precision is added.
2) Some refactoring done to put all auxiliary functions and corresponding data
to one place to reuse the code.
3) Added tests for new functions.
4) Performance and precision tests of the function shows, that it more precise than exiting log2,
(no exceptional cases), but timing is ~5% higer that on current one.
Differential Revision: https://reviews.llvm.org/D132809
The FormatSection and the writer functions both previously took a char*
and a length to represent a string. Now they use the StringView class to
represent that more succinctly. This change also required fixing
everywhere these were used, so it touches a lot of files.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131994
In the printf_core CMake, the file pieces are defined as object
libraries that depend on the File data structure. If these are added
unconditionally they'll try to evaluate that dependancy even when there
is no File available. This patch adds a guard to prevent that error.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131921
To use the FILE data structure, LLVM-libc must be in fullbuild mode
since it expects its own implementation. This means that (f)printf can't
be used without fullbuild, but s(n)printf only uses strings. This patch
adjusts the CMake to allow for this.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D131913
To accurately measure the size of sprintf in a finished binary, the
easiest method is to simply build a binary with and without sprintf.
This patch adds an integration test that can be built with and without
sprintf, as well as targets to build it.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131735
Implement tanf function correctly rounded for all rounding modes.
We use the range reduction that is shared with `sinf`, `cosf`, and `sincosf`:
```
k = round(x * 32/pi) and y = x * (32/pi) - k.
```
Then we use the tangent of sum formula:
```
tan(x) = tan((k + y)* pi/32) = tan((k mod 32) * pi / 32 + y * pi/32)
= (tan((k mod 32) * pi/32) + tan(y * pi/32)) / (1 - tan((k mod 32) * pi/32) * tan(y * pi/32))
```
We need to make a further reduction when `k mod 32 >= 16` due to the pole at `pi/2` of `tan(x)` function:
```
if (k mod 32 >= 16): k = k - 31, y = y - 1.0
```
And to compute the final result, we store `tan(k * pi/32)` for `k = -15..15` in a table of 32 double values,
and evaluate `tan(y * pi/32)` with a degree-11 minimax odd polynomial generated by Sollya with:
```
> P = fpminimax(tan(y * pi/32)/y, [|0, 2, 4, 6, 8, 10|], [|D...|], [0, 1.5]);
```
Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf
CORE-MATH reciprocal throughput : 18.586
System LIBC reciprocal throughput : 50.068
LIBC reciprocal throughput : 33.823
LIBC reciprocal throughput : 25.161 (with `-msse4.2` flag)
LIBC reciprocal throughput : 19.157 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf --latency
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency : 55.630
System LIBC latency : 106.264
LIBC latency : 96.060
LIBC latency : 90.727 (with `-msse4.2` flag)
LIBC latency : 82.361 (with `-mfma` flag)
```
Reviewed By: orex
Differential Revision: https://reviews.llvm.org/D131715
Specifically, POSIX functions pthread_key_create, pthread_key_delete,
pthread_setspecific and pthread_getspecific have been added. The C
standard equivalents tss_create, tss_delete, tss_set and tss_get have
also been added.
Reviewed By: lntue, michaelrj
Differential Revision: https://reviews.llvm.org/D131647