This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch provides rationale for the framework, implementation and unit tests but the functions themselves are still using the previous version. The change in implementation will come in a follow up patch.
Differential Revision: https://reviews.llvm.org/D136292
Switch from green checkmarks to the following legend:
X = x86_64
A = aarch64
a = arm32
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D136020
* Make consistent heading names
* Factor out |check| into an include for reuse
* Use it everywhere (No more YES or UTF-8)
* Remove unneeded summary from pages. People know why they're there.
* Ensure source location headers everywhere.
Differential Revision: https://reviews.llvm.org/D136016
This reverts commit https://reviews.llvm.org/D135134 (b3f1d58a13)
That revision appears to have broken Arm memcpy in some subtle
ways. Am communicating with the original author to get a
good reproduction.
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/chf1Y6eGM
Differential Revision: https://reviews.llvm.org/D135134
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
The implementation currently ignores all spawn attributes. Support for
them will be added in future changes.
A simple allocator for integration tests has been added so that the
integration test for posix_spawn can use the
posix_spawn_file_actions_add* functions.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D135752
This version is more composable and also simpler at the expense of being more explicit and more verbose. It also provides minimal implementations for ARM platforms.
Codegen can be checked here https://godbolt.org/z/x19zvE59v
Differential Revision: https://reviews.llvm.org/D135134
These headers are uncommonly used, and from extensions, but some basic
support is needed. Macros have been added where available.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135731
This version is more composable and also simpler at the expense of being more explicit and more verbose.
This patch is not meant to be submitted but gives an idea of the change.
Codegen can be checked in https://godbolt.org/z/6z1dEoWbs by removing the "static inline" before individual functions.
Unittests are coming.
Suggested review order:
- utils
- op_base
- op_builtin
- op_generic
- op_x86 / op_aarch64
- *_implementations.h
Differential Revision: https://reviews.llvm.org/D135134
Some headers hadn't been added, this fixes that and improves the
ordering.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135629
In the isatty test it was assumed that /dev/tty existed, but that's not
always the case. Now we check for that.
Differential Revision: https://reviews.llvm.org/D135626
The isatty function uses the side effects of an ioctl call to determine
if a specific file descriptor is a terminal. I chose TIOCGETD (get line
discipline of terminal) because it didn't require any new structs.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D135618
The sysconf function has many options, this patch adds the basic funtion
and the pagesize option. More options will be added in future patches.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135409
This patch adds support for converting doubles to string in the %f/F
format specifier. It does not yet support long doubles outside of the
double range. This implementation is based on the work of Ulf Adams,
specifically the Ryu Printf algorithm.
See:
Ulf Adams. 2019. Ryū revisited: printf floating point conversion.
Proc. ACM Program. Lang. 3, OOPSLA, Article 169 (October 2019), 23 pages.
https://doi.org/10.1145/3360595
Differential Revision: https://reviews.llvm.org/D131023
The logic for strsignal and strerror is very similar, so I've moved them
both to use a shared utility (MessageMapper) for the basic
functionality.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135322
I've implemente the gnu variant of strerror_r since that seems to be the
one more relevant to what we're trying to do.
Differential Revision: https://reviews.llvm.org/D135227
Previously the futex type was defined in terms of unsigned int, now it's
uint32, which is more portable.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135408
Without this fix, the declaration in sched.h will not have the "__" prefix and
will cause a compile failure.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D135286
Add the macro CPU_COUNT as well as a backing function to implement the
functionality.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135179
This provides the reference implementation of rand and srand. In future
this will likely be upgraded to something that supports full ints.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135187
A very simple and minimal implementation of fork is added. Future
changes will add more functionality to satisfy POSIX and Linux
requirements.
An implementation of wait and a few support macros in sys/wait.h
have also been added to help with testing the fork function.
Reviewed By: lntue, michaelrj
Differential Revision: https://reviews.llvm.org/D135131
Our syscall implementation depends on a specific macro that's only
defined in our headers. If we're not using our headers, then the test
doesn't work. I've disabled the test in this case because there's no
point in testing the system libc's syscall implementation.
Differential Revision: https://reviews.llvm.org/D134994
Add the syscall wrapper function and tests. It's implemented using a
macro to guarantee the minimum number of arguments.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D134919
They were disabled because we were including linux/signal.h from our
signal.h. Linux's signal.h is not designed to be included from user
programs as it causes a lot of non-standard name pollution. Also, it is
not self-contained. This change defines types and macros relevant for
signal related syscalls within libc's headers and removes inclusion of
Linux headers.
This patch enables the funtions only for x86_64. They will be enabled
for aarch64 also in a follow up patch after testing.
Reviewed By: abrachet, lntue
Differential Revision: https://reviews.llvm.org/D134567
On windows, including math.h causes macros for "OVERFLOW" and
"UNDERFLOW" to be defined. This patch renames some variables internal to
FEnvImpl.h to avoid colliding with those.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D134775
The existing thrd_once function has been refactored so that the
implementation can be shared between thrd_once and pthread_once
functions.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D134716
The windows build has fallen behind a little, this patch fixes some
issues that were preventing it from building.
Specifically: Some subfolders weren't being included, leading to missing
targets in the cmake.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D134676
Tested:
Limited unit test: This makes a call and checks that no error was
returned, but we currently don't have the ability to ensure that
time has elapsed as expected.
Co-authored-by: Jeff Bailey <jeffbailey@google.com>
Reviewed By: sivachandra, jeffbailey
Differential Revision: https://reviews.llvm.org/D134095
Previously the mman macros were in api.td, but platform differences are
easier to handle with preprocessor macros so they have been moved to
include. Also I completed the list of macros (at least for what I need
soon) and fixed some previously incorrect values.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D134491
Strerror maps error numbers to strings. Additionally, a utility for
mapping errors to strings was added so that it could be reused for
perror and similar.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D134074
Implement exp10f function correctly rounded to all rounding modes.
Algorithm: perform range reduction to reduce
```
10^x = 2^(hi + mid) * 10^lo
```
where:
```
hi is an integer,
0 <= mid * 2^5 < 2^5
-log10(2) / 2^6 <= lo <= log10(2) / 2^6
```
Then `2^mid` is stored in a table of 32 entries and the product `2^hi * 2^mid` is
performed by adding `hi` into the exponent field of `2^mid`.
`10^lo` is then approximated by a degree-5 minimax polynomials generated by Sollya with:
```
> P = fpminimax((10^x - 1)/x, 4, [|D...|], [-log10(2)/64. log10(2)/64]);
```
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 10.215
System LIBC reciprocal throughput : 7.944
LIBC reciprocal throughput : 38.538
LIBC reciprocal throughput : 12.175 (with `-msse4.2` flag)
LIBC reciprocal throughput : 9.862 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 40.744
System LIBC latency : 37.546
BEFORE
LIBC latency : 48.989
LIBC latency : 44.486 (with `-msse4.2` flag)
LIBC latency : 40.221 (with `-mfma` flag)
```
This patch relies on https://reviews.llvm.org/D134002
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D134104
Now libc headers can be installed separately from installing the rest of
the libc.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D133960
Reduce the number of subintervals that need lookup table and optimize
the evaluation steps.
Currently, `exp2f` is computed by reducing to `2^hi * 2^mid * 2^lo` where
`-16/32 <= mid <= 15/32` and `-1/64 <= lo <= 1/64`, and `2^lo` is then
approximated by a degree 6 polynomial.
Experiment with Sollya showed that by using a degree 6 polynomial, we
can approximate `2^lo` for a bigger range with reasonable errors:
```
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/64, 1/64]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/64, 1/64]);
0x1.e18a1bc09114def49eb851655e2e5c4dd08075ac2p-63
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32, 1/32]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/32, 1/32]);
0x1.05627b6ed48ca417fe53e3495f7df4baf84a05e2ap-56
```
So we can optimize the implementation a bit with:
# Reduce the range to `mid = i/16` for `i = 0..15` and `-1/32 <= lo <= 1/32`
# Store the table `2^mid` in bits, and add `hi` directly to its exponent field to compute `2^hi * 2^mid`
# Rearrange the order of evaluating the polynomial approximating `2^lo`.
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 9.534
System LIBC reciprocal throughput : 6.229
BEFORE:
LIBC reciprocal throughput : 21.405
LIBC reciprocal throughput : 15.241 (with `-msse4.2` flag)
LIBC reciprocal throughput : 11.111 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 18.617
LIBC reciprocal throughput : 12.852 (with `-msse4.2` flag)
LIBC reciprocal throughput : 9.253 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 40.869
System LIBC latency : 30.580
BEFORE
LIBC latency : 64.888
LIBC latency : 61.027 (with `-msse4.2` flag)
LIBC latency : 48.778 (with `-mfma` flag)
AFTER
LIBC latency : 48.803
LIBC latency : 45.047 (with `-msse4.2` flag)
LIBC latency : 37.487 (with `-mfma` flag)
```
Reviewed By: sivachandra, orex
Differential Revision: https://reviews.llvm.org/D133870
Implement acosf function correctly rounded for all rounding modes.
We perform range reduction as follows:
- When `|x| < 2^(-10)`, we use cubic Taylor polynomial:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
```
- When `2^(-10) <= |x| <= 0.5`, we use the same approximation that is used for `asinf(x)` when `|x| <= 0.5`:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
```
- When `0.5 < x <= 1`, we use the double angle formula: `cos(2y) = 1 - 2 * sin^2 (y)` to reduce to:
```
acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
```
- When `-1 <= x < -0.5`, we reduce to the positive case above using the formula:
```
acos(x) = pi - acos(-x)
```
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput : 24.271
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 55.554
System LIBC latency : 76.879
LIBC latency : 62.118
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D133550
This function cannot have any instrumentation because it's
assembly must match exactly what the debugger is expecting.
Previously it was just a list of what sanitizers we expect
libc would be sanitized with but this is untenable.
Update the utility functions for checking exceptional values of math
functions to use cpp::optional return values.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D133134
This adds division and power implementations to UInt. Modulo and
division are handled by the same function. These are necessary for some
higher order mathematics, often involving large floating point numbers.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D132184
builtin_wrappers contains the wrappers for the clz builtins, which do
not depend on anything in fputil. This patch moves the file out of
FPUtil. The location is updated as appropriate.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D133035
The implementation currently supports only non-thumb mode. As a test for
the implementation, mmap and munmap functions have been enabled.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D132825
The libc.src.__support.FPUtil.fputil target encompassed many unrelated
files, and provided a lot of hidden dependencies. This patch splits out
all of these files into component parts and cleans up the cmake files
that used them. It does not touch any source files for simplicity, but
there may be changes made to them in future patches.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D132980
Performance by core-math (core-math/glibc 2.31/current llvm-14):
10.845/43.174/13.467
The review is done on top of D132809.
Differential Revision: https://reviews.llvm.org/D132811
1) `double log2_eval(double)` function added with better than float precision is added.
2) Some refactoring done to put all auxiliary functions and corresponding data
to one place to reuse the code.
3) Added tests for new functions.
4) Performance and precision tests of the function shows, that it more precise than exiting log2,
(no exceptional cases), but timing is ~5% higer that on current one.
Differential Revision: https://reviews.llvm.org/D132809
This patch allows for adjusting the size of the array that printf uses
to track the types of arguments in index mode. This is useful for
optimizing the tradeoff between memory usage and performance.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131993
The FormatSection and the writer functions both previously took a char*
and a length to represent a string. Now they use the StringView class to
represent that more succinctly. This change also required fixing
everywhere these were used, so it touches a lot of files.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D131994