Summary:
This follows the addition of `GetRandom` with D34412. We remove our
`/dev/urandom` code and use the new function. Additionally, change the PRNG for
a slightly faster version. One of the issues with the old code is that we have
64 full bits of randomness per "next", using only 8 of those for the Salt and
discarding the rest. So we add a cached u64 in the PRNG that can serve up to
8 u8 before having to call the "next" function again.
During some integration work, I also realized that some very early processes
(like `init`) do not benefit from `/dev/urandom` yet. So if there is no
`getrandom` syscall as well, we have to fallback to some sort of initialization
of the PRNG.
Now a few words on why XoRoShiRo and not something else. I have played a while
with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64
are usually faster but produce respectively 15 & 31 bits of entropy, meaning
that to get a full 64-bit, you would need to call them several times. The simple
XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test
suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and
produces full 64 bits.
%%%
root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns
root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns
%%%
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35221
llvm-svn: 307798
Summary:
This change adds Android support to the allocator (but doesn't yet enable it in
the cmake config), and should be the last fragment of the rewritten change
D31947.
Android has more memory constraints than other platforms, so the idea of a
unique context per thread would not have worked. The alternative chosen is to
allocate a set of contexts based on the number of cores on the machine, and
share those contexts within the threads. Contexts can be dynamically reassigned
to threads to prevent contention, based on a scheme suggested by @dvyuokv in
the initial review.
Additionally, given that Android doesn't support ELF TLS (only emutls for now),
we use the TSan TLS slot to make things faster: Scudo is mutually exclusive
with other sanitizers so this shouldn't cause any problem.
An additional change made here, is replacing `thread_local` by `THREADLOCAL`
and using the initial-exec thread model in the non-Android version to prevent
extraneous weak definition and checks on the relevant variables.
Reviewers: kcc, dvyukov, alekseyshl
Reviewed By: alekseyshl
Subscribers: srhines, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D32649
llvm-svn: 302300
Summary:
This change introduces scudo_tls.h & scudo_tls_linux.cpp, where we move the
thread local variables used by the allocator, namely the cache, quarantine
cache & prng. `ScudoThreadContext` will hold those. This patch doesn't
introduce any new platform support yet, this will be the object of a later
patch. This also changes the PRNG so that the structure can be POD.
Reviewers: kcc, dvyukov, alekseyshl
Reviewed By: dvyukov, alekseyshl
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D32440
llvm-svn: 301584