This CL introduces configuration options to allow pointers to be
compacted in the thread-specific caches and transfer batches. This
offers the possibility to have them use 32-bit of space instead of
64-bit for the 64-bit Primary, thus cutting the size of the caches
and batches by nearly half (and as such the memory used in size
class 0). The cost is an additional read from the region information
in the fast path.
This is not a new idea, as it's being used in the sanitizer_common
64-bit primary. The difference here is that it is configurable via
the allocator config, with the possibility of not compacting at all.
This CL enables compacting pointers in the Android and Fuchsia default
configurations.
Differential Revision: https://reviews.llvm.org/D96435
Summary:
This patch includes several changes to reduce the overall footprint
of the allocator:
- for realloc'd chunks: only keep the same chunk when lowering the size
if the delta is within a page worth of bytes;
- when draining a cache: drain the beginning, not the end; we add pointers
at the end, so that meant we were draining the most recently added
pointers;
- change the release code to account for an freed up last page: when
scanning the pages, we were looking for pages fully covered by blocks;
in the event of the last page, if it's only partially covered, we
wouldn't mark it as releasable - even what follows the last chunk is
all 0s. So now mark the rest of the page as releasable, and adapt the
test;
- add a missing `setReleaseToOsIntervalMs` to the cacheless secondary;
- adjust the Android classes based on more captures thanks to pcc@'s
tool.
Reviewers: pcc, cferris, hctim, eugenis
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75142
Add an optional table lookup after the existing logarithm computation
for MidSize < Size <= MaxSize during size -> class lookups. The lookup is
O(1) due to indexing a precomputed (via constexpr) table based on a size
table. Switch to this approach for the Android size class maps.
Other approaches considered:
- Binary search was found to have an unacceptable (~30%) performance cost.
- An approach using NEON instructions (see older version of D73824) was found
to be slightly slower than this approach on newer SoCs but significantly
slower on older ones.
By selecting the values in the size tables to minimize wastage (for example,
by passing the malloc_info output of a target program to the included
compute_size_class_config program), we can increase the density of allocations
at a small (~0.5% on bionic malloc_sql_trace as measured using an identity
table) performance cost.
Reduces RSS on specific Android processes as follows (KB):
Before After
zygote (median of 50 runs) 26836 26792 (-0.2%)
zygote64 (median of 50 runs) 30384 30076 (-1.0%)
dex2oat (median of 3 runs) 375792 372952 (-0.8%)
I also measured the amount of whole-system idle dirty heap on Android by
rebooting the system and then running the following script repeatedly until
the results were stable:
for i in $(seq 1 50); do grep -A5 scudo: /proc/*/smaps | grep Pss: | cut -d: -f2 | awk '{s+=$1} END {print s}' ; sleep 1; done
I did this 3 times both before and after this change and the results were:
Before: 365650, 356795, 372663
After: 344521, 356328, 342589
These results are noisy so it is hard to make a definite conclusion, but
there does appear to be a significant effect.
On other platforms, increase the sizes of all size classes by a fixed offset
equal to the size of the allocation header. This has also been found to improve
density, since it is likely for allocation sizes to be a power of 2, which
would otherwise waste space by pushing the allocation into the next size class.
Differential Revision: https://reviews.llvm.org/D73824
Summary:
This tweaks some behaviors of the allocator wrt 32-bit, notably
tailoring the size-class map.
I had to remove a `printStats` from `__scudo_print_stats` since when
within Bionic they share the same slot so they can't coexist at the
same time. I have to find a solution for that later, but right now we
are not using the Svelte configuration.
Reviewers: rengolin
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74178
By subtracting 1 from Size at the beginning we can simplify the
subsequent calculations. This also saves 4 instructions on aarch64
and 9 instructions on x86_64, but seems to be perf neutral.
Differential Revision: https://reviews.llvm.org/D73936
Summary:
This changes a couple of parameters in the default Android config to
address some performance and memory footprint issues (well to be closer
to the default Bionic allocator numbers).
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73750
The macros INLINE and COMPILER_CHECK always expand to the same thing (inline
and static_assert respectively). Both expansions are standards compliant C++
and are used consistently in the rest of LLVM, so let's improve consistency
with the rest of LLVM by replacing them with the expansions.
Differential Revision: https://reviews.llvm.org/D70793
Summary:
`SCUDO_DEBUG` was not enabled for unit tests, meaning the `DCHECK`s
were never tripped. While turning this on, I discovered that a few
of those not-exercised checks were actually wrong. This CL addresses
those incorrect checks.
Not that to work in tests `CHECK_IMPL` has to explicitely use the
`scudo` namespace. Also changes a C cast to a C++ cast.
Reviewers: hctim, pcc, cferris, eugenis, vitalybuka
Subscribers: mgorny, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70276
Summary:
Following up on D68471, this CL introduces some `getStats` APIs to
gather statistics in char buffers (`ScopedString` really) instead of
printing them out right away. Ultimately `printStats` will just
output the buffer, but that allows us to potentially do some work
on the intermediate buffer, and can be used for a `mallocz` type
of functionality. This allows us to pretty much get rid of all the
`Printf` calls around, but I am keeping the function in for
debugging purposes.
This changes the existing tests to use the new APIs when required.
I will add new tests as suggested in D68471 in another CL.
Reviewers: morehouse, hctim, vitalybuka, eugenis, cferris
Reviewed By: morehouse
Subscribers: delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D68653
llvm-svn: 374173
Summary:
This changes a few things to improve memory footprint and performances
on Android, and fixes a test compilation error:
- add `stdlib.h` to `wrappers_c_test.cc` to address
https://bugs.llvm.org/show_bug.cgi?id=42810
- change Android size class maps, based on benchmarks, to improve
performances and lower the Svelte memory footprint. Also change the
32-bit region size for said configuration
- change the `reallocate` logic to reallocate in place for sizes larger
than the original chunk size, when they still fit in the same block.
This addresses patterns from `memory_replay` dumps like the following:
```
202: realloc 0xb48fd000 0xb4930650 12352
202: realloc 0xb48fd000 0xb48fd000 12420
202: realloc 0xb48fd000 0xb48fd000 12492
202: realloc 0xb48fd000 0xb48fd000 12564
202: realloc 0xb48fd000 0xb48fd000 12636
202: realloc 0xb48fd000 0xb48fd000 12708
202: realloc 0xb48fd000 0xb48fd000 12780
202: realloc 0xb48fd000 0xb48fd000 12852
202: realloc 0xb48fd000 0xb48fd000 12924
202: realloc 0xb48fd000 0xb48fd000 12996
202: realloc 0xb48fd000 0xb48fd000 13068
202: realloc 0xb48fd000 0xb48fd000 13140
202: realloc 0xb48fd000 0xb48fd000 13212
202: realloc 0xb48fd000 0xb48fd000 13284
202: realloc 0xb48fd000 0xb48fd000 13356
202: realloc 0xb48fd000 0xb48fd000 13428
202: realloc 0xb48fd000 0xb48fd000 13500
202: realloc 0xb48fd000 0xb48fd000 13572
202: realloc 0xb48fd000 0xb48fd000 13644
202: realloc 0xb48fd000 0xb48fd000 13716
202: realloc 0xb48fd000 0xb48fd000 13788
...
```
In this situation we were deallocating the old chunk, and
allocating a new one for every single one of those, but now we can
keep the same chunk (we just updated the header), which saves some
heap operations.
Reviewers: hctim, morehouse, vitalybuka, eugenis, cferris, rengolin
Reviewed By: morehouse
Subscribers: srhines, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D67293
llvm-svn: 371628
Summary:
This introduces a bunch of small optimizations with the purpose of
making the fastpath tighter:
- tag more conditions as `LIKELY`/`UNLIKELY`: as a rule of thumb we
consider that every operation related to the secondary is unlikely
- attempt to reduce the number of potentially extraneous instructions
- reorganize the `Chunk` header to not straddle a word boundary and
use more appropriate types
Note that some `LIKELY`/`UNLIKELY` impact might be less obvious as
they are in slow paths (for example in `secondary.cc`), but at this
point I am throwing a pretty wide net, and it's consistant and doesn't
hurt.
This was mosly done for the benfit of Android, but other platforms
benefit from it too. An aarch64 Android benchmark gives:
- before:
```
BM_youtube/min_time:15.000/repeats:4/manual_time_mean 445244 us 659385 us 4
BM_youtube/min_time:15.000/repeats:4/manual_time_median 445007 us 658970 us 4
BM_youtube/min_time:15.000/repeats:4/manual_time_stddev 885 us 1332 us 4
```
- after:
```
BM_youtube/min_time:15.000/repeats:4/manual_time_mean 415697 us 621925 us 4
BM_youtube/min_time:15.000/repeats:4/manual_time_median 415913 us 622061 us 4
BM_youtube/min_time:15.000/repeats:4/manual_time_stddev 990 us 1163 us 4
```
Additional since `-Werror=conversion` is enabled on some platforms we
are built on, enable it upstream to catch things early: a few sign
conversions had slept through and needed additional casting.
Reviewers: hctim, morehouse, eugenis, vitalybuka
Reviewed By: vitalybuka
Subscribers: srhines, mgorny, javed.absar, kristof.beyls, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D64664
llvm-svn: 366918
Summary:
This CL implements the memory reclaiming function `releaseFreeMemoryToOS`
and its associated classes. Most of this code was originally written by
Aleksey for the Primary64 in sanitizer_common, and I made some changes to
be able to implement 32-bit reclaiming as well. The code has be restructured
a bit to accomodate for freelist of batches instead of the freearray used
in the current sanitizer_common code.
Reviewers: eugenis, vitalybuka, morehouse, hctim
Reviewed By: vitalybuka
Subscribers: srhines, mgorny, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D61214
llvm-svn: 359567
Summary:
As with the sanitizer_common allocator, the SCM allows for efficient
mapping between sizes and size-classes, table-free.
It doesn't depart significantly from the original, except that we
allow the use of size-class 0 for other purposes (as opposed to
chunks of size 0). The Primary will use it to hold TransferBatches.
Reviewers: vitalybuka, eugenis, hctim, morehouse
Reviewed By: vitalybuka
Subscribers: srhines, mgorny, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D61088
llvm-svn: 359199