Summary:
We introduced a way to fallback to the immediately larger size class for
the Primary in the event a region was full, but in the event of the largest
size class, we would just fail.
This change allows to fallback to the Secondary when the last region of
the Primary is full. We also expand the trick to all platforms as opposed
to being Android only, and update the test to cover the new case.
Reviewers: hctim, cferris, eugenis, morehouse, pcc
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D76430
Summary:
Due to Unity, we had to reduce our region sizes, but in some rare
situations, some programs (mostly tests AFAICT) manage to fill up
a region for a given size class.
So this adds a workaround for that attempts to allocate the block
from the immediately larger size class, wasting some memory but
allowing the application to keep going.
Reviewers: pcc, eugenis, cferris, hctim, morehouse
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74567
Add an optional table lookup after the existing logarithm computation
for MidSize < Size <= MaxSize during size -> class lookups. The lookup is
O(1) due to indexing a precomputed (via constexpr) table based on a size
table. Switch to this approach for the Android size class maps.
Other approaches considered:
- Binary search was found to have an unacceptable (~30%) performance cost.
- An approach using NEON instructions (see older version of D73824) was found
to be slightly slower than this approach on newer SoCs but significantly
slower on older ones.
By selecting the values in the size tables to minimize wastage (for example,
by passing the malloc_info output of a target program to the included
compute_size_class_config program), we can increase the density of allocations
at a small (~0.5% on bionic malloc_sql_trace as measured using an identity
table) performance cost.
Reduces RSS on specific Android processes as follows (KB):
Before After
zygote (median of 50 runs) 26836 26792 (-0.2%)
zygote64 (median of 50 runs) 30384 30076 (-1.0%)
dex2oat (median of 3 runs) 375792 372952 (-0.8%)
I also measured the amount of whole-system idle dirty heap on Android by
rebooting the system and then running the following script repeatedly until
the results were stable:
for i in $(seq 1 50); do grep -A5 scudo: /proc/*/smaps | grep Pss: | cut -d: -f2 | awk '{s+=$1} END {print s}' ; sleep 1; done
I did this 3 times both before and after this change and the results were:
Before: 365650, 356795, 372663
After: 344521, 356328, 342589
These results are noisy so it is hard to make a definite conclusion, but
there does appear to be a significant effect.
On other platforms, increase the sizes of all size classes by a fixed offset
equal to the size of the allocation header. This has also been found to improve
density, since it is likely for allocation sizes to be a power of 2, which
would otherwise waste space by pushing the allocation into the next size class.
Differential Revision: https://reviews.llvm.org/D73824
Summary:
This CL changes multiple things to improve performance (notably on
Android).We introduce a cache class for the Secondary that is taking
care of this mechanism now.
The changes:
- change the Secondary "freelist" to an array. By keeping free secondary
blocks linked together through their headers, we were keeping a page
per block, which isn't great. Also we know touch less pages when
walking the new "freelist".
- fix an issue with the freelist getting full: if the pattern is an ever
increasing size malloc then free, the freelist would fill up and
entries would not be used. So now we empty the list if we get to many
"full" events;
- use the global release to os interval option for the secondary: it
was too costly to release all the time, particularly for pattern that
are malloc(X)/free(X)/malloc(X). Now the release will only occur
after the selected interval, when going through the deallocate path;
- allow release of the `BatchClassId` class: it is releasable, we just
have to make sure we don't mark the batches containing batches
pointers as free.
- change the default release interval to 1s for Android to match the
current Bionic allocator configuration. A patch is coming up to allow
changing it through `mallopt`.
- lower the smallest class that can be released to `PageSize/64`.
Reviewers: cferris, pcc, eugenis, morehouse, hctim
Subscribers: phosek, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73507
Summary:
* Implement enable() and disable() in GWP-ASan.
* Setup atfork handler.
* Improve test harness sanity and re-enable GWP-ASan in Scudo.
Scudo_standalone disables embedded GWP-ASan as necessary around fork().
Standalone GWP-ASan sets the atfork handler in init() if asked to. This
requires a working malloc(), therefore GWP-ASan initialization in Scudo
is delayed to the post-init callback.
Test harness changes are about setting up a single global instance of
the GWP-ASan allocator so that pthread_atfork() does not create
dangling pointers.
Test case shamelessly stolen from D72470.
Reviewers: cryptoad, hctim, jfb
Subscribers: mgorny, jfb, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73294
When the hardware and operating system support the ARM Memory Tagging
Extension, tag primary allocation granules with a random tag. The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly. Memory is
also retagged with a random tag when it is freed, and we opportunistically
reuse the new tag when the block is reused to reduce overhead. This causes
linear buffer overflows to be caught deterministically and non-linear buffer
overflows and use-after-free to be caught probabilistically.
This feature is currently only enabled for the Android allocator
and depends on an experimental Linux kernel branch available here:
https://github.com/pcc/linux/tree/android-experimental-mte
All code that depends on the kernel branch is hidden behind a macro,
ANDROID_EXPERIMENTAL_MTE. This is the same macro that is used by the Android
platform and may only be defined in non-production configurations. When the
userspace interface is finalized the code will be updated to use the stable
interface and all #ifdef ANDROID_EXPERIMENTAL_MTE will be removed.
Differential Revision: https://reviews.llvm.org/D70762
Summary:
In order to be compliant with tcmalloc's extension ownership
determination function, we have to expose a function that will
say if a chunk was allocated by us.
As to whether or not this has security consequences: someone
able to call this function repeatedly could use it to determine
secrets (cookie) or craft a valid header. So this should not be
exposed directly to untrusted user input.
Add related tests.
Additionally clang-format caught a few things to change.
Reviewers: hctim, pcc, cferris, eugenis, vitalybuka
Subscribers: JDevlieghere, jfb, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70908
This test was previously effectively doing:
P = malloc(X); write X bytes to P; P = realloc(P, X - Y); P = realloc(P, X)
and expecting that all X bytes stored to P would still be identical after
the final realloc.
This happens to be true for the current scudo implementation of realloc,
but is not guaranteed to be true by the C standard ("Any bytes in the new
object beyond the size of the old object have indeterminate values.").
This implementation detail will change with the new memory tagging support,
which unconditionally zeros newly allocated granules when memory tagging
is enabled. Fix this by limiting the number of bytes that we test to the
minimum size that we realloc the allocation to.
Differential Revision: https://reviews.llvm.org/D70761
Summary:
This CL makes unit tests compatible with Fuchsia's zxtest. This
required a few changes here and there, but also unearthed some
incompatibilities that had to be addressed.
A header is introduced to allow to account for the zxtest/gtest
differences, some `#if SCUDO_FUCHSIA` are used to disable incompatible
code (the 32-bit primary, or the exclusive TSD).
It also brought to my attention that I was using
`__scudo_default_options` in different tests, which ended up in a
single binary, and I am not sure how that ever worked. So move
this to the main cpp.
Additionally fully disable the secondary freelist on Fuchsia as we do
not track VMOs for secondary allocations, so no release possible.
With some modifications to Scudo's BUILD.gn in Fuchsia:
```
[==========] 79 tests from 23 test cases ran (10280 ms total).
[ PASSED ] 79 tests
```
Reviewers: mcgrathr, phosek, hctim, pcc, eugenis, cferris
Subscribers: srhines, jfb, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70682
Summary:
cferris@ found an issue where calling `releaseToOS` prior to any other
heap operation would lead to a crash, due to the allocator not being
properly initialized (it was discovered via `mallopt`).
The fix is to call `initThreadMaybe` prior to calling `releaseToOS` for
the Primary.
Add a test that crashes prior to fix.
Reviewers: hctim, cferris, pcc, eugenis
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70552
Summary:
cferris@ found an issue due to the new Secondary free list behavior
and unfortunately it's completely my fault. The issue is twofold:
- I lost track of the (major) fact that the Combined assumes that
all chunks returned by the Secondary are zero'd out apprioriately
when dealing with `ZeroContents`. With the introduction of the
freelist, it's no longer the case as there can be a small portion
of memory between the header and the next page boundary that is
left untouched (the rest is zero'd via release). So the next time
that block is returned, it's not fully zero'd out.
- There was no test that would exercise that behavior :(
There are several ways to fix this, the one I chose makes the most
sense to me: we pass `ZeroContents` to the Secondary's `allocate`
and it zero's out the block if requested and it's coming from the
freelist. The prevents an extraneous `memset` in case the block
comes from `map`. Another possbility could have been to `memset`
in `deallocate`, but it's probably overzealous as all secondary
blocks don't need to be zero'd out.
Add a test that would have found the issue prior to fix.
Reviewers: morehouse, hctim, cferris, pcc, eugenis, vitalybuka
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D69675
Summary:
The secondary allocator is slow, because we map and unmap each block
on allocation and deallocation.
While I really like the security benefits of such a behavior, this
yields very disappointing performance numbers on Android for larger
allocation benchmarks.
So this change adds a free list to the secondary, that will hold
recently deallocated chunks, and (currently) release the extraneous
memory. This allows to save on some memory mapping operations on
allocation and deallocation. I do not think that this lowers the
security of the secondary, but can increase the memory footprint a
little bit (RSS & VA).
The maximum number of blocks the free list can hold is templatable,
`0U` meaning that we fallback to the old behavior. The higher that
number, the higher the extra memory footprint.
I added default configurations for all our platforms, but they are
likely to change in the near future based on needs and feedback.
Reviewers: hctim, morehouse, cferris, pcc, eugenis, vitalybuka
Subscribers: mgorny, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D69570
Summary:
Following up on D68471, this CL introduces some `getStats` APIs to
gather statistics in char buffers (`ScopedString` really) instead of
printing them out right away. Ultimately `printStats` will just
output the buffer, but that allows us to potentially do some work
on the intermediate buffer, and can be used for a `mallocz` type
of functionality. This allows us to pretty much get rid of all the
`Printf` calls around, but I am keeping the function in for
debugging purposes.
This changes the existing tests to use the new APIs when required.
I will add new tests as suggested in D68471 in another CL.
Reviewers: morehouse, hctim, vitalybuka, eugenis, cferris
Reviewed By: morehouse
Subscribers: delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D68653
llvm-svn: 374173
Summary:
Initially, our malloc_info was returning ENOTSUP, but Android would
rather have it return successfully and write a barebone XML to the
stream, so we will oblige.
Add an associated test.
Reviewers: cferris, morehouse, hctim, eugenis, vitalybuka
Reviewed By: morehouse
Subscribers: delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D68427
llvm-svn: 373754
Summary:
This changes a few things to improve memory footprint and performances
on Android, and fixes a test compilation error:
- add `stdlib.h` to `wrappers_c_test.cc` to address
https://bugs.llvm.org/show_bug.cgi?id=42810
- change Android size class maps, based on benchmarks, to improve
performances and lower the Svelte memory footprint. Also change the
32-bit region size for said configuration
- change the `reallocate` logic to reallocate in place for sizes larger
than the original chunk size, when they still fit in the same block.
This addresses patterns from `memory_replay` dumps like the following:
```
202: realloc 0xb48fd000 0xb4930650 12352
202: realloc 0xb48fd000 0xb48fd000 12420
202: realloc 0xb48fd000 0xb48fd000 12492
202: realloc 0xb48fd000 0xb48fd000 12564
202: realloc 0xb48fd000 0xb48fd000 12636
202: realloc 0xb48fd000 0xb48fd000 12708
202: realloc 0xb48fd000 0xb48fd000 12780
202: realloc 0xb48fd000 0xb48fd000 12852
202: realloc 0xb48fd000 0xb48fd000 12924
202: realloc 0xb48fd000 0xb48fd000 12996
202: realloc 0xb48fd000 0xb48fd000 13068
202: realloc 0xb48fd000 0xb48fd000 13140
202: realloc 0xb48fd000 0xb48fd000 13212
202: realloc 0xb48fd000 0xb48fd000 13284
202: realloc 0xb48fd000 0xb48fd000 13356
202: realloc 0xb48fd000 0xb48fd000 13428
202: realloc 0xb48fd000 0xb48fd000 13500
202: realloc 0xb48fd000 0xb48fd000 13572
202: realloc 0xb48fd000 0xb48fd000 13644
202: realloc 0xb48fd000 0xb48fd000 13716
202: realloc 0xb48fd000 0xb48fd000 13788
...
```
In this situation we were deallocating the old chunk, and
allocating a new one for every single one of those, but now we can
keep the same chunk (we just updated the header), which saves some
heap operations.
Reviewers: hctim, morehouse, vitalybuka, eugenis, cferris, rengolin
Reviewed By: morehouse
Subscribers: srhines, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D67293
llvm-svn: 371628