This patch enhances the secondary allocator to be able to detect buffer
overflow, and (on hardware supporting memory tagging) use-after-free
and buffer underflow.
Use-after-free detection is implemented by setting memory page
protection to PROT_NONE on free. Because this must be done immediately
rather than after the memory has been quarantined, we no longer use the
combined allocator quarantine for secondary allocations. Instead, a
quarantine has been added to the secondary allocator cache.
Buffer overflow detection is implemented by aligning the allocation
to the right of the writable pages, so that any overflows will
spill into the guard page to the right of the allocation, which
will have PROT_NONE page protection. Because this would require the
secondary allocator to produce a header at the correct position,
the responsibility for ensuring chunk alignment has been moved to
the secondary allocator.
Buffer underflow detection has been implemented on hardware supporting
memory tagging by tagging the memory region between the start of the
mapping and the start of the allocation with a non-zero tag. Due to
the cost of pre-tagging secondary allocations and the memory bandwidth
cost of tagged accesses, the allocation itself uses a tag of 0 and
only the first four pages have memory tagging enabled.
This is a reland of commit 7a0da88943 which was reverted in commit
9678b07e42. This reland includes the following changes:
- Fix the calculation of BlockSize which led to incorrect statistics
returned by mallinfo().
- Add -Wno-pedantic to silence GCC warning.
- Optionally add some slack at the end of secondary allocations to help
work around buggy applications that read off the end of their
allocation.
Differential Revision: https://reviews.llvm.org/D93731
This CL introduces configuration options to allow pointers to be
compacted in the thread-specific caches and transfer batches. This
offers the possibility to have them use 32-bit of space instead of
64-bit for the 64-bit Primary, thus cutting the size of the caches
and batches by nearly half (and as such the memory used in size
class 0). The cost is an additional read from the region information
in the fast path.
This is not a new idea, as it's being used in the sanitizer_common
64-bit primary. The difference here is that it is configurable via
the allocator config, with the possibility of not compacting at all.
This CL enables compacting pointers in the Android and Fuchsia default
configurations.
Differential Revision: https://reviews.llvm.org/D96435
This patch enhances the secondary allocator to be able to detect buffer
overflow, and (on hardware supporting memory tagging) use-after-free
and buffer underflow.
Use-after-free detection is implemented by setting memory page
protection to PROT_NONE on free. Because this must be done immediately
rather than after the memory has been quarantined, we no longer use the
combined allocator quarantine for secondary allocations. Instead, a
quarantine has been added to the secondary allocator cache.
Buffer overflow detection is implemented by aligning the allocation
to the right of the writable pages, so that any overflows will
spill into the guard page to the right of the allocation, which
will have PROT_NONE page protection. Because this would require the
secondary allocator to produce a header at the correct position,
the responsibility for ensuring chunk alignment has been moved to
the secondary allocator.
Buffer underflow detection has been implemented on hardware supporting
memory tagging by tagging the memory region between the start of the
mapping and the start of the allocation with a non-zero tag. Due to
the cost of pre-tagging secondary allocations and the memory bandwidth
cost of tagged accesses, the allocation itself uses a tag of 0 and
only the first four pages have memory tagging enabled.
Differential Revision: https://reviews.llvm.org/D93731
The primary and secondary allocators will need to share this bit,
so move the management of the bit to the combined allocator and
make useMemoryTagging() a free function.
Differential Revision: https://reviews.llvm.org/D93730
canAllocate() does not take into account the header size so it does
not return the right answer in borderline cases. There was already
code handling this correctly in isTaggedAllocation() so split it out
into a separate function and call it from the test.
Furthermore the test was incorrect when MTE is enabled because MTE
does not pattern fill primary allocations. Fix it.
Differential Revision: https://reviews.llvm.org/D93437
Make these arguments named constants in the Config class instead
of being positional arguments to MapAllocatorCache. This makes the
configuration easier to follow.
Eventually we should follow suit with the other classes but this is
a start.
Differential Revision: https://reviews.llvm.org/D93251
Fix a potential UB in `appendSignedDecimal` (with -INT64_MIN) by making
it a special case.
Fix the terrible test cases for `isOwned`: I was pretty sloppy on those
and used some stack & static variables, but since `isOwned` accesses
memory prior to the pointer to check for the validity of the Scudo
header, it ended up being detected as some global and stack buffer out
of bounds accesses. So not I am using buffers with enough room so that
the test will not access memory prior to the variables.
With those fixes, the tests pass on the ASan+UBSan Fuchsia build.
Thanks to Roland for pointing those out!
Differential Revision: https://reviews.llvm.org/D88170
Here "memory initialization" refers to zero- or pattern-init on
non-MTE hardware, or (where possible to avoid) memory tagging on MTE
hardware. With shared TSD the per-thread memory initialization state
is stored in bit 0 of the TLS slot, similar to PointerIntPair in LLVM.
Differential Revision: https://reviews.llvm.org/D87739
I had left this as a TODO, but it turns out it wasn't complicated.
By specifying `MAP_RESIZABLE`, it allows us to keep the VMO which we
can then use for release purposes.
`releasePagesToOS` also had to be called the "proper" way, as Fuchsia
requires the `Offset` field to be correct. This has no impact on
non-Fuchsia platforms.
Differential Revision: https://reviews.llvm.org/D86800
Summary:
Partners have requested the ability to configure more parts of Scudo
at runtime, notably the Secondary cache options (maximum number of
blocks cached, maximum size) as well as the TSD registry options
(the maximum number of TSDs in use).
This CL adds a few more Scudo specific `mallopt` parameters that are
passed down to the various subcomponents of the Combined allocator.
- `M_CACHE_COUNT_MAX`: sets the maximum number of Secondary cached items
- `M_CACHE_SIZE_MAX`: sets the maximum size of a cacheable item in the Secondary
- `M_TSDS_COUNT_MAX`: sets the maximum number of TSDs that can be used (Shared Registry only)
Regarding the TSDs maximum count, this is a one way option, only
allowing to increase the count.
In order to allow for this, I rearranged the code to have some `setOption`
member function to the relevant classes, using the `scudo::Option` class
enum to determine what is to be set.
This also fixes an issue where a static variable (`Ready`) was used in
templated functions without being set back to `false` every time.
Reviewers: pcc, eugenis, hctim, cferris
Subscribers: jfb, llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D84667
make_unique is a C++14 feature, and this prevents us from building on
Ubuntu Trusty. While we do use a C++14 compatible toolchain for building
in general, we fall back to the system toolchain for building the
compiler-rt tests.
The reason is that those tests get cross-compiled for e.g. 32-bit and
64-bit x86, and while the toolchain provides libstdc++ in those
flavours, the resulting compiler-rt test binaries don't get RPATH set
and so won't start if they're linked with that toolchain.
We've tried linking the test binaries against libstdc++ statically, by
passing COMPILER_RT_TEST_COMPILER_CFLAGS=-static-libstdc++. That mostly
works, but some test targets append -lstdc++ to the compiler invocation.
So, after spending way too much time on this, let's just avoid C++14
here for now.
This guarantees that we will detect a buffer overflow or underflow
that overwrites an adjacent block. This spatial guarantee is similar
to the temporal guarantee that we provide for immediate use-after-free.
Enabling odd/even tags involves a tradeoff between use-after-free
detection and buffer overflow detection. Odd/even tags make it more
likely for buffer overflows to be detected by increasing the size of
the guaranteed "red zone" around the allocation, but on the other
hand use-after-free is less likely to be detected because the tag
space for any particular chunk is cut in half. Therefore we introduce
a tuning setting to control whether odd/even tags are enabled.
Differential Revision: https://reviews.llvm.org/D84361
Summary:
When enabling some malloc debug features on Android, multiple 32 bit
regions become exhausted, and the allocations fail. Allow allocations
to keep trying each bigger class in the Primary until it finds a fit.
In addition, some Android tests running on 32 bit fail sometimes due
to a running out of space in two regions, and then fail the allocation.
Reviewers: cryptoad
Reviewed By: cryptoad
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D82070
Summary:
Implement pattern initialization of memory (excluding the secondary
allocator because it already has predictable memory contents).
Expose both zero and pattern initialization through the C API.
Reviewers: pcc, cryptoad
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D79133
Summary:
We introduced a way to fallback to the immediately larger size class for
the Primary in the event a region was full, but in the event of the largest
size class, we would just fail.
This change allows to fallback to the Secondary when the last region of
the Primary is full. We also expand the trick to all platforms as opposed
to being Android only, and update the test to cover the new case.
Reviewers: hctim, cferris, eugenis, morehouse, pcc
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D76430
Summary:
Due to Unity, we had to reduce our region sizes, but in some rare
situations, some programs (mostly tests AFAICT) manage to fill up
a region for a given size class.
So this adds a workaround for that attempts to allocate the block
from the immediately larger size class, wasting some memory but
allowing the application to keep going.
Reviewers: pcc, eugenis, cferris, hctim, morehouse
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74567
Add an optional table lookup after the existing logarithm computation
for MidSize < Size <= MaxSize during size -> class lookups. The lookup is
O(1) due to indexing a precomputed (via constexpr) table based on a size
table. Switch to this approach for the Android size class maps.
Other approaches considered:
- Binary search was found to have an unacceptable (~30%) performance cost.
- An approach using NEON instructions (see older version of D73824) was found
to be slightly slower than this approach on newer SoCs but significantly
slower on older ones.
By selecting the values in the size tables to minimize wastage (for example,
by passing the malloc_info output of a target program to the included
compute_size_class_config program), we can increase the density of allocations
at a small (~0.5% on bionic malloc_sql_trace as measured using an identity
table) performance cost.
Reduces RSS on specific Android processes as follows (KB):
Before After
zygote (median of 50 runs) 26836 26792 (-0.2%)
zygote64 (median of 50 runs) 30384 30076 (-1.0%)
dex2oat (median of 3 runs) 375792 372952 (-0.8%)
I also measured the amount of whole-system idle dirty heap on Android by
rebooting the system and then running the following script repeatedly until
the results were stable:
for i in $(seq 1 50); do grep -A5 scudo: /proc/*/smaps | grep Pss: | cut -d: -f2 | awk '{s+=$1} END {print s}' ; sleep 1; done
I did this 3 times both before and after this change and the results were:
Before: 365650, 356795, 372663
After: 344521, 356328, 342589
These results are noisy so it is hard to make a definite conclusion, but
there does appear to be a significant effect.
On other platforms, increase the sizes of all size classes by a fixed offset
equal to the size of the allocation header. This has also been found to improve
density, since it is likely for allocation sizes to be a power of 2, which
would otherwise waste space by pushing the allocation into the next size class.
Differential Revision: https://reviews.llvm.org/D73824
Summary:
This CL changes multiple things to improve performance (notably on
Android).We introduce a cache class for the Secondary that is taking
care of this mechanism now.
The changes:
- change the Secondary "freelist" to an array. By keeping free secondary
blocks linked together through their headers, we were keeping a page
per block, which isn't great. Also we know touch less pages when
walking the new "freelist".
- fix an issue with the freelist getting full: if the pattern is an ever
increasing size malloc then free, the freelist would fill up and
entries would not be used. So now we empty the list if we get to many
"full" events;
- use the global release to os interval option for the secondary: it
was too costly to release all the time, particularly for pattern that
are malloc(X)/free(X)/malloc(X). Now the release will only occur
after the selected interval, when going through the deallocate path;
- allow release of the `BatchClassId` class: it is releasable, we just
have to make sure we don't mark the batches containing batches
pointers as free.
- change the default release interval to 1s for Android to match the
current Bionic allocator configuration. A patch is coming up to allow
changing it through `mallopt`.
- lower the smallest class that can be released to `PageSize/64`.
Reviewers: cferris, pcc, eugenis, morehouse, hctim
Subscribers: phosek, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73507
Summary:
* Implement enable() and disable() in GWP-ASan.
* Setup atfork handler.
* Improve test harness sanity and re-enable GWP-ASan in Scudo.
Scudo_standalone disables embedded GWP-ASan as necessary around fork().
Standalone GWP-ASan sets the atfork handler in init() if asked to. This
requires a working malloc(), therefore GWP-ASan initialization in Scudo
is delayed to the post-init callback.
Test harness changes are about setting up a single global instance of
the GWP-ASan allocator so that pthread_atfork() does not create
dangling pointers.
Test case shamelessly stolen from D72470.
Reviewers: cryptoad, hctim, jfb
Subscribers: mgorny, jfb, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73294
When the hardware and operating system support the ARM Memory Tagging
Extension, tag primary allocation granules with a random tag. The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly. Memory is
also retagged with a random tag when it is freed, and we opportunistically
reuse the new tag when the block is reused to reduce overhead. This causes
linear buffer overflows to be caught deterministically and non-linear buffer
overflows and use-after-free to be caught probabilistically.
This feature is currently only enabled for the Android allocator
and depends on an experimental Linux kernel branch available here:
https://github.com/pcc/linux/tree/android-experimental-mte
All code that depends on the kernel branch is hidden behind a macro,
ANDROID_EXPERIMENTAL_MTE. This is the same macro that is used by the Android
platform and may only be defined in non-production configurations. When the
userspace interface is finalized the code will be updated to use the stable
interface and all #ifdef ANDROID_EXPERIMENTAL_MTE will be removed.
Differential Revision: https://reviews.llvm.org/D70762
Summary:
In order to be compliant with tcmalloc's extension ownership
determination function, we have to expose a function that will
say if a chunk was allocated by us.
As to whether or not this has security consequences: someone
able to call this function repeatedly could use it to determine
secrets (cookie) or craft a valid header. So this should not be
exposed directly to untrusted user input.
Add related tests.
Additionally clang-format caught a few things to change.
Reviewers: hctim, pcc, cferris, eugenis, vitalybuka
Subscribers: JDevlieghere, jfb, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70908
This test was previously effectively doing:
P = malloc(X); write X bytes to P; P = realloc(P, X - Y); P = realloc(P, X)
and expecting that all X bytes stored to P would still be identical after
the final realloc.
This happens to be true for the current scudo implementation of realloc,
but is not guaranteed to be true by the C standard ("Any bytes in the new
object beyond the size of the old object have indeterminate values.").
This implementation detail will change with the new memory tagging support,
which unconditionally zeros newly allocated granules when memory tagging
is enabled. Fix this by limiting the number of bytes that we test to the
minimum size that we realloc the allocation to.
Differential Revision: https://reviews.llvm.org/D70761
Summary:
This CL makes unit tests compatible with Fuchsia's zxtest. This
required a few changes here and there, but also unearthed some
incompatibilities that had to be addressed.
A header is introduced to allow to account for the zxtest/gtest
differences, some `#if SCUDO_FUCHSIA` are used to disable incompatible
code (the 32-bit primary, or the exclusive TSD).
It also brought to my attention that I was using
`__scudo_default_options` in different tests, which ended up in a
single binary, and I am not sure how that ever worked. So move
this to the main cpp.
Additionally fully disable the secondary freelist on Fuchsia as we do
not track VMOs for secondary allocations, so no release possible.
With some modifications to Scudo's BUILD.gn in Fuchsia:
```
[==========] 79 tests from 23 test cases ran (10280 ms total).
[ PASSED ] 79 tests
```
Reviewers: mcgrathr, phosek, hctim, pcc, eugenis, cferris
Subscribers: srhines, jfb, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70682
Summary:
cferris@ found an issue where calling `releaseToOS` prior to any other
heap operation would lead to a crash, due to the allocator not being
properly initialized (it was discovered via `mallopt`).
The fix is to call `initThreadMaybe` prior to calling `releaseToOS` for
the Primary.
Add a test that crashes prior to fix.
Reviewers: hctim, cferris, pcc, eugenis
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D70552
Summary:
cferris@ found an issue due to the new Secondary free list behavior
and unfortunately it's completely my fault. The issue is twofold:
- I lost track of the (major) fact that the Combined assumes that
all chunks returned by the Secondary are zero'd out apprioriately
when dealing with `ZeroContents`. With the introduction of the
freelist, it's no longer the case as there can be a small portion
of memory between the header and the next page boundary that is
left untouched (the rest is zero'd via release). So the next time
that block is returned, it's not fully zero'd out.
- There was no test that would exercise that behavior :(
There are several ways to fix this, the one I chose makes the most
sense to me: we pass `ZeroContents` to the Secondary's `allocate`
and it zero's out the block if requested and it's coming from the
freelist. The prevents an extraneous `memset` in case the block
comes from `map`. Another possbility could have been to `memset`
in `deallocate`, but it's probably overzealous as all secondary
blocks don't need to be zero'd out.
Add a test that would have found the issue prior to fix.
Reviewers: morehouse, hctim, cferris, pcc, eugenis, vitalybuka
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D69675
Summary:
The secondary allocator is slow, because we map and unmap each block
on allocation and deallocation.
While I really like the security benefits of such a behavior, this
yields very disappointing performance numbers on Android for larger
allocation benchmarks.
So this change adds a free list to the secondary, that will hold
recently deallocated chunks, and (currently) release the extraneous
memory. This allows to save on some memory mapping operations on
allocation and deallocation. I do not think that this lowers the
security of the secondary, but can increase the memory footprint a
little bit (RSS & VA).
The maximum number of blocks the free list can hold is templatable,
`0U` meaning that we fallback to the old behavior. The higher that
number, the higher the extra memory footprint.
I added default configurations for all our platforms, but they are
likely to change in the near future based on needs and feedback.
Reviewers: hctim, morehouse, cferris, pcc, eugenis, vitalybuka
Subscribers: mgorny, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D69570
Summary:
Following up on D68471, this CL introduces some `getStats` APIs to
gather statistics in char buffers (`ScopedString` really) instead of
printing them out right away. Ultimately `printStats` will just
output the buffer, but that allows us to potentially do some work
on the intermediate buffer, and can be used for a `mallocz` type
of functionality. This allows us to pretty much get rid of all the
`Printf` calls around, but I am keeping the function in for
debugging purposes.
This changes the existing tests to use the new APIs when required.
I will add new tests as suggested in D68471 in another CL.
Reviewers: morehouse, hctim, vitalybuka, eugenis, cferris
Reviewed By: morehouse
Subscribers: delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D68653
llvm-svn: 374173
Summary:
Initially, our malloc_info was returning ENOTSUP, but Android would
rather have it return successfully and write a barebone XML to the
stream, so we will oblige.
Add an associated test.
Reviewers: cferris, morehouse, hctim, eugenis, vitalybuka
Reviewed By: morehouse
Subscribers: delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D68427
llvm-svn: 373754
Summary:
This changes a few things to improve memory footprint and performances
on Android, and fixes a test compilation error:
- add `stdlib.h` to `wrappers_c_test.cc` to address
https://bugs.llvm.org/show_bug.cgi?id=42810
- change Android size class maps, based on benchmarks, to improve
performances and lower the Svelte memory footprint. Also change the
32-bit region size for said configuration
- change the `reallocate` logic to reallocate in place for sizes larger
than the original chunk size, when they still fit in the same block.
This addresses patterns from `memory_replay` dumps like the following:
```
202: realloc 0xb48fd000 0xb4930650 12352
202: realloc 0xb48fd000 0xb48fd000 12420
202: realloc 0xb48fd000 0xb48fd000 12492
202: realloc 0xb48fd000 0xb48fd000 12564
202: realloc 0xb48fd000 0xb48fd000 12636
202: realloc 0xb48fd000 0xb48fd000 12708
202: realloc 0xb48fd000 0xb48fd000 12780
202: realloc 0xb48fd000 0xb48fd000 12852
202: realloc 0xb48fd000 0xb48fd000 12924
202: realloc 0xb48fd000 0xb48fd000 12996
202: realloc 0xb48fd000 0xb48fd000 13068
202: realloc 0xb48fd000 0xb48fd000 13140
202: realloc 0xb48fd000 0xb48fd000 13212
202: realloc 0xb48fd000 0xb48fd000 13284
202: realloc 0xb48fd000 0xb48fd000 13356
202: realloc 0xb48fd000 0xb48fd000 13428
202: realloc 0xb48fd000 0xb48fd000 13500
202: realloc 0xb48fd000 0xb48fd000 13572
202: realloc 0xb48fd000 0xb48fd000 13644
202: realloc 0xb48fd000 0xb48fd000 13716
202: realloc 0xb48fd000 0xb48fd000 13788
...
```
In this situation we were deallocating the old chunk, and
allocating a new one for every single one of those, but now we can
keep the same chunk (we just updated the header), which saves some
heap operations.
Reviewers: hctim, morehouse, vitalybuka, eugenis, cferris, rengolin
Reviewed By: morehouse
Subscribers: srhines, delcypher, #sanitizers, llvm-commits
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D67293
llvm-svn: 371628