This patch makes a few cleanups:
1. The non-standalone scudo has a problem where GWP-ASan allocations
may not meet alignment requirements when Scudo is asked to provide an
alignment >= 16. Use the new GWP-ASan API to fix this (see the sketch
after this list).
2. The standalone variant loses some debugging information inside
GWP-ASan because we ask GWP-ASan to allocate an aligned size in the
frontend. This means reports end up with 'UaF on a 16-byte allocation'
for a 1-byte allocation with 16-byte alignment. Also use the new API
to fix this.
3. Add post-alloc hooks for GWP-ASan intercepted allocations, and add
stats tracking for GWP-ASan allocations.
4. Add a small test that checks the alignment of the frontend
allocator, so that it can be used under GWP-ASan torture mode.
5. Add GWP-ASan torture mode as a testing configuration to catch these
regressions.
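A minimal sketch of the frontend change for items 1 and 2, assuming the
D94830 shape of GuardedPoolAllocator::allocate() taking a size and an
alignment (the wrapper and roundUpTo are hypothetical):

  // GuardedAlloc is the GWP-ASan GuardedPoolAllocator instance.
  void *tryGWPASanAllocate(size_t Size, size_t Alignment) {
    // Before (sketch): the frontend rounded the size up itself, so a 1-byte
    // allocation with 16-byte alignment was recorded, and later reported,
    // as a 16-byte allocation:
    //   return GuardedAlloc.allocate(roundUpTo(Size, Alignment));
    // After: pass size and alignment separately, so GWP-ASan both satisfies
    // the alignment and keeps the true size for its reports.
    return GuardedAlloc.allocate(Size, Alignment);
  }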
Depends on D94830, D95889.
Reviewed By: cryptoad
Differential Revision: https://reviews.llvm.org/D95884
From a cache perspective it's better to store the header immediately
after loading it. If we delay this operation until after we've
retagged the chunk, it's more likely that our header will have been
evicted from the cache and we'll need to fetch it again in order to
perform the compare-exchange operation.
For similar reasons, store the deallocation stack before retagging
instead of afterwards.
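A simplified model of the resulting order (real scudo uses packed,
checksummed chunk headers; the types and retagMemory() are stand-ins):

  #include <atomic>
  #include <cstddef>
  #include <cstdint>

  struct Header { uint8_t State; };
  void retagMemory(void *Ptr, size_t Size); // stand-in for MTE retagging

  void quarantineChunk(std::atomic<Header> *H, void *Ptr, size_t Size) {
    Header Old = H->load(std::memory_order_relaxed);
    Header New = Old;
    New.State = 2; // Quarantined
    // Do the compare-exchange right away, while the header's cache line is
    // still hot from the load above...
    H->compare_exchange_strong(Old, New, std::memory_order_relaxed);
    // ...and only retag afterwards: retagging walks the whole chunk and
    // would likely evict the header's line before a later compare-exchange.
    retagMemory(Ptr, Size);
  }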
Differential Revision: https://reviews.llvm.org/D101137
It looks like there's some old version of GCC that doesn't like this
static_assert (I couldn't reproduce the issue with GCC 8, 9 or 10).
Work around the issue by only checking the static_assert under Clang,
which should provide sufficient coverage.
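The workaround, sketched (Cond stands in for the original expression):

  // Only Clang evaluates the check; per the above, Clang-only coverage of a
  // compile-time invariant is sufficient.
  #if defined(__clang__)
  static_assert(Cond, "");
  #endif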
Should hopefully fix this buildbot:
https://lab.llvm.org/buildbot/#/builders/112/builds/5356
With AndroidSizeClassMap, all of the LSBs are in the range 4-6, so we
only need 2 bits of information per size class. Furthermore we have
32 size classes, which conveniently lets us fit all of the information
into a 64-bit integer. Do so if possible so that we can avoid a table
lookup entirely.
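A sketch of the packing under assumed names (MinLsb, getSizeLsbByClassId),
with the real table presumably built at compile time:

  #include <cstdint>

  constexpr unsigned NumClasses = 32; // conveniently, 32 * 2 bits == 64 bits
  constexpr unsigned MinLsb = 4;      // all LSBs lie in [4, 6]: 2 bits each

  // Build the packed table once, at compile time.
  constexpr uint64_t packLsbs(const unsigned (&Lsbs)[NumClasses]) {
    uint64_t Packed = 0;
    for (unsigned I = 0; I != NumClasses; ++I)
      Packed |= static_cast<uint64_t>(Lsbs[I] - MinLsb) << (2 * I);
    return Packed;
  }

  // Shift-and-mask replaces the table lookup entirely.
  inline unsigned getSizeLsbByClassId(uint64_t Packed, unsigned ClassId) {
    return MinLsb + ((Packed >> (2 * ClassId)) & 3);
  }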
Differential Revision: https://reviews.llvm.org/D101105
In the most common case we call computeOddEvenMaskForPointerMaybe()
from quarantineOrDeallocateChunk(), in which case we need to look up
the class size from the SizeClassMap in order to compute the LSB. Since
we need to do a lookup anyway, we may as well look up the LSB itself
and avoid computing it every time.
While here, switch to a slightly more efficient way of computing the
odd/even mask.
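A sketch of the cheaper computation, assuming the packed-LSB lookup above.
Blocks of a class are laid out Size bytes apart, so the bit of the pointer
at Size's least significant set bit alternates between adjacent blocks, and
can select which half of the 16 tags to exclude:

  #include <cstdint>

  unsigned getSizeLsbByClassId(unsigned ClassId); // assumed helper (D101105)

  // 0x5555 masks the even tags; shifted left by one it becomes 0xaaaa, the
  // odd tags. Adjacent blocks therefore end up with opposite parities.
  inline uint16_t computeOddEvenMask(uintptr_t Ptr, unsigned ClassId) {
    unsigned Shift = (Ptr >> getSizeLsbByClassId(ClassId)) & 1;
    return static_cast<uint16_t>(0x5555u << Shift);
  }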
Differential Revision: https://reviews.llvm.org/D101018
Since we already have a tagged pointer available to us, we can just
extract the tag from it and avoid an LDG instruction.
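A sketch of the arithmetic extraction; AArch64 carries the logical tag in
bits 56-59 of the pointer itself:

  #include <cstdint>

  // No LDG (load allocation tag from tag memory) needed: shift and mask.
  inline uint8_t extractTag(uintptr_t TaggedPtr) {
    return static_cast<uint8_t>((TaggedPtr >> 56) & 0xf);
  }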
Differential Revision: https://reviews.llvm.org/D101014
Now that we have a more efficient implementation of storeTags(),
we should start using it from resizeTaggedChunk(). With that, plus
a new storeTag() function, resizeTaggedChunk() can be made generic,
and so can prepareTaggedChunk(). Make it so.
Now that the functions are generic, move them to combined.h so that
memtag.h no longer needs to know about chunks.
Differential Revision: https://reviews.llvm.org/D100911
DC GZVA can operate on multiple granules at a time (corresponding to
the CPU's cache line size) so we can generally expect it to be faster
than STZG in a loop.
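A sketch of such a loop (AArch64 only; Begin/End assumed block-aligned):

  #include <cstdint>

  // Zero and tag [Begin, End) one DCZID_EL0-sized block at a time. Each
  // DC GZVA covers a whole zeroing block (typically a cache line), whereas
  // STZG covers a single 16-byte granule.
  inline void zeroAndTag(uintptr_t Begin, uintptr_t End) {
    uintptr_t Dczid;
    __asm__("mrs %0, dczid_el0" : "=r"(Dczid));
    const uintptr_t BlockSize = 4u << (Dczid & 15); // bits [3:0]: log2(words)
    for (uintptr_t P = Begin; P < End; P += BlockSize)
      __asm__ volatile(R"(
        .arch_extension memtag
        dc gzva, %0
      )" : : "r"(P) : "memory");
  }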
Differential Revision: https://reviews.llvm.org/D100910
An empty macro whose expansion leaves behind a bare `... else ;` can
trigger warnings from some compilers (e.g. GCC's -Wempty-body).
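The hazard and a common fix, sketched (the patch's exact spelling may
differ):

  // With an empty expansion, `if (C) DCHECK(X); else foo();` leaves a bare
  // `;` as the branch body and trips -Wempty-body:
  //   #define DCHECK(A)
  // A statement-shaped no-op keeps every if/else well-formed:
  #define DCHECK(A)                                                           \
    do {                                                                      \
    } while (false)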
Reviewed By: cryptoad, vitalybuka
Differential Revision: https://reviews.llvm.org/D100693
While attempting to roll the latest Scudo in Fuchsia, some issues
arose. While trying to debug them, it appeared that `DCHECK`s were
also never exercised in Fuchsia. This CL fixes the following
problems:
- the size of a block in the TransferBatch class must be a multiple
of the compact pointer scale. In some cases, it wasn't, which led
to obscure crashes. Now, we round up `sizeof(TransferBatch)` (see
the sketch after this list). This only materialized in Fuchsia due
to the specific parameters of the `DefaultConfig`;
- 2 `DCHECK` statements in Fuchsia were incorrect;
- `map()` & co. require a size that is a multiple of the page size
(as enforced by Fuchsia `DCHECK`s), which wasn't the case for
`PackedCounters`;
- in the Secondary, a parameter was marked as `UNUSED` while it was
actually used.
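A sketch of the rounding from the first item (names assumed):

  #include <cstddef>

  // Round X up to a multiple of Boundary, a power of two.
  constexpr size_t roundUpTo(size_t X, size_t Boundary) {
    return (X + Boundary - 1) & ~(Boundary - 1);
  }

  // Hypothetical use: keep the batch class's block size a multiple of the
  // compact pointer scale.
  //   BlockSize = roundUpTo(sizeof(TransferBatch), size_t(1) << CompactPtrScale);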
Differential Revision: https://reviews.llvm.org/D100524
The GNU assembler can't parse `.arch_extension ...` before a `;`.
So uniformly use raw string syntax, with one statement per line
rather than `;` separators, in the inline assembly.
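A sketch of the change (AArch64 only; the instruction is illustrative):

  #include <cstdint>

  inline uintptr_t insertRandomTag(uintptr_t Ptr) {
    // Before, rejected by GNU as: ".arch_extension memtag; irg %0, %0"
    // After: raw string, one statement per line.
    __asm__(R"(
      .arch_extension memtag
      irg %0, %0
    )" : "+r"(Ptr));
    return Ptr;
  }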
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D100413
D99763 fixed `SizeClassAllocatorLocalCache::drain`, but with the
assumption that `BatchClassId` is 0, which is currently true. I would
rather not bake in that assumption, so that the loop still works if we
ever change the ID of the batch class. Since `BatchClassId` is used
in several places in `local_cache.h`, introduce a constant so that we
don't have to spell out `SizeClassMap::` every time.
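A sketch of the resulting shape (member names assumed; the batch class is
drained last since, as I understand the D99763 fix, draining the other
classes can push batches back into it):

  #include <cstddef>

  constexpr size_t NumClasses = 33;  // assumed; includes the batch class
  constexpr size_t BatchClassId = 0; // currently 0, but no longer relied upon

  void drainClass(size_t ClassId); // stands in for the per-class drain

  void drainAll() {
    for (size_t I = 0; I != NumClasses; ++I)
      if (I != BatchClassId) // compare, instead of hard-coding class 0
        drainClass(I);
    drainClass(BatchClassId);
  }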
Differential Revision: https://reviews.llvm.org/D100062
The check was removed in D99786 as it seems that the quarantine is
irrelevant for a freshly created allocator. However, there are
internal issues with tagged memory accesses.
We should be able to fix iterateOverChunks for tagging later.
Existing implementations took up to 30 minutes to execute on my setup.
Now it's more convenient to debug a single test.
Reviewed By: cryptoad
Differential Revision: https://reviews.llvm.org/D99786
Android's native bridge (i.e. the AArch64 emulator) doesn't support TBI, so
we need a way to disable TBI on Linux when targeting the native bridge.
This can also be used to test the no-TBI code path on Linux (currently
only used on Fuchsia), or make Scudo compatible with very old
(pre-commit d50240a5f6ceaf690a77b0fccb17be51cfa151c2 from June 2013)
Linux kernels that do not enable TBI.
Differential Revision: https://reviews.llvm.org/D98732
Since we are looking to remove the old Scudo, we need to provide a .so
for parity purposes, as some platforms use it.
I tested this on Fuchsia & Linux, though not on Android.
Differential Revision: https://reviews.llvm.org/D98456
There is no centralized store of information related to secondary
allocations. Moreover, the allocations themselves become inaccessible
when freed in order to implement UAF detection, so we can't store
information there to be used in case of a UAF anyway.
Therefore our storage location for tracking stack traces of secondary
allocations is a ring buffer. The ring buffer is copied to the process
creating the crash dump when a fault occurs.
The ring buffer is also used to store stack traces for primary
deallocations. Stack traces for primary allocations continue to be
stored inline.
In order to support the scenario where an access to the ring buffer
is interrupted by a concurrently occurring crash, the ring buffer is
accessed in a lock-free manner.
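A minimal sketch of the lock-free access pattern (the entry layout is made
up): writers claim a slot with an atomic increment and fill it
non-atomically; a crash handler snapshotting the buffer may observe a torn
in-flight entry, which it must tolerate, but it can never block on a lock
held by the thread it interrupted.

  #include <atomic>
  #include <cstddef>
  #include <cstdint>

  struct Entry {
    uintptr_t Ptr;
    uint32_t AllocTrace, DeallocTrace; // hypothetical stack-depot handles
  };

  constexpr size_t RingSize = 1024; // power of two, assumed

  Entry Ring[RingSize];
  std::atomic<uint64_t> Pos{0};

  void push(const Entry &E) {
    uint64_t Slot = Pos.fetch_add(1, std::memory_order_relaxed) % RingSize;
    Ring[Slot] = E; // deliberately non-atomic; readers must validate entries
  }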
Differential Revision: https://reviews.llvm.org/D94212
This patch enhances the secondary allocator to be able to detect buffer
overflow, and (on hardware supporting memory tagging) use-after-free
and buffer underflow.
Use-after-free detection is implemented by setting memory page
protection to PROT_NONE on free. Because this must be done immediately
rather than after the memory has been quarantined, we no longer use the
combined allocator quarantine for secondary allocations. Instead, a
quarantine has been added to the secondary allocator cache.
Buffer overflow detection is implemented by aligning the allocation
to the right of the writable pages, so that any overflows will
spill into the guard page to the right of the allocation, which
will have PROT_NONE page protection. Because this would require the
secondary allocator to produce a header at the correct position,
the responsibility for ensuring chunk alignment has been moved to
the secondary allocator.
Buffer underflow detection has been implemented on hardware supporting
memory tagging by tagging the memory region between the start of the
mapping and the start of the allocation with a non-zero tag. Due to
the cost of pre-tagging secondary allocations and the memory bandwidth
cost of tagged accesses, the allocation itself uses a tag of 0 and
only the first four pages have memory tagging enabled.
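A sketch of the right-alignment arithmetic (names assumed; header placement
details omitted):

  #include <cstdint>

  // Place the user block as far right as the alignment allows, so the first
  // out-of-bounds byte lands in the PROT_NONE guard page after the mapping.
  inline uintptr_t rightAlignedUserPtr(uintptr_t CommitBase,
                                       uintptr_t CommitSize, uintptr_t Size,
                                       uintptr_t Alignment) {
    return (CommitBase + CommitSize - Size) & ~(Alignment - 1);
  }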
This is a reland of commit 7a0da88943 which was reverted in commit
9678b07e42. This reland includes the following changes:
- Fix the calculation of BlockSize which led to incorrect statistics
returned by mallinfo().
- Add -Wno-pedantic to silence GCC warning.
- Optionally add some slack at the end of secondary allocations to help
work around buggy applications that read off the end of their
allocation.
Differential Revision: https://reviews.llvm.org/D93731
As of 4f395db86b, which contains updates to -Wfree-nonheap-object, a
line in this test triggers the warning. This particular line is OK
though, since it's meant to test a free on a bad pointer.
Differential Revision: https://reviews.llvm.org/D97516
This CL introduces configuration options to allow pointers to be
compacted in the thread-specific caches and transfer batches. This
offers the possibility of having them use 32 bits of space instead of
64 bits for the 64-bit Primary, thus cutting the size of the caches
and batches by nearly half (and as such the memory used in size
class 0). The cost is an additional read from the region information
in the fast path.
This is not a new idea, as it's being used in the sanitizer_common
64-bit primary. The difference here is that it is configurable via
the allocator config, with the possibility of not compacting at all.
This CL enables compacting pointers in the Android and Fuchsia default
configurations.
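A sketch of the compaction itself, as in the sanitizer_common 64-bit
primary (the config field names added by this CL are not shown and should
be taken from allocator_config.h):

  #include <cstdint>

  // Store (Ptr - RegionBase) >> Scale in 32 bits; decompacting reverses it,
  // which is the extra region-information read on the fast path.
  inline uint32_t compactPtr(uintptr_t RegionBase, uintptr_t Ptr,
                             unsigned Scale) {
    return static_cast<uint32_t>((Ptr - RegionBase) >> Scale);
  }
  inline uintptr_t decompactPtr(uintptr_t RegionBase, uint32_t Compact,
                                unsigned Scale) {
    return RegionBase + (static_cast<uintptr_t>(Compact) << Scale);
  }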
Differential Revision: https://reviews.llvm.org/D96435
This patch enhances the secondary allocator to be able to detect buffer
overflow, and (on hardware supporting memory tagging) use-after-free
and buffer underflow.
Use-after-free detection is implemented by setting memory page
protection to PROT_NONE on free. Because this must be done immediately
rather than after the memory has been quarantined, we no longer use the
combined allocator quarantine for secondary allocations. Instead, a
quarantine has been added to the secondary allocator cache.
Buffer overflow detection is implemented by aligning the allocation
to the right of the writable pages, so that any overflows will
spill into the guard page to the right of the allocation, which
will have PROT_NONE page protection. Because this would require the
secondary allocator to produce a header at the correct position,
the responsibility for ensuring chunk alignment has been moved to
the secondary allocator.
Buffer underflow detection has been implemented on hardware supporting
memory tagging by tagging the memory region between the start of the
mapping and the start of the allocation with a non-zero tag. Due to
the cost of pre-tagging secondary allocations and the memory bandwidth
cost of tagged accesses, the allocation itself uses a tag of 0 and
only the first four pages have memory tagging enabled.
Differential Revision: https://reviews.llvm.org/D93731
GNU binutils accepts only `.arch_extension memtag` while Clang
accepts either that or `.arch_extension mte` to mean the same thing.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D95996
Adds a new allocation API to GWP-ASan that handles size+alignment
restrictions.
Reviewed By: cryptoad, eugenis
Differential Revision: https://reviews.llvm.org/D94830