===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the prolog of non-leaf
functions and loading the return address from the shadow call stack in the
function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.

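The save and reload steps can be modeled in plain C. This is only a conceptual
sketch: the real instrumentation is emitted by the compiler and addresses the
shadow call stack through a reserved register, and every name below
(``scs_model_push``, ``scs_model_pop``, ``scs_model_call``) is hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model only; the real shadow call stack is not a C array. */
    #define SCS_MODEL_DEPTH 64

    static uintptr_t scs_model[SCS_MODEL_DEPTH];
    static size_t scs_model_top;

    /* Prolog: save the return address on the shadow call stack. */
    static void scs_model_push(uintptr_t return_address) {
      scs_model[scs_model_top++] = return_address;
    }

    /* Epilog: reload the return address from the shadow call stack. */
    static uintptr_t scs_model_pop(void) {
      return scs_model[--scs_model_top];
    }

    /* Simulates one non-leaf call whose regular stack slot gets corrupted.
       Returns the address the function would actually return to. */
    uintptr_t scs_model_call(uintptr_t return_address, uintptr_t corrupted) {
      uintptr_t stack_slot = return_address; /* copy kept for unwinders */
      scs_model_push(return_address);
      stack_slot = corrupted;                /* simulated overflow */
      (void)stack_slot;                      /* never used for the return */
      return scs_model_pop();
    }

In this model, ``scs_model_call(0x1000, 0xdead)`` returns ``0x1000``: the
corrupted copy on the regular stack is never consulted.
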
The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies; it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade
higher memory consumption for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support for using a shadow stack to store and
check return addresses at call and return time. Being a hardware
implementation, it would not suffer from race conditions and would not incur
the overhead of function instrumentation, but it does require operating system
support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred, since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the shadow call stack and mark only that region as
read/write. This also somewhat mitigates processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745

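The guard region approach described above can be sketched with POSIX ``mmap``
and ``mprotect``. This is a minimal illustration rather than the Android
implementation: the function name ``scs_alloc`` and both sizes are
assumptions, page sizes of at most 64 KiB are assumed, and ``rand()`` only
stands in for the secure random source a real runtime would need.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    /* Hypothetical sizes; both are powers of two. */
    #define SCS_SIZE       ((size_t)64 * 1024)
    #define SCS_GUARD_SIZE ((size_t)16 * 1024 * 1024)

    /* Reserves an inaccessible guard region and makes one randomly chosen,
       SCS_SIZE-aligned slot inside it readable and writable. Returns the
       slot, or NULL on failure. *guard_out receives the guard region start. */
    void *scs_alloc(void **guard_out) {
      char *guard = mmap(NULL, SCS_GUARD_SIZE, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (guard == MAP_FAILED)
        return NULL;

      /* Round up to the first SCS_SIZE-aligned address in the region. */
      uintptr_t mask = (uintptr_t)(SCS_SIZE - 1);
      uintptr_t first = ((uintptr_t)guard + mask) & ~mask;
      size_t slots = (SCS_GUARD_SIZE - (first - (uintptr_t)guard)) / SCS_SIZE;

      /* rand() is a placeholder; use a secure random source in practice. */
      char *scs = (char *)first + ((size_t)rand() % slots) * SCS_SIZE;

      /* Only the selected slot becomes accessible. */
      if (mprotect(scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
        munmap(guard, SCS_GUARD_SIZE);
        return NULL;
      }
      *guard_out = guard;
      return scs;
    }

The caller keeps the value stored through ``guard_out`` so that the whole
region can later be released with ``munmap(guard, SCS_GUARD_SIZE)``.
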
The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49

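Both uses of masking mentioned above (deriving the base address from ``x18``
and keeping only its low bits in a ``jmp_buf``) reduce to simple bit
arithmetic once the shadow call stack is aligned to its size. The sketch below
models ``x18`` as an ordinary integer. The size and the function names are
assumptions, and rebuilding the pointer from the current high bits on
``longjmp`` is one plausible scheme rather than a description of the bionic
code.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical size; must be a power of two and equal to the alignment. */
    #define SCS_SIZE ((uintptr_t)64 * 1024)
    #define SCS_MASK (SCS_SIZE - 1)

    /* Base of the shadow call stack, found by masking out the low bits. */
    uintptr_t scs_base(uintptr_t x18) {
      return x18 & ~SCS_MASK;
    }

    /* setjmp side: keep only the offset, which reveals nothing about the
       location of the shadow call stack. */
    uintptr_t scs_jmp_save(uintptr_t x18) {
      return x18 & SCS_MASK;
    }

    /* longjmp side: combine the current high bits with the saved offset. */
    uintptr_t scs_jmp_restore(uintptr_t current_x18, uintptr_t saved_offset) {
      return (current_x18 & ~SCS_MASK) | saved_offset;
    }

For example, with a base of ``0x12340000``, saving at ``0x12340040`` stores
only ``0x40``, and restoring while ``x18`` is ``0x12340100`` yields
``0x12340040`` again.
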
The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack``
flag on both the compile and link command lines. On aarch64, you also need to
pass ``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.

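A short sketch of how the attribute might be applied follows. The macro name
``NO_SHADOW_CALL_STACK`` and the function ``early_init`` are hypothetical, and
the stated reason for excluding the function (running before ``x18`` has been
set up) is only one assumed use case.

.. code-block:: c

    /* Expand to nothing on compilers that are not Clang, so the same source
       still builds there. NO_SHADOW_CALL_STACK is a hypothetical name. */
    #if defined(__clang__)
    #  define NO_SHADOW_CALL_STACK \
         __attribute__((no_sanitize("shadow-call-stack")))
    #else
    #  define NO_SHADOW_CALL_STACK
    #endif

    /* Hypothetical function assumed to run before x18 points at a valid
       shadow call stack, so it must not be instrumented. */
    NO_SHADOW_CALL_STACK
    int early_init(int value) {
      return value + 1;
    }

Wrapping the attribute in a macro keeps the annotation in one place and lets
the code compile unchanged with toolchains that lack the attribute.
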
Example
=======

The following example code:

.. code-block:: c++

    int bar();

    int foo() {
      return bar() + 1;
    }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` produces the following assembly
instead:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret