Commit Graph

8 Commits

Author SHA1 Message Date
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain,
introduce an enum with the bit field positions for the access info.
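
For illustration, a minimal C sketch of what such a bit field enum can look
like; the field names and positions below are assumptions for this example,
not the layout defined by the patch:

    /* Illustrative only: assumed field names and bit positions. The point is
       that a single integer carries everything the backend needs to know
       about a check, including the new kernel-related features. */
    enum hwasan_access_info_shift {
      HWASAN_ACCESS_SIZE_SHIFT    = 0,  /* log2 of the access size (4 bits)   */
      HWASAN_IS_WRITE_SHIFT       = 4,  /* 1 if the access is a write         */
      HWASAN_RECOVER_SHIFT        = 5,  /* 1 if execution continues on error  */
      HWASAN_MATCH_ALL_SHIFT      = 16, /* match-all tag value (8 bits)       */
      HWASAN_HAS_MATCH_ALL_SHIFT  = 24, /* 1 if a match-all tag is in use     */
      HWASAN_COMPILE_KERNEL_SHIFT = 25  /* 1 for kernel: GOT-free calls, etc. */
    };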

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size (bytes)    boot time
before            92824064        6.18s
after             38822400        6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Peter Collingbourne c9b1a2b41d AArch64: Use SBFX instead of UBFX to extract address granule in outlined HWASan checks.
In a kernel (or in general in environments where bit 55 of the address
is set) the shadow base needs to point to the end of the shadow region,
not the beginning. Bit 55 needs to be sign extended into bits 52-63
of the shadow base offset, otherwise we end up loading from an invalid
address. We can do this by using SBFX instead of UBFX.
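
As a rough illustration, a small C sketch of the difference, assuming the
check extracts bits [55:4] of the address as the granule index (UBFX/SBFX
with lsb #4 and width #52); this is not the backend code itself:

    #include <stdint.h>

    /* The granule index is bits [55:4] of the address (16-byte granules,
       tag in bits [63:56]). UBFX zero-extends that field; SBFX sign-extends
       it from bit 55, so a kernel address yields a negative offset and the
       shadow base can point at the end of the shadow region. */

    static inline uint64_t granule_index_ubfx(uint64_t addr) {
      return (addr >> 4) & ((1ULL << 52) - 1);   /* UBFX xN, xAddr, #4, #52 */
    }

    static inline int64_t granule_index_sbfx(uint64_t addr) {
      /* SBFX xN, xAddr, #4, #52: move bit 55 up to bit 63, then arithmetic
         shift back down so it fills bits 52-63 of the result. */
      return (int64_t)(addr << 8) >> 12;
    }

    static inline uint64_t shadow_byte_addr(uint64_t shadow_base, uint64_t addr) {
      /* Identical to the UBFX form when bit 55 is clear (userspace). */
      return shadow_base + (uint64_t)granule_index_sbfx(addr);
    }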

Using SBFX should have no effect in the userspace case where bit 55
of the address is clear, so we do so unconditionally. I don't think
we need an ABI version bump for this (but one will come anyway when
we switch to x20 for the shadow base register).

Differential Revision: https://reviews.llvm.org/D90424
2020-10-30 12:53:15 -07:00
Peter Collingbourne 3859fc653f AArch64: Switch to x20 as the shadow base register for outlined HWASan checks.
From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.

It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20 this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills we end up
needing, on average, 8 more bytes of stack and 1 more instruction
but given the improvements to other functions this seems like the
right tradeoff.

Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.

Aside from that there is no change to the ABI because the choice
of shadow base register is a contract between the caller and the
outlined check function, both of which are compiler generated. We do
need to rename the v2 check functions though because the functions
are deduplicated based on their names, not on their contents, and we
need to make sure that when object files from old and new compilers
are linked together we don't end up with a function that uses x9
calling an outlined check that uses x20 or vice versa.

With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.

Differential Revision: https://reviews.llvm.org/D90422
2020-10-30 12:51:30 -07:00
Peter Collingbourne c336557f02 hwasan: Compatibility fixes for short granules.
We can't use short granules with stack instrumentation when targeting older
API levels because the rest of the system won't understand the short granule
tags stored in shadow memory.

Moreover, we need to be able to let old binaries (which won't understand
short granule tags) run on a new system that supports short granule
tags. Such binaries will call the __hwasan_tag_mismatch function when their
outlined checks fail. We can compensate for the binary's lack of support
for short granules by implementing the short granule part of the check in
the __hwasan_tag_mismatch function. Unfortunately we can't do anything about
inline checks, but I don't believe that we can generate these by default on
aarch64, nor did we do so when the ABI was fixed.
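
A minimal C sketch of the idea, using hypothetical names rather than the
actual compiler-rt entry points: the legacy entry point compensates for old
binaries by re-doing the short-granule part of the check before reporting,
while the v2 entry point skips it and also provides the symbol dependency
described below:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical names for the runtime internals; not the actual
       compiler-rt source. short_granule_access_ok() stands for the
       short-granule part of the check (see the short granules commit,
       D63908, below). */
    extern bool short_granule_access_ok(uintptr_t tagged_addr,
                                        unsigned access_info);
    extern void report_and_die(uintptr_t tagged_addr, unsigned access_info);

    /* Reached from an *old* binary's outlined check, which knows nothing
       about short granules: compensate by re-doing the short-granule check
       before concluding that this really is a tag mismatch. */
    void hwasan_tag_mismatch_legacy(uintptr_t tagged_addr, unsigned access_info) {
      if (short_granule_access_ok(tagged_addr, access_info))
        return;                 /* in-bounds short-granule access: no error */
      report_and_die(tagged_addr, access_info);
    }

    /* For binaries built against the new runtime: their checks have already
       handled short granules, so go straight to reporting. Referencing this
       symbol also keeps such binaries from loading on an older runtime. */
    void hwasan_tag_mismatch_v2(uintptr_t tagged_addr, unsigned access_info) {
      report_and_die(tagged_addr, access_info);
    }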

A new function, __hwasan_tag_mismatch_v2, is introduced that lets code
targeting the new runtime avoid redoing the short granule check. Because tag
mismatches are rare this isn't important from a performance perspective; the
main benefit is that it introduces a symbol dependency that prevents binaries
targeting the new runtime from running on older (i.e. incompatible) runtimes.

Differential Revision: https://reviews.llvm.org/D68059

llvm-svn: 373035
2019-09-27 01:02:10 +00:00
Peter Collingbourne 1366262b74 hwasan: Improve precision of checks using short granule tags.
A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.
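
Written out as code, the combined check looks roughly like the following C
sketch (assuming a 16-byte granule and the tag in the top byte of the
pointer; the names are illustrative, not the emitted code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TG 16   /* tag granule size in bytes */

    /* Does an access of `size` bytes through tagged pointer `p` pass, given
       the shadow byte `shadow` for its granule? */
    static bool tag_check_passes(uintptr_t p, size_t size, uint8_t shadow) {
      uint8_t ptr_tag = p >> 56;
      uintptr_t untagged = p & ((1ULL << 56) - 1);

      /* Possibility 1: the pointer tag equals the tag in shadow memory. */
      if (ptr_tag == shadow)
        return true;

      /* Possibility 2: the shadow byte is a short granule size (1..TG-1),
         the access stays within the first `shadow` bytes of the granule,
         and the pointer tag matches the granule's real tag, which is stored
         in the granule's last byte. */
      if (shadow >= 1 && shadow < TG && (untagged & (TG - 1)) + size <= shadow)
        return ptr_tag == *(const uint8_t *)(untagged | (TG - 1));

      return false;
    }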

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

Because this functionality obsoletes the right aligned heap feature of
the HWASAN memory allocator (and because we can no longer easily test
it), the feature is removed.

Also update the documentation to cover both short granule tags and
outlined checks.

Differential Revision: https://reviews.llvm.org/D63908

llvm-svn: 365551
2019-07-09 20:22:36 +00:00
Peter Collingbourne 2742eeb78e hwasan: Shrink outlined checks by 1 instruction.
Turns out that we can save an instruction by folding the right shift into
the compare.

Differential Revision: https://reviews.llvm.org/D63568

llvm-svn: 363874
2019-06-19 20:40:03 +00:00
Mitch Phillips 790edbc16e [HWASan] Save + print registers when tag mismatch occurs in AArch64.
Summary:
This change changes the instrumentation to allow users to view the registers
at the point at which a tag mismatch occurred. Most of the heavy lifting is
done in the runtime library, where we save the registers to the stack and
emit unwind information. This allows us to reduce the overhead, as very
little additional work needs to be done in each __hwasan_check instance.

In this implementation, the fast path of __hwasan_check is unmodified. An
additional 4 instructions (16 bytes) are emitted in the slow path of every
__hwasan_check instance. This may increase binary size somewhat, but as most
of the work is done in the runtime library, it's manageable.

The failure trace now contains a list of registers at the point at which the
failure occurred, in a format similar to that of Android's tombstones. It
currently has the following format:

Registers where the failure occurred (pc 0x0055555561b4):
    x0  0000000000000014  x1  0000007ffffff6c0  x2  1100007ffffff6d0  x3  12000056ffffe025
    x4  0000007fff800000  x5  0000000000000014  x6  0000007fff800000  x7  0000000000000001
    x8  12000056ffffe020  x9  0200007700000000  x10 0200007700000000  x11 0000000000000000
    x12 0000007fffffdde0  x13 0000000000000000  x14 02b65b01f7a97490  x15 0000000000000000
    x16 0000007fb77376b8  x17 0000000000000012  x18 0000007fb7ed6000  x19 0000005555556078
    x20 0000007ffffff768  x21 0000007ffffff778  x22 0000000000000001  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007ffffff6f0  x30 00000055555561b4

... and is printed after the dump of memory tags around the buggy address.

Every register is saved exactly as it was at the point where the tag
mismatch occurred, with the exception of x16/x17. These registers are used
as scratch registers in the tag mismatch calculation during __hwasan_check,
and cannot be saved without affecting the fast path. As these registers are
designated as scratch registers for linking, there should be no important
information in them that could aid in debugging.

Reviewers: pcc, eugenis

Reviewed By: pcc, eugenis

Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58857

llvm-svn: 355738
2019-03-08 21:22:35 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.
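
For a rough idea of what each inline check amounts to (and why repeating it
at every instrumented access is expensive), here is an illustrative C
approximation; the names and the access-info encoding are assumptions, not
the exact emitted sequence:

    #include <stdint.h>

    extern uint8_t *hwasan_shadow_base;                /* assumed shadow base */
    extern void hwasan_slow_path(uintptr_t p, int ai); /* hypothetical        */

    /* Roughly what is emitted inline before, say, every 4-byte load: extract
       the pointer tag, load the memory tag for the 16-byte granule from
       shadow, and branch to a slow path on mismatch. Only a few instructions,
       but repeated at every instrumented access. */
    static inline void check_4byte_load(uintptr_t p) {
      uint8_t ptr_tag = p >> 56;
      uint8_t mem_tag = hwasan_shadow_base[(p & ((1ULL << 56) - 1)) >> 4];
      if (ptr_tag != mem_tag)
        hwasan_slow_path(p, /*access_info=*/2);        /* assumed encoding   */
      /* ...the 4-byte load itself follows here... */
    }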

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
specifically for that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00