llvm-project

Author	SHA1	Message	Date
Andrew Browne	065d2e1d8b	[DFSan] Fix handling of libAtomic external functions. Implementation based on MSan. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D132070	2022-08-22 16:04:29 -07:00
Andrew Browne	204c12eef9	[DFSan] Print an error before calling null extern_weak functions, in case dfsan instrumentation optimized out a null check. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D124051	2022-04-19 17:01:41 -07:00
Andrew Browne	dbf8c00b09	[DFSan] Remove trampolines to unblock opaque pointers. (Reland with fix) https://github.com/llvm/llvm-project/issues/54172 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D121250	2022-03-14 16:03:25 -07:00
Andrew Browne	edc33fa569	Revert "[DFSan] Remove trampolines to unblock opaque pointers." This reverts commit `84af90336f`.	2022-03-14 13:47:41 -07:00
Andrew Browne	84af90336f	[DFSan] Remove trampolines to unblock opaque pointers. https://github.com/llvm/llvm-project/issues/54172 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D121250	2022-03-14 13:39:49 -07:00
Andrew Browne	7607ddd981	[NFC][DFSan] Cleanup code to use align functions. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D116761	2022-01-06 14:42:38 -08:00
Andrew Browne	32167bfe64	[DFSan] Refactor dfsan_mem_shadow_transfer. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D116704	2022-01-06 09:33:19 -08:00
Andrew Browne	4e173585f6	[DFSan] Add option for conditional callbacks. This allows DFSan to find tainted values used to control program behavior. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D116207	2022-01-05 15:07:09 -08:00
Andrew Browne	ed6c757d5c	[DFSan] Add functions to print origin trace from origin id instead of address. dfsan_print_origin_id_trace dfsan_sprint_origin_id_trace Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D116184	2021-12-22 16:45:54 -08:00
Vitaly Buka	63886c21ec	[NFC][dfsan] Split Init and ThreadStart	2021-11-08 19:16:55 -08:00
Martin Liska	13a442ca49	Enable -Wformat-pedantic and fix fallout. Differential Revision: https://reviews.llvm.org/D113172	2021-11-05 13:12:35 +01:00
Vitaly Buka	df43d419de	[NFC][sanitizer] Remove includes from header	2021-10-08 14:27:05 -07:00
Andrew Browne	d81723c99b	[DFSan] Optimize code for writing to shadow. Move SetShadow to namespace. Writing zeros to shadow (including checking for existing zero) is now ~2x faster on one example. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D110733	2021-09-30 12:42:21 -07:00
Jianzhou Zhao	ae6648cee0	[dfsan] Expose dfsan_get_track_origins to get origin tracking status. This allows application code to check whether origin tracking is on before printing out traces. -dfsan-track-origins can be 0, 1, or 2. The current code only distinguishes 1 and 2 at compile time, but not at runtime. Made runtime distinguish 1 and 2 too. Reviewed By: browneee Differential Revision: https://reviews.llvm.org/D105128	2021-06-29 20:32:39 +00:00
Andrew Browne	45f6d5522f	[DFSan] Change shadow and origin memory layouts to match MSan. Previously on x86_64: +--------------------+ 0x800000000000 (top of memory) \| application memory \| +--------------------+ 0x700000008000 (kAppAddr) \| \| \| unused \| \| \| +--------------------+ 0x300000000000 (kUnusedAddr) \| origin \| +--------------------+ 0x200000008000 (kOriginAddr) \| unused \| +--------------------+ 0x200000000000 \| shadow memory \| +--------------------+ 0x100000008000 (kShadowAddr) \| unused \| +--------------------+ 0x000000010000 \| reserved by kernel \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem & ~0x600000000000 SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow Now for x86_64: +--------------------+ 0x800000000000 (top of memory) \| application 3 \| +--------------------+ 0x700000000000 \| invalid \| +--------------------+ 0x610000000000 \| origin 1 \| +--------------------+ 0x600000000000 \| application 2 \| +--------------------+ 0x510000000000 \| shadow 1 \| +--------------------+ 0x500000000000 \| invalid \| +--------------------+ 0x400000000000 \| origin 3 \| +--------------------+ 0x300000000000 \| shadow 3 \| +--------------------+ 0x200000000000 \| origin 2 \| +--------------------+ 0x110000000000 \| invalid \| +--------------------+ 0x100000000000 \| shadow 2 \| +--------------------+ 0x010000000000 \| application 1 \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem ^ 0x500000000000 SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000 Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D104896	2021-06-25 17:00:38 -07:00
Andrew Browne	759e797767	[DFSan][NFC] Refactor Origin Address Alignment code. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104565	2021-06-21 14:52:02 -07:00
Andrew Browne	14407332de	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-18 11:21:46 -07:00
Andrew Browne	39295e92f7	Revert "[DFSan] Cleanup code for platforms other than Linux x86_64." This reverts commit `8441b993bd`. Buildbot failures.	2021-06-17 14:19:18 -07:00
Andrew Browne	8441b993bd	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-17 14:08:40 -07:00
George Balatsouras	98504959a6	[dfsan] Add stack-trace printing functions to dfsan interface Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104165	2021-06-14 14:09:00 -07:00
George Balatsouras	5b4dda550e	[dfsan] Add full fast8 support Complete support for fast8: - amend shadow size and mapping in runtime - remove fast16 mode and -dfsan-fast-16-labels flag - remove legacy mode and make fast8 mode the default - remove dfsan-fast-8-labels flag - remove functions in dfsan interface only applicable to legacy - remove legacy-related instrumentation code and tests - update documentation. Reviewed By: stephan.yichao.zhao, browneee Differential Revision: https://reviews.llvm.org/D103745	2021-06-07 17:20:54 -07:00
Jianzhou Zhao	2c82588dac	[dfsan] Use the sanitizer allocator to reduce memory cost. dfsan does not use the sanitizer allocator as other sanitizers do. In practice, we let it use glibc's allocator since tcmalloc needs more work to work well with dfsan. With glibc, we observe large memory leakage. This could relate to two things: 1) the glibc allocator has limitations: for example, tcmalloc can easily reduce memory footprint 2x 2) glibc may call munmap directly as an internal system call by using the system call number, so DFSan has no way to release shadow spaces for those munmap calls. Using the sanitizer allocator addresses the above issues: 1) its memory management is close to tcmalloc 2) we can register a callback when the sanitizer allocator calls munmap, so dfsan can release shadow spaces correctly. Our experiment with an internal server-based application showed that with the change, in a run of a few days, memory usage leakage is close to what tcmalloc does w/o dfsan. This change mainly follows MSan's code. 1) define allocator callbacks at dfsan_allocator.h\|cpp 2) mark allocator APIs to be discard 3) intercept allocator APIs 4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called 5) add flags about whether to zero memory after malloc/free. dfsan works at byte level, so bit-level operations can cause reading undefined shadow. See D96842. Zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug overtainting caused by this w/o running MSan. So we add the flag to help debugging. This change will be split into small changes for review. Before that, a question is "this code shares a lot with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*. Does it make sense to unify the code at sanitizer_common? Will that introduce some maintenance issue?" Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101204	2021-06-06 22:09:31 +00:00
George Balatsouras	a11cb10a36	[dfsan] Add function that prints origin stack trace to buffer Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D102451	2021-05-24 11:09:03 -07:00
Jianzhou Zhao	1fb612d060	[dfsan] Add a DFSan allocator This is a part of https://reviews.llvm.org/D101204 Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101666	2021-05-05 00:51:45 +00:00
Jianzhou Zhao	7fdf270965	[dfsan] Track origin at loads The first version of origin tracking tracks only memory stores. Although this is sufficient for understanding correct flows, it is hard to figure out where an undefined value is read from. To find reading undefined values, we still have to do a reverse binary search from the last store in the chain with printing and logging at possible code paths. This is quite inefficient. Tracking memory load instructions can help this case. The main issues of tracking loads are performance and code size overheads. With tracking only stores, the code size overhead is 38%, memory overhead is 1x, and cpu overhead is 3x. In practice #load is much larger than #store, so both code size and cpu overhead increases. The first blocker is code size overhead: link fails if we inline tracking loads. The workaround is using external function calls to propagate metadata. This is also the workaround ASan uses. The cpu overhead is ~10x. This is a trade off between debuggability and performance, and will be used only when debugging cases that tracking only stores is not enough. Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D100967	2021-04-22 16:25:24 +00:00
Jianzhou Zhao	1fe042041c	[dfsan] Add origin ABI wrappers supported: dl_get_tls_static_info, calloc, clock_gettime, dfsan_set_write_callback, dl_iterate_phdr, dlopen, memcpy, memmove, memset, pread, read, strcat, strdup, strncpy This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D98790	2021-03-19 16:23:25 +00:00
Jianzhou Zhao	4e67ae7b6b	[dfsan] Add origin ABI wrappers for thread/signal/fork This is a part of https://reviews.llvm.org/D95835. See `bb91e02efd` about the similar issue of fork in MSan's origin tracking. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D98359	2021-03-15 16:18:00 +00:00
Jianzhou Zhao	c20db7ea6a	[dfsan] Add utils to get and print origin paths and some test cases This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97962	2021-03-06 00:11:35 +00:00
Jianzhou Zhao	a05aa0dd5e	[dfsan] Update memset and dfsan_(set\|add)_label with origin tracking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97302	2021-02-23 23:16:33 +00:00
Jianzhou Zhao	063a6fa87e	[dfsan] Add origin tls/move/read APIs This is a part of https://reviews.llvm.org/D95835. Added 1) TLS storage 2) a weak global set by instrumented code 3) move origins These APIs are similar to MSan's APIs https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/msan/msan_poisoning.cpp We first improved MSan's by https://reviews.llvm.org/D94572 and https://reviews.llvm.org/D94552. So the correctness has been verified by MSan. After the DFSan instrumentation code is ready, we will be adding more test cases 4) read To reduce origin tracking cost, some of the read APIs return only the origin from the first tainted data. Note that we did not add origin set APIs here because they are related to code instrumentation, and will be added later with IR transformation code. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96564	2021-02-18 17:48:20 +00:00
Jianzhou Zhao	a7538fee3a	[dfsan] Comment out ChainOrigin temporarily It was added by D96160, will be used by D96564. Some OS got errors if it is not used. Comment it out for the time being.	2021-02-12 18:13:24 +00:00
Jianzhou Zhao	7590c0078d	[dfsan] Turn off THP at dfsan_flush https://reviews.llvm.org/D89662 turned this off at dfsan_init. dfsan_flush also needs to turn it off. W/o this a program may get more and more memory usage after hours. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96569	2021-02-12 17:10:09 +00:00
Jianzhou Zhao	5ebbc5802f	[dfsan] Introduce memory mapping for origin tracking Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96545	2021-02-11 22:33:16 +00:00
Jianzhou Zhao	2d9c6e10e9	[dfsan] Add origin chain utils This is a part of https://reviews.llvm.org/D95835. The design is based on MSan origin chains. A 4-byte origin is a hash of an origin chain. An origin chain is a pair of a stack hash id and a hash to its previous origin chain. 0 means no previous origin chains exist. We limit the length of a chain to be 16. With origin_history_size = 0, the limit is removed. The change does not have any test cases yet. The following change will be adding test cases when the APIs are used. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96160	2021-02-11 19:10:11 +00:00
Jianzhou Zhao	0f3fd3b281	[dfsan] Add thread registration This is a part of https://reviews.llvm.org/D95835. This change is to address two problems 1) When recording stacks in origin tracking, libunwind is not async signal safe. Inside signal callbacks, we need to use fast unwind. Fast unwind needs threads 2) StackDepot used by origin tracking is not async signal safe, we set a flag per thread inside a signal callback to prevent from using it. The thread registration is similar to ASan and MSan. Related MSan changes are * `98f5ea0dba` * `f653cda269` * `5a7c364343` Some changes in the diff are used in the next diffs 1) The test case pthread.c is not very interesting for now. It will be extended to test origin tracking later. 2) DFsanThread::InSignalHandler will be used by origin tracking later. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D95963	2021-02-05 17:38:59 +00:00
Jianzhou Zhao	e1a4322f81	[dfsan] Clean TLS after sigaction callbacks DFSan uses TLS to pass metadata of arguments and return values. When an instrumented function accesses the TLS, if a signal callback happens, and the callback calls other instrumented functions that update the same TLS, the TLS is in an inconsistent state after the callback ends. This may cause either under-tainting or over-tainting. This fix follows MSan's workaround. `cb22c67a21` It simply resets TLS at restore. This prevents over-tainting. Although under-tainting may still happen, a taint flow can be found eventually if we run a DFSan-instrumented program multiple times. The alternative option is saving the entire TLS. However the TLS storage takes 2k bytes, and signal calls could be nested. So it does not seem worth it. This diff fixes sigaction. A following diff will be fixing signal. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D95642	2021-02-02 22:07:17 +00:00
Jianzhou Zhao	80e326a8c4	[dfsan] Support passing non-i16 shadow values in TLS mode This is a child diff of D92261. It extended TLS arg/ret to work with aggregate types. For a function t foo(t1 a1, t2 a2, ... tn an) Its arguments shadow are saved in TLS args like a1_s, a2_s, ..., an_s TLS ret simply includes r_s. By calculating the type size of each shadow value, we can get their offset. This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp. Note that this change does not add test cases for overflowed TLS arg/ret because this is hard to test w/o supporting aggregate shadow types. We will be adding them after supporting that. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92440	2020-12-04 02:45:07 +00:00
Jianzhou Zhao	b4ac05d763	Replace the equivalent code by UnionTableAddr UnionTableAddr is always inlined. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/DD91758	2020-11-19 20:15:25 +00:00
Jianzhou Zhao	3597fba4e5	Add a simple stack trace printer for DFSan Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91235	2020-11-11 19:00:59 +00:00
Jianzhou Zhao	91dc545bf2	Set Huge Page mode on shadow regions based on no_huge_pages_for_shadow It turned out that in dynamic shared library mode, the memory access pattern can increase memory footprint significantly on OS when transparent hugepages (THP) are enabled. This could cause >70x memory overhead than running a statically linked binary. For example, a static binary with RSS overhead 300M can use > 23G RSS if it is built dynamically. /proc/../smaps shows in 6204552 kB RSS 6141952 kB relates to AnonHugePages. Also such a high RSS happens in some rate: around 25% runs may use > 23G RSS, the rest uses in between 6-23G. I guess this may relate to how user memory is allocated and distributed across huge pages. THP is a trade-off between time and space. We have a flag no_huge_pages_for_shadow for sanitizer. It is true by default but DFSan did not follow this. Depending on if a target is built statically or dynamically, maybe Clang can set no_huge_pages_for_shadow accordingly after this change. But it still seems fine to follow the default setting of no_huge_pages_for_shadow. If time is an issue, and users are fine with high RSS, this flag can be set to false selectively.	2020-10-20 16:50:59 +00:00
Jianzhou Zhao	cc07fbe37d	Release pages to OS when setting 0 label This is a follow up patch of https://reviews.llvm.org/D88755. When set 0 label for an address range, we can release pages within the corresponding shadow address range to OS, and set only addresses outside the pages to be 0. Reviewed-by: morehouse, eugenis Differential Revision: https://reviews.llvm.org/D89199	2020-10-20 16:22:11 +00:00
Matt Morehouse	69721fc9d1	[DFSan] Support fast16labels mode in dfsan_union. While the instrumentation never calls dfsan_union in fast16labels mode, the custom wrappers do. We detect fast16labels mode by checking whether any labels have been created. If not, we must be using fast16labels mode. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D86012	2020-08-17 11:27:28 -07:00
Matt Morehouse	bb3a3da38d	[DFSan] Don't unmap during dfsan_flush(). Unmapping and remapping is dangerous since another thread could touch the shadow memory while it is unmapped. But there is really no need to unmap anyway, since mmap(MAP_FIXED) will happily clobber the existing mapping with zeroes. This is thread-safe since the mmap() is done under the same kernel lock as page faults are done. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D85947	2020-08-14 11:43:49 -07:00
Matt Morehouse	e2d0b44a7c	[DFSan] Add efficient fast16labels instrumentation mode. Adds the -fast-16-labels flag, which enables efficient instrumentation for DFSan when the user needs <=16 labels. The instrumentation eliminates most branches and most calls to __dfsan_union or __dfsan_union_load. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D84371	2020-07-29 18:58:47 +00:00
Matt Morehouse	c6f2142428	Reland "[DFSan] Handle fast16labels for all API functions." Support fast16labels in `dfsan_has_label`, and print an error for all other API functions. For `dfsan_dump_labels` we return silently rather than crashing since it is also called from the atexit handler where it is undefined behavior to call exit() again. Reviewed By: kcc Differential Revision: https://reviews.llvm.org/D84215	2020-07-23 21:19:39 +00:00
Matt Morehouse	df441c9015	Revert "[DFSan] Handle fast16labels for all API functions." This reverts commit `19d9c0397e` due to buildbot failure.	2020-07-23 17:49:55 +00:00
Matt Morehouse	19d9c0397e	[DFSan] Handle fast16labels for all API functions. Summary: Support fast16labels in `dfsan_has_label`, and print an error for all other API functions. Reviewers: kcc, vitalybuka, pcc Reviewed By: kcc Subscribers: jfb, llvm-commits, #sanitizers Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D84215	2020-07-22 23:54:26 +00:00
Nico Weber	a9aa813792	compiler-rt: Rename .cc file in lib/{dfsan,stats,ubsan_minimal} to .cpp Like r367463, but for dfsan, stats, ubsan_minimal. llvm-svn: 367551	2019-08-01 12:41:23 +00:00
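
The x86_64 address mapping described in commit `45f6d5522f` can be sketched as follows. This is only an illustration of the two formulas quoted in that commit message, not the runtime's actual code: the function and constant names (`MemToShadow`, `ShadowToOrigin`, `kShadowXorMask`, `kOriginOffset`) are hypothetical, and the sample addresses are arbitrary picks from the three application regions in the layout diagram.

```cpp
#include <cstdint>

// Constants copied from the commit message of 45f6d5522f (x86_64 only).
// The names are hypothetical, chosen for this sketch.
constexpr std::uint64_t kShadowXorMask = 0x500000000000ULL;
constexpr std::uint64_t kOriginOffset = 0x100000000000ULL;

// MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
constexpr std::uint64_t MemToShadow(std::uint64_t mem) {
  return mem ^ kShadowXorMask;
}

// SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000
constexpr std::uint64_t ShadowToOrigin(std::uint64_t shadow) {
  return shadow + kOriginOffset;
}

// "application 1" [0x0000..., 0x0100...) maps into "shadow 1" and "origin 1".
static_assert(MemToShadow(0x000000001000ULL) == 0x500000001000ULL, "shadow 1");
static_assert(ShadowToOrigin(0x500000001000ULL) == 0x600000001000ULL, "origin 1");

// "application 2" [0x5100..., 0x6000...) maps into "shadow 2" and "origin 2".
static_assert(MemToShadow(0x510000001000ULL) == 0x010000001000ULL, "shadow 2");
static_assert(ShadowToOrigin(0x010000001000ULL) == 0x110000001000ULL, "origin 2");

// "application 3" [0x7000..., 0x8000...) maps into "shadow 3" and "origin 3".
static_assert(MemToShadow(0x700000001000ULL) == 0x200000001000ULL, "shadow 3");
static_assert(ShadowToOrigin(0x200000001000ULL) == 0x300000001000ULL, "origin 3");
```

Because the shadow mapping is a single XOR, applying it twice returns the original application address, which is one reason a layout like this is convenient to reason about.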

48 Commits
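
Commit `cc07fbe37d` describes releasing whole shadow pages to the OS when a 0 label is set, and writing zeros only to the addresses outside those pages. The sketch below shows just the range arithmetic behind that idea. Everything here is an assumption made for illustration: the 4096-byte page size, the `ZeroPlan` struct, and the `PlanZeroing` helper are hypothetical and do not appear in the commit, and the actual call that returns pages to the OS is deliberately not shown.

```cpp
#include <cstdint>

// Assumed page size for this sketch; the real runtime queries it.
constexpr std::uint64_t kPageSize = 4096;

// Hypothetical description of how a shadow range [beg, end) is cleared.
struct ZeroPlan {
  std::uint64_t head_beg, head_end;    // partial page: zeroed by writing
  std::uint64_t pages_beg, pages_end;  // whole pages: candidates for release
  std::uint64_t tail_beg, tail_end;    // partial page: zeroed by writing
};

constexpr std::uint64_t RoundUp(std::uint64_t x, std::uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

constexpr std::uint64_t RoundDown(std::uint64_t x, std::uint64_t align) {
  return x & ~(align - 1);
}

// Split [beg, end) into an unaligned head, a page-aligned middle, and an
// unaligned tail. Empty sub-ranges have equal begin and end.
inline ZeroPlan PlanZeroing(std::uint64_t beg, std::uint64_t end) {
  const std::uint64_t pages_beg = RoundUp(beg, kPageSize);
  const std::uint64_t pages_end = RoundDown(end, kPageSize);
  if (pages_beg >= pages_end) {
    // No whole page lies inside the range: write zeros over all of it.
    return ZeroPlan{beg, end, end, end, end, end};
  }
  return ZeroPlan{beg, pages_beg, pages_beg, pages_end, pages_end, end};
}
```

For example, under these assumptions `PlanZeroing(0x1010, 0x4020)` yields a head of `[0x1010, 0x2000)`, two releasable pages `[0x2000, 0x4000)`, and a tail of `[0x4000, 0x4020)`.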