llvm-project

Commit Graph

Author	SHA1	Message	Date
Dmitry Vyukov	7ec308715c	tsan: prevent pathological slowdown for spurious races Prevent the following pathological behavior: Since memory access handling is not synchronized with DoReset, a thread running concurrently with DoReset can leave a bogus shadow value that will be later falsely detected as a race. For such false races RestoreStack will return false and we will not report it. However, consider that a thread leaves a whole lot of such bogus values and these values are later read by a whole lot of threads. This will cause massive amounts of ReportRace calls and lots of serialization. In very pathological cases the resulting slowdown can be >100x. This is very unlikely, but it was presumably observed in practice: https://github.com/google/sanitizers/issues/1552 If this happens, previous access sid+epoch will be the same for all of these false races b/c if the thread will try to increment epoch, it will notice that DoReset has happened and will stop producing bogus shadow values. So, last_spurious_race is used to remember the last sid+epoch for which RestoreStack returned false. Then it is used to filter out races with the same sid+epoch very early and quickly. It is of course possible that multiple threads left multiple bogus shadow values and all of them are read by lots of threads at the same time. In such case last_spurious_race will only be able to deduplicate a few races from one thread, then few from another and so on. An alternative would be to hold an array of such sid+epoch, but we consider such scenario as even less likely. Note: this can lead to some rare false negatives as well: 1. When a legit access with the same sid+epoch participates in a race as the "previous" memory access, it will be wrongly filtered out. 2. When RestoreStack returns false for a legit memory access because it was already evicted from the thread trace, we will still remember it in last_spurious_race. Then if there is another racing memory access from the same thread that happened in the same epoch, but was stored in the next thread trace part (which is still preserved in the thread trace), we will also wrongly filter it out while RestoreStack would actually succeed for that second memory access. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D130269	2022-07-25 10:40:11 +02:00
Dmitry Vyukov	7505cc301f	tsan: remove tracking of racy addresses We used to deduplicate based on the race address to prevent lots of repeated reports about the same race. But now we clear the shadow for the racy address in DoReportRace: // This prevents trapping on this address in future. for (uptr i = 0; i < kShadowCnt; i++) StoreShadow(&shadow_mem[i], i == 0 ? Shadow::kRodata : Shadow::kEmpty); It should have the same effect of not reporting duplicates (and actually better because it's automatically reset when the memory is reallocated). So drop the address deduplication code. Both simpler and faster. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D130240	2022-07-25 10:33:26 +02:00
Dmitry Vyukov	1d4d2cceda	[TSan] Add a runtime flag to print full thread creation stacks up to the main thread Currently, we only print how threads involved in data race are created from their parent threads. Add a runtime flag 'print_full_thread_history' to print thread creation stacks for the threads involved in the data race and their ancestors up to the main thread. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D122131	2022-03-24 17:30:27 +01:00
Dmitry Vyukov	9e66e5872c	tsan: print signal num in errno spoiling reports For errno spoiling reports we only print the stack where the signal handler is invoked. And the top frame is the signal handler function, which is supposed to give the info for debugging. But in some cases the top frame can be some common thunk, which does not give much info. E.g. for Go/cgo it's always runtime.cgoSigtramp. Print the signal number. This is what we can easily gather and it may give at least some hints regarding the issue. Reviewed By: melver, vitalybuka Differential Revision: https://reviews.llvm.org/D121979	2022-03-18 16:12:11 +01:00
Dmitry Vyukov	52a4a4a53c	tsan: remove unused ReportMutex::destroyed Depends on D113980. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D113981	2021-12-21 11:37:01 +01:00
Dmitry Vyukov	69807fe161	tsan: change ReportMutex::id type to int We used to use u64 as mutex id because it was some tricky identifier built from address and reuse count. Now it's just the mutex index in the report (0, 1, 2...), so use int to represent it. Depends on D112603. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D113980	2021-12-21 11:36:49 +01:00
Dmitry Vyukov	2eb3e20461	tsan: fix deadlock during race reporting SlotPairLocker calls SlotLock under ctx->multi_slot_mtx. SlotLock can invoke global reset DoReset if we are out of slots/epochs. But DoReset locks ctx->multi_slot_mtx as well, which leads to deadlock. Resolve the deadlock by removing SlotPairLocker/multi_slot_mtx and only lock one slot for which we will do RestoreStack. We need to lock that slot because RestoreStack accesses the slot journal. But it's unclear why we need to lock the current slot. Initially I did it just to be on the safer side (but at that time we did not lock the second slot, so it was easy just to lock the current slot). Reviewed By: melver Differential Revision: https://reviews.llvm.org/D116040	2021-12-20 18:52:48 +01:00
Dmitry Vyukov	b332134921	tsan: new runtime (v3) This change switches tsan to the new runtime which features: - 2x smaller shadow memory (2x of app memory) - faster fully vectorized race detection - small fixed-size vector clocks (512b) - fast vectorized vector clock operations - unlimited number of alive threads/goroutines Depends on D112602. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D112603	2021-12-13 12:48:34 +01:00
Jonas Devlieghere	396113c19f	Revert "tsan: new runtime (v3)" This reverts commit `5a33e41281` because it breaks LLDB. https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39208/	2021-12-09 09:18:10 -08:00
Dmitry Vyukov	5a33e41281	tsan: new runtime (v3) This change switches tsan to the new runtime which features: - 2x smaller shadow memory (2x of app memory) - faster fully vectorized race detection - small fixed-size vector clocks (512b) - fast vectorized vector clock operations - unlimited number of alive threads/goroutines Depends on D112602. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D112603	2021-12-09 09:09:52 +01:00
Dmitry Vyukov	09859113ed	Revert "tsan: new runtime (v3)" This reverts commit `66d4ce7e26`. Chromium tests started failing: https://bugs.chromium.org/p/chromium/issues/detail?id=1275581	2021-12-01 18:00:46 +01:00
Dmitry Vyukov	66d4ce7e26	tsan: new runtime (v3) This change switches tsan to the new runtime which features: - 2x smaller shadow memory (2x of app memory) - faster fully vectorized race detection - small fixed-size vector clocks (512b) - fast vectorized vector clock operations - unlimited number of alive threads/goroutines Depends on D112602. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D112603	2021-11-25 18:32:04 +01:00
Dmitry Vyukov	b584741d06	tsan: fix Java heap block begin in reports We currently use a wrong value for heap block (only works for C++, but not for Java). Use the correct value (we already computed it before, just forgot to use). Depends on D114593. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D114595	2021-11-25 17:07:53 +01:00
Weverything	1150f02c77	Revert "tsan: new runtime (v3)" This reverts commit `ebd47b0fb7`. This was causing unexpected behavior in programs.	2021-11-23 18:32:32 -08:00
Dmitry Vyukov	ebd47b0fb7	tsan: new runtime (v3) This change switches tsan to the new runtime which features: - 2x smaller shadow memory (2x of app memory) - faster fully vectorized race detection - small fixed-size vector clocks (512b) - fast vectorized vector clock operations - unlimited number of alive threads/goroutines Differential Revision: https://reviews.llvm.org/D112603	2021-11-23 11:44:59 +01:00
Dmitry Vyukov	5f18ae3988	Revert "tsan: new runtime (v3)" Summary: This reverts commit `1784fe0532`. Broke some bots: https://lab.llvm.org/buildbot#builders/57/builds/12365 http://green.lab.llvm.org/green/job/clang-stage1-RA/25658/ Reviewers: vitalybuka, melver Subscribers:	2021-11-22 19:08:48 +01:00
Dmitry Vyukov	1784fe0532	tsan: new runtime (v3) This change switches tsan to the new runtime which features: - 2x smaller shadow memory (2x of app memory) - faster fully vectorized race detection - small fixed-size vector clocks (512b) - fast vectorized vector clock operations - unlimited number of alive threads/goroutines Depends on D112602. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D112603	2021-11-22 15:55:39 +01:00
Dmitry Vyukov	79fbba9b79	Revert "tsan: new runtime (v3)" Summary: This reverts commit `ac95b8d954`. There is a number of bot failures: http://45.33.8.238/mac/38755/step_4.txt https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/38135/consoleFull#-148886289949ba4694-19c4-4d7e-bec5-911270d8a58c Reviewers: vitalybuka, melver Subscribers:	2021-11-12 17:49:47 +01:00
Dmitry Vyukov	ac95b8d954	tsan: new runtime (v3) This change switches tsan to the new runtime which features: - 2x smaller shadow memory (2x of app memory) - faster fully vectorized race detection - small fixed-size vector clocks (512b) - fast vectorized vector clock operations - unlimited number of alive threads/goroutines Depends on D112602. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D112603	2021-11-12 14:31:49 +01:00
Dmitry Vyukov	1b348902ea	tsan: add DynamicMutexSet helper MutexSet is too large to be allocated on stack. But we need local MutexSet objects in few places and use various hacks to allocate them. Add DynamicMutexSet helper that simplifies allocation of such objects. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D112449	2021-10-25 19:45:06 +02:00
Dmitry Vyukov	b02938439d	tsan: uninline RacyStacks::operator== It's only used during race reporting. There is no point in polluting the main header file with it. Reviewed By: xgupta Differential Revision: https://reviews.llvm.org/D110470	2021-09-25 12:08:51 +02:00
Dmitry Vyukov	6fe35ef419	tsan: fix debug format strings Some of the DPrintf's currently produce -Wformat warnings if enabled. Fix these format strings. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110131	2021-09-21 13:23:10 +02:00
Marco Elver	f3b3c964c3	Revert "[tsan] Fix GCC 8.3 build after D107911" This reverts commit `797fe59e6b`. The use of "EventType type : 3" is replicated for all Event structs and therefore was still present. As a result this still caused failures on older GCCs (9.2 or 8.3 or earlier). The particular bot that was failing due to buggy GCC was fixed by `fef39cc472`. Therefore, no reason to keep the workaround around; revert it. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D108192	2021-08-17 19:26:20 +02:00
Vitaly Buka	797fe59e6b	[tsan] Fix GCC 8.3 build after D107911 gcc 8.3 reports: ‘__tsan::v3::Event::type’ is too small to hold all values of ‘enum class __tsan::v3::EventType’	2021-08-16 16:18:42 -07:00
Dmitry Vyukov	c97318996f	tsan: add new trace Add structures for the new trace format, functions that serialize and add events to the trace and trace replaying logic. Differential Revision: https://reviews.llvm.org/D107911	2021-08-16 10:24:11 +02:00
Dmitry Vyukov	a82c7476a7	tsan: introduce RawShadow type Currently we hardcode u64 type for shadow everywhere and do lots of uptr<->u64* casts. It makes it hard to change u64 to another type (e.g. u32) and makes it easy to introduce bugs. Introduce RawShadow type and use it in MemToShadow, ShadowToMem, IsShadowMem and throughout the code base as u64 replacement. This makes it possible to change u64 to something else in future and generally improves static typing. Depends on D107481. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D107482	2021-08-05 13:37:10 +02:00
Dmitry Vyukov	9e3e97aa81	tsan: refactor MetaMap::GetAndLock interface Don't lock the sync object inside of MetaMap methods. This has several advantages: - the new interface does not confuse thread-safety analysis so we can remove a bunch of NO_THREAD_SAFETY_ANALYSIS attributes - this allows use of scoped lock objects - this allows more flexibility, e.g. locking some other mutex between searching and locking the sync object Also prefix the methods with GetSync to be consistent with GetBlock method. Also make interface wrappers inlinable, otherwise we either end up with 2 copies of the method, or with an additional call. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D107256	2021-08-02 13:29:46 +02:00
Dmitry Vyukov	103d075b05	tsan: introduce Tid and StackID typedefs Currently we inconsistently use u32 and int for thread ids, there are also "unique tid" and "os tid" and just lots of other things identified by integers. Additionally new tsan runtime will introduce yet another thread identifier that is very different from current tids. Similarly for stack IDs, it's easy to confuse u32 with other integer identifiers. And when a function accepts u32 or a struct contains u32 field, it's not always clear what it is. Add Tid and StackID typedefs to make it clear what is what. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D107152	2021-07-31 09:05:31 +02:00
Dmitry Vyukov	817f942a28	tsan: introduce New/Alloc/Free helpers We frequently allocate sizeof(T) memory and call T ctor on that memory (C++ new keyword effectively). Currently it's quite verbose and usually takes 2 lines of code. Add New<T>() helper that does it much more concisely. Rename internal_free to Free that also sets the pointer to nullptr. Shorter and safer. Rename internal_alloc to Alloc, just shorter. Reviewed By: vitalybuka, melver Differential Revision: https://reviews.llvm.org/D107085	2021-07-30 11:51:55 +02:00
Dmitry Vyukov	0d68cfc996	tsan: store ThreadRegistry in Context by value It's unclear why we allocate ThreadRegistry separately, I assume it's some historical leftover. Embed ThreadRegistry into Context. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D107045	2021-07-29 12:44:44 +02:00
Dmitry Vyukov	9dad34423b	tsan: strip __libc_start_main frame We strip all frames below main but in some cases it may be not enough. Namely, when main is instrumented but does not call any other instrumented code. In this case __tsan_func_entry in main obtains PC pointing to __libc_start_main (as we pass caller PC to __tsan_func_entry), but nothing obtains PC pointing to main itself (as main does not call any instrumented code). In such case we will not have main in the stack, and stripping everything below main won't work. So strip __libc_start_main explicitly as well. But keep stripping of main because __libc_start_main is glibc/linux-specific, so looking for main is more reliable (and usually main is present in stacks). Depends on D106957. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D106958	2021-07-28 20:26:42 +02:00
Dmitry Vyukov	5237b14087	tsan: print alloc stack for Java objects We maintain information about Java allocations, but for some reason never printed it in reports. Print it. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D106956	2021-07-28 20:25:11 +02:00
Dmitry Vyukov	b5bc386ca1	tsan: remove mblock types We used to count number of allocations/bytes based on the type and maybe record them in heap block headers. But that's all in the past, now it's not used for anything. Remove the mblock type. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D106971	2021-07-28 20:09:25 +02:00
Dmitry Vyukov	0118a64934	tsan: switch to the new sanitizer_common mutex Now that sanitizer_common mutex has feature-parity with tsan mutex, switch tsan to the sanitizer_common mutex and remove tsan's custom mutex. Reviewed By: vitalybuka, melver Differential Revision: https://reviews.llvm.org/D106379	2021-07-23 09:13:26 +02:00
Dmitry Vyukov	8924d8e37e	tsan: disable thread safety analysis in more functions In preparation for replacing tsan Mutex with sanitizer_common Mutex, which has thread-safety annotations. Thread safety analysis does not understand MetaMap::GetAndLock which returns a locked sync object. Reviewed By: vitalybuka, melver Differential Revision: https://reviews.llvm.org/D106548	2021-07-23 09:12:59 +02:00
Nico Weber	557855e047	Revert "tsan: make obtaining current PC faster" This reverts commit `e33446ea58`. Doesn't build on mac, and causes other problems. See reports on https://reviews.llvm.org/D106046 and https://reviews.llvm.org/D106081 Also revert follow-up "tsan: strip top inlined internal frames" This reverts commit `7b302fc9b0`.	2021-07-15 19:29:19 -04:00
Dmitry Vyukov	7b302fc9b0	tsan: strip top inlined internal frames The new GET_CURRENT_PC() can lead to spurious top inlined internal frames. Here are 2 examples from bots, in both cases the malloc is supposed to be the top frame (#0): WARNING: ThreadSanitizer: signal-unsafe call inside of a signal #0 __sanitizer::StackTrace::GetNextInstructionPc(unsigned long) #1 malloc Location is heap block of size 99 at 0xbe3800003800 allocated by thread T1: #0 __sanitizer::StackTrace::GetNextInstructionPc(unsigned long) #1 malloc Let's strip these internal top frames from reports. With other code changes I also observed some top frames from __tsan::ScopedInterceptor, proactively remove these as well. Differential Revision: https://reviews.llvm.org/D106081	2021-07-15 19:37:44 +02:00
Dmitry Vyukov	2721e27c3a	sanitizer_common: deduplicate CheckFailed We have some significant amount of duplication around CheckFailed functionality. Each sanitizer copy-pasted a chunk of code. Some got random improvements like dealing with recursive failures better. These improvements could benefit all sanitizers, but they don't. Deduplicate CheckFailed logic across sanitizers and let each sanitizer only print the current stack trace. I've tried to dedup stack printing as well, but this got me into cmake hell. So let's keep this part duplicated in each sanitizer for now. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102221	2021-05-12 08:50:53 +02:00
Dmitry Vyukov	ed7bf7d73f	tsan: refactor fork handling Commit `efd254b636` ("tsan: fix deadlock in pthread_atfork callbacks") fixed another deadlock related to atfork handling. But builders with DCHECKs enabled reported failures of pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests related to the fact that we hold runtime locks on interceptor exit: https://lab.llvm.org/buildbot/#/builders/70/builds/6727 This issue is somewhat inherent to the current approach, we indeed execute user code (atfork callbacks) with runtime lock held. Refactor fork handling to not run user code (atfork callbacks) with runtime locks held. This change does this by installing own atfork callbacks during runtime initialization. Atfork callbacks run in LIFO order, so the expectation is that our callbacks run last, right before the actual fork. This way we lock runtime mutexes around fork, but not around user callbacks. Extend tests to also install after fork callbacks just to cover more scenarios. Some tests also started reporting real races that we previously suppressed. Also extend tests to cover fork syscall support. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101517	2021-04-30 08:48:20 +02:00
Tres Popp	d1e08b124c	Revert "tsan: refactor fork handling" This reverts commit `e1021dd1fd`.	2021-04-28 14:08:33 +02:00
Dmitry Vyukov	e1021dd1fd	tsan: refactor fork handling Commit `efd254b636` ("tsan: fix deadlock in pthread_atfork callbacks") fixed another deadlock related to atfork handling. But builders with DCHECKs enabled reported failures of pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests related to the fact that we hold runtime locks on interceptor exit: https://lab.llvm.org/buildbot/#/builders/70/builds/6727 This issue is somewhat inherent to the current approach, we indeed execute user code (atfork callbacks) with runtime lock held. Refactor fork handling to not run user code (atfork callbacks) with runtime locks held. This change does this by installing own atfork callbacks during runtime initialization. Atfork callbacks run in LIFO order, so the expectation is that our callbacks run last, right before the actual fork. This way we lock runtime mutexes around fork, but not around user callbacks. Extend tests to also install after fork callbacks just to cover more scenarios. Some tests also started reporting real races that we previously suppressed. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101385	2021-04-27 22:37:27 +02:00
Evgenii Stepanov	5275d772da	Revert "tsan: fix deadlock in pthread_atfork callbacks" Tests fail on debug builders. See the forward fix in https://reviews.llvm.org/D101385. This reverts commit `efd254b636`.	2021-04-27 12:36:31 -07:00
Dmitry Vyukov	efd254b636	tsan: fix deadlock in pthread_atfork callbacks We take report/thread_registry locks around fork. This means we cannot report any bugs in atfork handlers. We resolved this by enabling per-thread ignores around fork. This resolved some of the cases, but not all. The added test triggers a race report from a signal handler called from atfork callback, we reset per-thread ignores around signal handlers, so we tried to report it and deadlocked. But there are more cases: a signal handler can be called synchronously if it's sent to itself. Or any other report types would cause deadlocks as well: mutex misuse, signal handler spoiling errno, etc. Disable all reports for the duration of fork with thr->suppress_reports and don't re-enable them around signal handlers. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101154	2021-04-27 13:25:26 +02:00
Vitaly Buka	20e78eb304	[sanitizer][NFC] Fix few cpplint warnings	2020-10-13 20:39:37 -07:00
Joachim Protze	7358a1104a	[TSan] Optimize handling of racy address This patch splits the handling of racy address and racy stack into separate functions. If a race was already reported for the address, we can avoid the cost for collecting the involved stacks. This patch also removes the race condition in storing the racy address / racy stack. This race condition allowed all threads to report the race. This patch changes the transitive suppression of reports. Previously suppression could transitively chain memory location and racy stacks. Now racy memory and racy stack are separate suppressions. Commit again, now with fixed tests. Reviewed by: dvyukov Differential Revision: https://reviews.llvm.org/D83625	2020-07-16 16:22:57 +02:00
Joachim Protze	d3849dddd2	Revert "[TSan] Optimize handling of racy address" This reverts commit `00e3a1ddec`. The commit broke most build bots, investigating.	2020-07-15 17:40:28 +02:00
Joachim Protze	00e3a1ddec	[TSan] Optimize handling of racy address This patch splits the handling of racy address and racy stack into separate functions. If a race was already reported for the address, we can avoid the cost for collecting the involved stacks. This patch also removes the race condition in storing the racy address / racy stack. This race condition allowed all threads to report the race. This patch changes the transitive suppression of reports. Previously suppression could transitively chain memory location and racy stacks. Now racy memory and racy stack are separate suppressions. Reviewed by: dvyukov Differential Revision: https://reviews.llvm.org/D83625	2020-07-15 16:50:08 +02:00
Vitaly Buka	c0fa632236	Remove NOLINTs from compiler-rt llvm-svn: 371687	2019-09-11 23:19:48 +00:00
Nico Weber	bb7ad98a47	Follow-up for r367863 and r367656 llvm-svn: 367888	2019-08-05 16:50:56 +00:00
Nico Weber	5a3bb1a4d6	compiler-rt: Rename .cc file in lib/tsan/rtl to .cpp Like r367463, but for tsan/rtl. llvm-svn: 367564	2019-08-01 14:22:42 +00:00

50 Commits
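
The deduplication idea described in commit `7ec308715c` (remember the last sid+epoch for which RestoreStack failed, then filter matching races early) can be illustrated with a small standalone sketch. This is not the tsan runtime code: `AccessId`, `SpuriousRaceFilter`, `RaceOutcome`, and `HandleRace` are hypothetical names, the bit packing is an assumption, and the result of stack restoration is passed in as a plain `bool` rather than computed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical identifier of the "previous" access: slot id plus epoch.
struct AccessId {
  uint8_t sid;
  uint16_t epoch;
};

enum class RaceOutcome { kFilteredEarly, kDroppedAsSpurious, kReported };

// Remembers a single sid+epoch, mirroring the single-entry design the
// commit message describes (an array would be the stated alternative).
class SpuriousRaceFilter {
 public:
  // Cheap early check against the last remembered sid+epoch.
  bool Matches(AccessId prev) const {
    return last_.load(std::memory_order_relaxed) == Pack(prev);
  }

  // Record the sid+epoch for which stack restoration failed.
  void Remember(AccessId prev) {
    last_.store(Pack(prev), std::memory_order_relaxed);
  }

 private:
  // The top bit marks "something is stored", so the initial value 0
  // never matches a real access with sid == 0 and epoch == 0.
  static constexpr uint32_t kValid = 1u << 31;

  static uint32_t Pack(AccessId id) {
    return kValid | (uint32_t{id.sid} << 16) | id.epoch;
  }

  std::atomic<uint32_t> last_{0};
};

// restore_stack_ok stands in for the outcome of the expensive stack
// restoration step; it is only consulted if the early filter misses.
inline RaceOutcome HandleRace(SpuriousRaceFilter& filter, AccessId prev,
                              bool restore_stack_ok) {
  if (filter.Matches(prev)) return RaceOutcome::kFilteredEarly;
  if (!restore_stack_ok) {
    filter.Remember(prev);
    return RaceOutcome::kDroppedAsSpurious;
  }
  return RaceOutcome::kReported;
}
```

In this sketch a later race with the same sid+epoch is filtered even when its stack could have been restored, which corresponds to the rare false negatives the commit message lists.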