Commit Graph

50 Commits

Author SHA1 Message Date
Dmitry Vyukov 7ec308715c tsan: prevent pathological slowdown for spurious races
Prevent the following pathological behavior:
Since memory access handling is not synchronized with DoReset,
a thread running concurrently with DoReset can leave a bogus shadow value
that will be later falsely detected as a race. For such false races
RestoreStack will return false and we will not report it.
However, consider that a thread leaves a whole lot of such bogus values
and these values are later read by a whole lot of threads.
This will cause massive amounts of ReportRace calls and lots of
serialization. In very pathological cases the resulting slowdown
can be >100x. This is very unlikely, but it was presumably observed
in practice: https://github.com/google/sanitizers/issues/1552
If this happens, the previous access sid+epoch will be the same for all of
these false races, because if the thread tries to increment its epoch, it will
notice that DoReset has happened and will stop producing bogus shadow
values. So last_spurious_race is used to remember the last sid+epoch
for which RestoreStack returned false. It is then used to filter out
races with the same sid+epoch very early and quickly.
It is of course possible that multiple threads leave multiple bogus shadow
values and all of them are read by lots of threads at the same time.
In such a case last_spurious_race will only be able to deduplicate a few
races from one thread, then a few from another, and so on. An alternative
would be to hold an array of such sid+epoch pairs, but we consider such a
scenario even less likely.
Note: this can lead to some rare false negatives as well:
1. When a legit access with the same sid+epoch participates in a race
as the "previous" memory access, it will be wrongly filtered out.
2. When RestoreStack returns false for a legit memory access because it
was already evicted from the thread trace, we will still remember it in
last_spurious_race. Then if there is another racing memory access from
the same thread that happened in the same epoch, but was stored in the
next thread trace part (which is still preserved in the thread trace),
we will also wrongly filter it out while RestoreStack would actually
succeed for that second memory access.
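
A minimal sketch of the filtering idea (hypothetical simplified types; the real runtime keeps the sid+epoch packed inside shadow values and stores the last spurious one in the global context):

  #include <atomic>
  #include <cstdint>

  // Hypothetical simplified identifiers for illustration only.
  struct AccessId {
    uint8_t sid;     // slot id of the thread that wrote the shadow value
    uint16_t epoch;  // its epoch at the time of the access
  };

  static uint32_t Pack(AccessId id) {
    return static_cast<uint32_t>(id.sid) << 16 | id.epoch;
  }

  // 0xffffffff means "no spurious race remembered yet".
  static std::atomic<uint32_t> last_spurious_race{0xffffffffu};

  // Called at the start of race reporting: if the previous access matches the
  // last known spurious sid+epoch, skip the expensive reporting path entirely.
  bool ShouldSkipReport(AccessId prev) {
    return last_spurious_race.load(std::memory_order_relaxed) == Pack(prev);
  }

  // Called when RestoreStack returns false for the previous access: remember it
  // so later races with the same sid+epoch are filtered out early and cheaply.
  void RememberSpuriousRace(AccessId prev) {
    last_spurious_race.store(Pack(prev), std::memory_order_relaxed);
  }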

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D130269
2022-07-25 10:40:11 +02:00
Dmitry Vyukov 7505cc301f tsan: remove tracking of racy addresses
We used to deduplicate based on the race address to prevent lots
of repeated reports about the same race.

But now we clear the shadow for the racy address in DoReportRace:

  // This prevents trapping on this address in future.
  for (uptr i = 0; i < kShadowCnt; i++)
    StoreShadow(&shadow_mem[i], i == 0 ? Shadow::kRodata : Shadow::kEmpty);

It should have the same effect of not reporting duplicates
(and actually better because it's automatically reset when the memory is reallocated).

So drop the address deduplication code. Both simpler and faster.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D130240
2022-07-25 10:33:26 +02:00
Dmitry Vyukov 1d4d2cceda [TSan] Add a runtime flag to print full thread creation stacks up to the main thread
Currently, we only print how the threads involved in a data race were created by their parent threads.
Add a runtime flag 'print_full_thread_history' to print thread creation stacks for the threads involved in the data race and all their ancestors, up to the main thread.
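
As an illustration, a small racy program with nested thread creation; running it under TSan with TSAN_OPTIONS=print_full_thread_history=1 should additionally print how the intermediate spawner thread was created from main, not only the immediate parent of the racing thread:

  #include <pthread.h>

  int g_data;  // racy global

  void *Worker(void *) {
    g_data = 42;  // racy write in the grandchild thread
    return nullptr;
  }

  void *Spawner(void *) {
    pthread_t grandchild;
    pthread_create(&grandchild, nullptr, Worker, nullptr);
    pthread_join(grandchild, nullptr);
    return nullptr;
  }

  int main() {
    pthread_t child;
    pthread_create(&child, nullptr, Spawner, nullptr);
    g_data = 1;  // racy write in the main thread
    pthread_join(child, nullptr);
  }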

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D122131
2022-03-24 17:30:27 +01:00
Dmitry Vyukov 9e66e5872c tsan: print signal num in errno spoiling reports
For errno spoiling reports we only print the stack
where the signal handler is invoked. And the top
frame is the signal handler function, which is supposed
to give the info for debugging.
But in some cases the top frame can be some common thunk,
which does not give much info. E.g. for Go/cgo it's always
runtime.cgoSigtramp.

Print the signal number.
This is what we can easily gather and it may give at least
some hints regarding the issue.
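
A sketch of the pattern these reports are about: a handler that lets a libc call clobber errno without restoring it (illustrative; whether tsan flags this exact program depends on how the signal is delivered):

  #include <cerrno>
  #include <csignal>
  #include <cstdio>
  #include <unistd.h>

  // The handler calls read() on a bad fd, which sets errno (EBADF),
  // and "forgets" to save/restore errno around it.
  void Handler(int sig) {
    (void)sig;
    char c;
    read(-1, &c, 1);
  }

  int main() {
    signal(SIGPROF, Handler);
    raise(SIGPROF);
    // errno may now have been silently changed under the caller's feet;
    // with this change the tsan report would also name the signal (SIGPROF here).
    printf("errno after handler: %d\n", errno);
  }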

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D121979
2022-03-18 16:12:11 +01:00
Dmitry Vyukov 52a4a4a53c tsan: remove unused ReportMutex::destroyed
Depends on D113980.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D113981
2021-12-21 11:37:01 +01:00
Dmitry Vyukov 69807fe161 tsan: change ReportMutex::id type to int
We used to use u64 as the mutex id because it was a
tricky identifier built from the address and reuse count.
Now it's just the mutex index in the report (0, 1, 2...),
so use int to represent it.

Depends on D112603.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D113980
2021-12-21 11:36:49 +01:00
Dmitry Vyukov 2eb3e20461 tsan: fix deadlock during race reporting
SlotPairLocker calls SlotLock under ctx->multi_slot_mtx.
SlotLock can invoke global reset DoReset if we are out of slots/epochs.
But DoReset locks ctx->multi_slot_mtx as well, which leads to deadlock.

Resolve the deadlock by removing SlotPairLocker/multi_slot_mtx
and only lock one slot for which we will do RestoreStack.
We need to lock that slot because RestoreStack accesses the slot journal.
But it's unclear why we need to lock the current slot.
Initially I did it just to be on the safer side (but at that time
we did not lock the second slot, so it was easy to just lock the current slot).

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D116040
2021-12-20 18:52:48 +01:00
Dmitry Vyukov b332134921 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-12-13 12:48:34 +01:00
Jonas Devlieghere 396113c19f Revert "tsan: new runtime (v3)"
This reverts commit 5a33e41281 because it
breaks LLDB.

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39208/
2021-12-09 09:18:10 -08:00
Dmitry Vyukov 5a33e41281 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-12-09 09:09:52 +01:00
Dmitry Vyukov 09859113ed Revert "tsan: new runtime (v3)"
This reverts commit 66d4ce7e26.

Chromium tests started failing:
https://bugs.chromium.org/p/chromium/issues/detail?id=1275581
2021-12-01 18:00:46 +01:00
Dmitry Vyukov 66d4ce7e26 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-11-25 18:32:04 +01:00
Dmitry Vyukov b584741d06 tsan: fix Java heap block begin in reports
We currently use a wrong value for the heap block begin
(it only works for C++, but not for Java).
Use the correct value (we already computed it before, we just forgot to use it).

Depends on D114593.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D114595
2021-11-25 17:07:53 +01:00
Weverything 1150f02c77 Revert "tsan: new runtime (v3)"
This reverts commit ebd47b0fb7.
This was causing unexpected behavior in programs.
2021-11-23 18:32:32 -08:00
Dmitry Vyukov ebd47b0fb7 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Differential Revision: https://reviews.llvm.org/D112603
2021-11-23 11:44:59 +01:00
Dmitry Vyukov 5f18ae3988 Revert "tsan: new runtime (v3)"
Summary:
This reverts commit 1784fe0532.

Broke some bots:
https://lab.llvm.org/buildbot#builders/57/builds/12365
http://green.lab.llvm.org/green/job/clang-stage1-RA/25658/

Reviewers: vitalybuka, melver

Subscribers:
2021-11-22 19:08:48 +01:00
Dmitry Vyukov 1784fe0532 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-11-22 15:55:39 +01:00
Dmitry Vyukov 79fbba9b79 Revert "tsan: new runtime (v3)"
Summary:
This reverts commit ac95b8d954.
There are a number of bot failures:
http://45.33.8.238/mac/38755/step_4.txt
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/38135/consoleFull#-148886289949ba4694-19c4-4d7e-bec5-911270d8a58c

Reviewers: vitalybuka, melver

Subscribers:
2021-11-12 17:49:47 +01:00
Dmitry Vyukov ac95b8d954 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-11-12 14:31:49 +01:00
Dmitry Vyukov 1b348902ea tsan: add DynamicMutexSet helper
MutexSet is too large to be allocated on the stack.
But we need local MutexSet objects in a few places
and use various hacks to allocate them.
Add DynamicMutexSet helper that simplifies allocation
of such objects.
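
A minimal sketch of such a helper (simplified: plain new/delete stand in for the runtime's internal allocator):

  // Stand-in for the real MutexSet, which is too large to live on the stack.
  struct MutexSet {
    char data[8192];  // ...descriptors of currently held mutexes...
  };

  // RAII helper: heap-allocates the MutexSet and frees it at scope exit, while
  // converting to MutexSet* so it can be passed to existing interfaces.
  class DynamicMutexSet {
   public:
    DynamicMutexSet() : ptr_(new MutexSet()) {}
    ~DynamicMutexSet() { delete ptr_; }
    MutexSet *operator->() { return ptr_; }
    operator MutexSet *() { return ptr_; }
    DynamicMutexSet(const DynamicMutexSet &) = delete;
    DynamicMutexSet &operator=(const DynamicMutexSet &) = delete;
   private:
    MutexSet *ptr_;
  };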

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112449
2021-10-25 19:45:06 +02:00
Dmitry Vyukov b02938439d tsan: uninline RacyStacks::operator==
It's only used during race reporting.
There is no point in polluting the main header file with it.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D110470
2021-09-25 12:08:51 +02:00
Dmitry Vyukov 6fe35ef419 tsan: fix debug format strings
Some of the DPrintf's currently produce -Wformat warnings if enabled.
Fix these format strings.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110131
2021-09-21 13:23:10 +02:00
Marco Elver f3b3c964c3 Revert "[tsan] Fix GCC 8.3 build after D107911"
This reverts commit 797fe59e6b.

The use of "EventType type : 3" is replicated for all Event structs and
therefore was still present. As a result, this still caused failures on
older GCCs (9.2, 8.3, or earlier).

The particular bot that was failing due to buggy GCC was fixed by
fef39cc472.

Therefore, no reason to keep the workaround around; revert it.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108192
2021-08-17 19:26:20 +02:00
Vitaly Buka 797fe59e6b [tsan] Fix GCC 8.3 build after D107911
gcc 8.3 reports:
'__tsan::v3::Event::type' is too small to hold all values of 'enum class __tsan::v3::EventType'
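
A sketch of the construct involved and the style of workaround (illustrative names; older GCC sizes an enum-typed bit-field from the enum's fixed underlying type rather than from its enumerators, hence the spurious warning):

  #include <cstdint>

  enum class EventType : uint64_t {
    kAccessExt,
    kAccessRange,
    kLock,
    kRLock,
    kUnlock,
    kTime,
  };

  // Older GCC (8.3/9.2) warns that this 3-bit enum-typed bit-field is
  // "too small to hold all values" of the enum.
  struct EventBefore {
    EventType type : 3;
    uint64_t rest : 61;
  };

  // Workaround style: declare the bit-field with the underlying integer type
  // and convert with static_cast at the use sites.
  struct EventAfter {
    uint64_t type : 3;
    uint64_t rest : 61;
  };
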
2021-08-16 16:18:42 -07:00
Dmitry Vyukov c97318996f tsan: add new trace
Add structures for the new trace format,
functions that serialize and add events to the trace
and trace replaying logic.

Differential Revision: https://reviews.llvm.org/D107911
2021-08-16 10:24:11 +02:00
Dmitry Vyukov a82c7476a7 tsan: introduce RawShadow type
Currently we hardcode u64 type for shadow everywhere
and do lots of uptr<->u64* casts. It makes it hard to
change u64 to another type (e.g. u32) and makes it easy
to introduce bugs.
Introduce RawShadow type and use it in MemToShadow, ShadowToMem,
IsShadowMem and throughout the code base as u64 replacement.
This makes it possible to change u64 to something else in future
and generally improves static typing.
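
A sketch of the strong-typing idea (the mapping shown is hypothetical and only illustrates why a dedicated type removes ad-hoc casts):

  #include <cstdint>

  // A distinct type for shadow cells: uptr <-> u64* casts no longer compile
  // silently, and the representation can be changed in one place later.
  enum class RawShadow : uint64_t {};

  constexpr RawShadow kShadowEmpty = static_cast<RawShadow>(0);

  inline RawShadow LoadShadow(const RawShadow *p) { return *p; }
  inline void StoreShadow(RawShadow *p, RawShadow v) { *p = v; }

  // Hypothetical linear app->shadow mapping, for illustration only.
  inline RawShadow *MemToShadow(uintptr_t addr, uintptr_t shadow_beg,
                                uintptr_t app_beg, unsigned shadow_mult) {
    return reinterpret_cast<RawShadow *>(shadow_beg +
                                         (addr - app_beg) * shadow_mult);
  }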

Depends on D107481.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D107482
2021-08-05 13:37:10 +02:00
Dmitry Vyukov 9e3e97aa81 tsan: refactor MetaMap::GetAndLock interface
Don't lock the sync object inside of MetaMap methods.
This has several advantages:
 - the new interface does not confuse thread-safety analysis
   so we can remove a bunch of NO_THREAD_SAFETY_ANALYSIS attributes
 - this allows use of scoped lock objects
 - this allows more flexibility, e.g. locking some other mutex
   between searching and locking the sync object
Also prefix the methods with GetSync to be consistent with the GetBlock method.
Also make the interface wrappers inlinable; otherwise we either end up with
2 copies of the method or with an additional call.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D107256
2021-08-02 13:29:46 +02:00
Dmitry Vyukov 103d075b05 tsan: introduce Tid and StackID typedefs
Currently we inconsistently use u32 and int for thread ids;
there are also "unique tid" and "os tid" and lots of other
things identified by integers.
Additionally new tsan runtime will introduce yet another
thread identifier that is very different from current tids.
Similarly for stack IDs, it's easy to confuse u32 with other
integer identifiers. And when a function accepts u32 or a struct
contains u32 field, it's not always clear what it is.

Add Tid and StackID typedefs to make it clear what is what.
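
A sketch of what the typedefs and their sentinel values look like (close to the sanitizer_common definitions, but simplified here):

  #include <cstdint>

  // Dedicated names make it clear whether an integer is a thread id, a stack
  // depot id, or something else entirely.
  typedef int Tid;           // thread id within the thread registry
  typedef uint32_t StackID;  // id of an interned stack trace in the stack depot

  constexpr Tid kInvalidTid = -1;
  constexpr Tid kMainTid = 0;
  constexpr StackID kInvalidStackID = 0;

  // Example: fields and parameters now say what they hold.
  struct ThreadCreationInfo {
    Tid parent_tid = kInvalidTid;
    StackID creation_stack_id = kInvalidStackID;
  };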

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D107152
2021-07-31 09:05:31 +02:00
Dmitry Vyukov 817f942a28 tsan: introduce New/Alloc/Free helpers
We frequently allocate sizeof(T) memory and call the T ctor on that memory
(effectively the C++ new keyword). Currently it's quite verbose and
usually takes 2 lines of code.
Add New<T>() helper that does it much more concisely.

Rename internal_free to Free that also sets the pointer to nullptr.
Shorter and safer.

Rename internal_alloc to Alloc, just shorter.
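
A minimal sketch of the helpers, with malloc/free standing in for the runtime's internal allocator:

  #include <cstdlib>
  #include <new>
  #include <utility>

  // std::malloc/std::free stand in for the internal allocator in this sketch.
  inline void *Alloc(std::size_t size) { return std::malloc(size); }

  // Free also resets the pointer, so accidental reuse is easier to catch.
  template <typename T>
  void Free(T *&p) {
    std::free(p);
    p = nullptr;
  }

  // New<T>(args...): allocate sizeof(T) and run the constructor in one step,
  // replacing the usual two-line allocate + placement-new pattern.
  template <typename T, typename... Args>
  T *New(Args &&...args) {
    return new (Alloc(sizeof(T))) T(std::forward<Args>(args)...);
  }

  // Counterpart that runs the destructor before freeing.
  template <typename T>
  void DestroyAndFree(T *&p) {
    p->~T();
    Free(p);
  }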

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D107085
2021-07-30 11:51:55 +02:00
Dmitry Vyukov 0d68cfc996 tsan: store ThreadRegistry in Context by value
It's unclear why we allocate ThreadRegistry separately;
I assume it's some historical leftover.
Embed ThreadRegistry into Context.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D107045
2021-07-29 12:44:44 +02:00
Dmitry Vyukov 9dad34423b tsan: strip __libc_start_main frame
We strip all frames below main, but in some cases that may not be enough.
Namely, when main is instrumented but does not call any other instrumented code.
In this case __tsan_func_entry in main obtains PC pointing to __libc_start_main
(as we pass caller PC to __tsan_func_entry), but nothing obtains PC pointing
to main itself (as main does not call any instrumented code).
In such a case we will not have main in the stack, and stripping everything
below main won't work.
So strip __libc_start_main explicitly as well.
But keep stripping of main because __libc_start_main is glibc/linux-specific,
so looking for main is more reliable (and usually main is present in stacks).
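
A sketch of the stripping rule over symbolized function names (simplified; the real code operates on the runtime's stack representation):

  #include <string>
  #include <vector>

  // Keep frames up to and including main; if main is not present (e.g. main
  // calls no instrumented code), strip the __libc_start_main frame itself so
  // the report does not end in libc startup glue.
  void StripBelowMain(std::vector<std::string> &frames) {
    for (size_t i = 0; i < frames.size(); i++) {
      if (frames[i] == "main") {
        frames.resize(i + 1);  // keep main, drop everything below it
        return;
      }
      if (frames[i] == "__libc_start_main") {
        frames.resize(i);      // main was not seen: drop the libc frame as well
        return;
      }
    }
  }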

Depends on D106957.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106958
2021-07-28 20:26:42 +02:00
Dmitry Vyukov 5237b14087 tsan: print alloc stack for Java objects
We maintain information about Java allocations,
but for some reason we never print it in reports.
Print it.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106956
2021-07-28 20:25:11 +02:00
Dmitry Vyukov b5bc386ca1 tsan: remove mblock types
We used to count the number of allocations/bytes based on the type
and maybe record them in heap block headers.
But that's all in the past; now it's not used for anything.
Remove the mblock type.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106971
2021-07-28 20:09:25 +02:00
Dmitry Vyukov 0118a64934 tsan: switch to the new sanitizer_common mutex
Now that sanitizer_common mutex has feature-parity with tsan mutex,
switch tsan to the sanitizer_common mutex and remove tsan's custom mutex.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D106379
2021-07-23 09:13:26 +02:00
Dmitry Vyukov 8924d8e37e tsan: disable thread safety analysis in more functions
In preparation for replacing tsan Mutex with sanitizer_common Mutex,
which has thread-safety annotations. Thread safety analysis does not
understand MetaMap::GetAndLock which returns a locked sync object.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D106548
2021-07-23 09:12:59 +02:00
Nico Weber 557855e047 Revert "tsan: make obtaining current PC faster"
This reverts commit e33446ea58.
Doesn't build on mac, and causes other problems. See reports
on https://reviews.llvm.org/D106046 and https://reviews.llvm.org/D106081

Also revert follow-up "tsan: strip top inlined internal frames"
This reverts commit 7b302fc9b0.
2021-07-15 19:29:19 -04:00
Dmitry Vyukov 7b302fc9b0 tsan: strip top inlined internal frames
The new GET_CURRENT_PC() can lead to spurious top inlined internal frames.
Here are 2 examples from bots; in both cases malloc is supposed to be
the top frame (#0):

  WARNING: ThreadSanitizer: signal-unsafe call inside of a signal
    #0 __sanitizer::StackTrace::GetNextInstructionPc(unsigned long)
    #1 malloc

  Location is heap block of size 99 at 0xbe3800003800 allocated by thread T1:
    #0 __sanitizer::StackTrace::GetNextInstructionPc(unsigned long)
    #1 malloc

Let's strip these internal top frames from reports.
With other code changes I also observed some top frames
from __tsan::ScopedInterceptor, so proactively remove these as well.

Differential Revision: https://reviews.llvm.org/D106081
2021-07-15 19:37:44 +02:00
Dmitry Vyukov 2721e27c3a sanitizer_common: deduplicate CheckFailed
We have a significant amount of duplication around
CheckFailed functionality. Each sanitizer copy-pasted
a chunk of code. Some got random improvements like
dealing with recursive failures better. These improvements
could benefit all sanitizers, but they don't.

Deduplicate CheckFailed logic across sanitizers and let each
sanitizer only print the current stack trace.
I've tried to dedup stack printing as well,
but this got me into cmake hell. So let's keep this part
duplicated in each sanitizer for now.
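
A sketch of the shared shape: one common CheckFailed that handles recursive failures, plus a per-sanitizer stack-printing hook (simplified signatures):

  #include <atomic>
  #include <cstdio>
  #include <cstdlib>

  // Per-sanitizer hook: print the current stack trace. A weak no-op default is
  // provided so the sketch links; each sanitizer would override it.
  __attribute__((weak)) void PrintCurrentStackSlow() {}

  static std::atomic<int> check_failed_entries{0};

  // Shared CheckFailed: print the failure once, tolerate a CHECK failing while
  // another CHECK failure is being reported, then die.
  void CheckFailed(const char *file, int line, const char *cond,
                   unsigned long long v1, unsigned long long v2) {
    if (check_failed_entries.fetch_add(1) > 0) abort();  // recursive failure
    fprintf(stderr, "%s:%d: CHECK failed: %s (%llu vs %llu)\n",
            file, line, cond, v1, v2);
    PrintCurrentStackSlow();  // the only sanitizer-specific part
    abort();
  }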

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102221
2021-05-12 08:50:53 +02:00
Dmitry Vyukov ed7bf7d73f tsan: refactor fork handling
Commit efd254b636 ("tsan: fix deadlock in pthread_atfork callbacks")
fixed another deadlock related to atfork handling.
But builders with DCHECKs enabled reported failures of
pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests
related to the fact that we hold runtime locks on interceptor exit:
https://lab.llvm.org/buildbot/#/builders/70/builds/6727
This issue is somewhat inherent to the current approach,
we indeed execute user code (atfork callbacks) with runtime lock held.

Refactor fork handling to not run user code (atfork callbacks)
with runtime locks held. This change does this by installing
our own atfork callbacks during runtime initialization.
Atfork callbacks run in LIFO order, so the expectation is that
our callbacks run last, right before the actual fork.
This way we lock runtime mutexes around fork, but not around
user callbacks.

Extend tests to also install after fork callbacks just to cover
more scenarios. Some tests also started reporting real races
that we previously suppressed.

Also extend tests to cover fork syscall support.
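
A sketch of the approach (a single mutex stands in for the set of runtime mutexes):

  #include <pthread.h>

  // In the sketch a single mutex stands in for the runtime mutexes
  // (the real runtime locks several of them, in a fixed order).
  static pthread_mutex_t runtime_mtx = PTHREAD_MUTEX_INITIALIZER;

  // Registered during runtime initialization. pthread_atfork prepare handlers
  // run in reverse registration order, so a handler registered this early runs
  // last -- right before the actual fork -- and user atfork callbacks therefore
  // execute without runtime locks held.
  static void ForkPrepare() { pthread_mutex_lock(&runtime_mtx); }
  static void ForkParent()  { pthread_mutex_unlock(&runtime_mtx); }
  static void ForkChild()   { pthread_mutex_unlock(&runtime_mtx); }

  void InitializeForkHandling() {
    pthread_atfork(ForkPrepare, ForkParent, ForkChild);
  }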

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101517
2021-04-30 08:48:20 +02:00
Tres Popp d1e08b124c Revert "tsan: refactor fork handling"
This reverts commit e1021dd1fd.
2021-04-28 14:08:33 +02:00
Dmitry Vyukov e1021dd1fd tsan: refactor fork handling
Commit efd254b636 ("tsan: fix deadlock in pthread_atfork callbacks")
fixed another deadlock related to atfork handling.
But builders with DCHECKs enabled reported failures of
pthread_atfork_deadlock2.c and pthread_atfork_deadlock3.c tests
related to the fact that we hold runtime locks on interceptor exit:
https://lab.llvm.org/buildbot/#/builders/70/builds/6727
This issue is somewhat inherent to the current approach:
we indeed execute user code (atfork callbacks) with runtime locks held.

Refactor fork handling to not run user code (atfork callbacks)
with runtime locks held. This change does this by installing
our own atfork callbacks during runtime initialization.
Atfork callbacks run in LIFO order, so the expectation is that
our callbacks run last, right before the actual fork.
This way we lock runtime mutexes around fork, but not around
user callbacks.

Extend tests to also install after fork callbacks just to cover
more scenarios. Some tests also started reporting real races
that we previously suppressed.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101385
2021-04-27 22:37:27 +02:00
Evgenii Stepanov 5275d772da Revert "tsan: fix deadlock in pthread_atfork callbacks"
Tests fail on debug builders. See the forward fix in
https://reviews.llvm.org/D101385.

This reverts commit efd254b636.
2021-04-27 12:36:31 -07:00
Dmitry Vyukov efd254b636 tsan: fix deadlock in pthread_atfork callbacks
We take report/thread_registry locks around fork.
This means we cannot report any bugs in atfork handlers.
We resolved this by enabling per-thread ignores around fork.
This resolved some of the cases, but not all.
The added test triggers a race report from a signal handler
called from atfork callback, we reset per-thread ignores
around signal handlers, so we tried to report it and deadlocked.
But there are more cases: a signal handler can be called
synchronously if the signal is sent to the current thread. Or any other report
types would cause deadlocks as well: mutex misuse,
signal handler spoiling errno, etc.
Disable all reports for the duration of fork with
thr->suppress_reports and don't re-enable them around
signal handlers.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101154
2021-04-27 13:25:26 +02:00
Vitaly Buka 20e78eb304 [sanitizer][NFC] Fix few cpplint warnings 2020-10-13 20:39:37 -07:00
Joachim Protze 7358a1104a [TSan] Optimize handling of racy address
This patch splits the handling of racy address and racy stack into separate
functions. If a race was already reported for the address, we can avoid the
cost of collecting the involved stacks.

This patch also removes the race condition in storing the racy address / racy
stack. This race condition allowed all threads to report the race.

This patch changes the transitive suppression of reports. Previously
suppression could transitively chain memory location and racy stacks.
Now racy memory and racy stack are separate suppressions.

Commit again, now with fixed tests.
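
A sketch of the split, with the cheap address check done first and standard containers standing in for the runtime's data structures:

  #include <cstdint>
  #include <mutex>
  #include <unordered_set>

  static std::mutex racy_mtx;
  static std::unordered_set<uintptr_t> racy_addresses;  // addresses already reported
  static std::unordered_set<uint64_t> racy_stacks;      // hashes of reported stack pairs

  // Cheap check first: if this address was already reported, suppress the new
  // report before paying for stack collection and symbolization. Inserting under
  // the lock also means only one thread wins the right to report.
  bool AddressIsAlreadyReported(uintptr_t addr) {
    std::lock_guard<std::mutex> lock(racy_mtx);
    return !racy_addresses.insert(addr).second;
  }

  // Separate suppression for racy stacks, consulted only for new addresses.
  bool StacksAreAlreadyReported(uint64_t stack_pair_hash) {
    std::lock_guard<std::mutex> lock(racy_mtx);
    return !racy_stacks.insert(stack_pair_hash).second;
  }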

Reviewed by: dvyukov

Differential Revision: https://reviews.llvm.org/D83625
2020-07-16 16:22:57 +02:00
Joachim Protze d3849dddd2 Revert "[TSan] Optimize handling of racy address"
This reverts commit 00e3a1ddec.
The commit broke most build bots, investigating.
2020-07-15 17:40:28 +02:00
Joachim Protze 00e3a1ddec [TSan] Optimize handling of racy address
This patch splits the handling of racy address and racy stack into separate
functions. If a race was already reported for the address, we can avoid the
cost of collecting the involved stacks.

This patch also removes the race condition in storing the racy address / racy
stack. This race condition allowed all threads to report the race.

This patch changes the transitive suppression of reports. Previously
suppression could transitively chain memory location and racy stacks.
Now racy memory and racy stack are separate suppressions.

Reviewed by: dvyukov

Differential Revision: https://reviews.llvm.org/D83625
2020-07-15 16:50:08 +02:00
Vitaly Buka c0fa632236 Remove NOLINTs from compiler-rt
llvm-svn: 371687
2019-09-11 23:19:48 +00:00
Nico Weber bb7ad98a47 Follow-up for r367863 and r367656
llvm-svn: 367888
2019-08-05 16:50:56 +00:00
Nico Weber 5a3bb1a4d6 compiler-rt: Rename .cc file in lib/tsan/rtl to .cpp
Like r367463, but for tsan/rtl.

llvm-svn: 367564
2019-08-01 14:22:42 +00:00