When creating and destroying fibers in tsan a thread state is created and destroyed. Currently, a memory mapping is leaked with each fiber (in __tsan_destroy_fiber). This causes applications with many short running fibers to crash or hang because of linux vm.max_map_count.
The root of this is that ThreadState holds a pointer to ThreadSignalContext for handling signals. The initialization and destruction of it is tied to platform specific events in tsan_interceptors_posix and missed when destroying a fiber (specifically, SigCtx is used to lazily create the ThreadSignalContext in tsan_interceptors_posix). This patch cleans up the memory by makinh the ThreadState create and destroy the ThreadSignalContext.
The relevant code causing the leak with fibers is the fiber destruction:
void FiberDestroy(ThreadState *thr, uptr pc, ThreadState *fiber) {
FiberSwitchImpl(thr, fiber);
ThreadFinish(fiber);
FiberSwitchImpl(fiber, thr);
internal_free(fiber);
}
Author: Florian
Reviewed-in: https://reviews.llvm.org/D76073
Temporarily revert "tsan: fix leak of ThreadSignalContext for fibers"
because it breaks the LLDB bot on GreenDragon.
This reverts commit 93f7743851.
This reverts commit d8a0f76de7.
When creating and destroying fibers in tsan a thread state
is created and destroyed. Currently, a memory mapping is
leaked with each fiber (in __tsan_destroy_fiber).
This causes applications with many short running fibers
to crash or hang because of linux vm.max_map_count.
The root of this is that ThreadState holds a pointer to
ThreadSignalContext for handling signals. The initialization
and destruction of it is tied to platform specific events
in tsan_interceptors_posix and missed when destroying a fiber
(specifically, SigCtx is used to lazily create the
ThreadSignalContext in tsan_interceptors_posix). This patch
cleans up the memory by inverting the control from the
platform specific code calling the generic ThreadFinish to
ThreadFinish calling a platform specific clean-up routine
after finishing a thread.
The relevant code causing the leak with fibers is the fiber destruction:
void FiberDestroy(ThreadState *thr, uptr pc, ThreadState *fiber) {
FiberSwitchImpl(thr, fiber);
ThreadFinish(fiber);
FiberSwitchImpl(fiber, thr);
internal_free(fiber);
}
I would appreciate feedback if this way of fixing the leak is ok.
Also, I think it would be worthwhile to more closely look at the
lifecycle of ThreadState (i.e. it uses no constructor/destructor,
thus requiring manual callbacks for cleanup) and how OS-Threads/user
level fibers are differentiated in the codebase. I would be happy to
contribute more if someone could point me at the right place to
discuss this issue.
Reviewed-in: https://reviews.llvm.org/D76073
Author: Florian (Florian)
Generally we ignore interceptors coming from called_from_lib-suppressed libraries.
However, we must not ignore critical interceptors like e.g. pthread_create,
otherwise runtime will lost track of threads.
pthread_detach is one of these interceptors we should not ignore as it affects
thread states and behavior of pthread_join which we don't ignore as well.
Currently we can produce very obscure false positives. For more context see:
https://groups.google.com/forum/#!topic/thread-sanitizer/ecH2P0QUqPs
The added test captures this pattern.
While we are here rename ThreadTid to ThreadConsumeTid to make it clear that
it's not just a "getter", it resets user_id to 0. This lead to confusion recently.
Reviewed in https://reviews.llvm.org/D74828