Go to file
fanzu8 4a0b112087 build: fix cgroup create failed if docker set Security Options: cgroupns
ref: https://docs.docker.com/reference/compose-file/services/#cgroup

Signed-off-by: fanzu8 <tuzengbing@gmail.com>
2025-07-16 16:30:27 +08:00
bpf vmlinux: use 4.18.0-193.6.3.el8_2.x86_64 as default header 2025-07-13 07:29:52 -04:00
build build: fix cgroup create failed if docker set Security Options: cgroupns 2025-07-16 16:30:27 +08:00
cmd/huatuo-bamai metric: add region option support 2025-07-13 06:54:04 -04:00
core bpf: rename metric softirq to softirq 2025-07-13 06:12:45 -04:00
docs doc: quick-start diagram from text to .png 2025-07-15 17:59:27 +08:00
internal cgrouputil: support cgroupv2 for runtime 2025-07-06 07:55:08 -04:00
pkg metric: add region option support 2025-07-13 06:54:04 -04:00
vendor cgrouputil: support cgroupv2 for runtime 2025-07-06 07:55:08 -04:00
.gitignore add .gitignore 2025-07-13 07:31:05 -04:00
.golangci.yaml build: support .golangci.yaml 2025-07-06 07:58:31 -04:00
Dockerfile build: add Dockerfile 2025-07-15 17:59:27 +08:00
LICENSE HUATUO: Initial Commit 2025-07-06 07:39:32 -04:00
Makefile HUATUO: Initial Commit 2025-07-06 07:39:32 -04:00
NEWS update NEWS 2025-07-13 08:29:10 -04:00
README.md build: fix cgroup create failed if docker set Security Options: cgroupns 2025-07-16 16:30:27 +08:00
README_EN.md build: fix cgroup create failed if docker set Security Options: cgroupns 2025-07-16 16:30:27 +08:00
go.mod update go.mod 2025-07-13 08:00:11 -04:00
go.sum HUATUO: Initial Commit 2025-07-06 07:39:32 -04:00
huatuo-bamai.conf HUATUO: Initial Commit 2025-07-06 07:39:32 -04:00

README_EN.md

简体中文 | English

Abstract

HuaTuo (华佗) aims to provide in-depth observability for the OS Linux kernel in complex cloud-native scenarios. The project is based on eBPF technology and has built a set of deep observation service components for the Linux kernel. By leveraging kernel dynamic tracing technologies such as kprobe, tracepoint, and ftrace, HuaTuo provides more observation perspectives for the Linux kernel, including kernel runtime context capture driven by anomalous events and more granular, accurate kernel per subsystem metrics.

HuaTuo also integrates core technologies such as automated tracing, profiling, and distributed tracing for system performance spikes. HuaTuo has been successfully applied on a large scale within Didi (DiDi Global Inc.), solidly guaranteeing the stability and performance optimization of cloud-native operating systems and showcasing the distinct advantages of eBPF technology in cloud-native scenarios.

Key Features

  • Continuous Kernel Observability: Achieves in-depth, low-overhead (less than 1% performance impact) instrumentation of various kernel subsystems, providing comprehensive metrics on memory, CPU scheduling, network stack, and disk I/O.
  • Kernel Anomaly-Driven Observability: Instruments the kernel's exception paths and slow paths to capture rich runtime context triggered by anomalous events, enabling more insightful observability data.
  • Automated Tracing (AutoTracing): Implements automated tracing capabilities to address system resource spikes and performance jitters (e.g., CPU idle drop, raising CPU sys utilization, I/O bursts, and Loadavg raising).
  • Smooth Transition to Popular Observability Stacks: Provides standard data sources for Prometheus and Pyroscope, integrates with Kubernetes container resources, and automatically correlates Kubernetes labels/annotations with kernel event metrics, eliminating data silos, ensuring seamless integration and analysis across various data sources for comprehensive system monitoring.

Getting Started

  • Instant Experience If you only care about the underlying principles and not about storage backends or frontend display, we provide a pre-built image containing all necessary components for HUATO's core operation. Just run:

    $ docker run --privileged --cgroupns=host --network=host -v /sys:/sys -v /run:/run huatuo/huatuo-bamai:latest
    
  • Quick Setup If you want to dive deeper into HUATO's operation mechanisms and architecture, you can easily set up all components locally. We provide container images and simple configurations for developers to quickly understand HUATO.

    HUATUO Component Workflow

    For a quick setup, we provide a one-command solution to launch elasticsearch, prometheus, grafana and huatuo-bamai. Once executed, click http://localhost:3000 to view the monitoring dashboards on your browser.

  • Data related to event-driven operations Autotracing and Events, are stored in elasticsearch

  • Metrics-related data is actively collected and stored by prometheus

  • elasticsearch data reporting port: 9200

  • prometheus data source port: 9090

  • grafana port: 3000

User-Defined Collection

The built-in modules cover most monitoring needs. Additionally, HuaTuo supports custom data collection with easy integration. How to Add Custom Collection

Architectures

Observability Overview

Exception Totals

Profiling

SKB dropwatch

Net Latency

Functionality Overview

Autotracing

Tracing Name Core Functionality Scenarios
cpu sys Detects rising host cpu.sys utilization Issues caused by abnormal cpu.sys load leading to jitters
cpu idle Detects low CPU idle in containers, provides call stack, flame graphs, process context info, etc. Abnormal container CPU usage, helps identify process hotspots
dload Tracks processes in the D (uninterruptible) state, provides container runtime info, D-state process call stack, etc. Issues caused by a sudden increase in the number of system D or R (runnable) state processes, leading to higher load. A spike in D-state processes is often related to unavailable resources or long-held locks, while R-state process spikes may indicate unreasonable user logic design
waitrate Detects CPU contention in containers, provides information about the contending containers CPU contention in containers can cause jitters, and the existing contention metrics lack specific container info. Waitrate tracking can provide the info about the containers involved in the contention, which can be used as a reference for resource isolation in hybrid deployment scenarios
mmburst Records burst memory allocation context Detects events where the host allocates a large amount of memory in a short time, which can lead to direct reclaim or OOM
iotracer When the host disk is full or I/O latency is abnormal, provides the file name, path, device, inode, and container context info for the abnormal I/O access Frequent disk I/O bandwidth saturation or sudden I/O spikes can lead to application request latency or system performance jitters

Events

Event Name Core Functionality Scenarios
softirq When the kernel delayed response in soft interrupts or prolonged shutdown, supports the call stack and process information of the soft interrupts that have been shut down for an extended period of time. This type of issue can severely impact network receive/transmit, leading to jitters or latency
dropwatch Detects TCP packet drops, provides host and network context info when drops occur This type of issue can cause jitters and latency
netrecvlat Captures latency events along the data packet receive path from the driver, TCP/IP stack, to user-level For network latency issues, there is a class where the receive-side exhibits latency, but the location is unclear. The netrecvlat case calculates latency by timestamping the skb at the interface, driver, TCP/IP stack, and user-level copy, and filters timed-out packets to point the latency location
oom Detects OOM events in the host or containers When OOM events occur at the host or container level, it can obtain information about the triggering process, the killed process, and container details, which is helpful for diagnosing process memory leaks, abnormal exits, etc.
softlockup When the system encounters a softlockup, it collects information about the target process, CPU, and kernel stack for per CPU Used for investigating system softlockup incidents
hungtask Provides the number of processes in the D (uninterruptible) state and their kernel stack info Used to identify and save the context of processes that suddenly enter the D state, for later investigation
memreclaim Records the latency when a process enters direct reclaim, if it exceeds a time threshold When under memory pressure, if a process requests memory, it may enter direct reclaim, a synchronous reclaim phase that can cause process jitters. This records the time a process spends in direct reclaim, helping assess the impact on the affected process

Metrics

Metrics collection involves various indicators from per subsystem, including CPU, memory, IO, network, etc. The primary sources of these metrics are procfs, eBPF, and computational aggregation, as follows is a summary. for details

Subsystem Metric Description Dimension
cpu sys, usr, util Percentage host, container
cpu burst, throttled Number of periods burst occurs, times the group has been throttled/limited container
cpu inner, exter_wait_rate Wait rate caused by processes inside/outside the container container
cpu nr_running, nr_uninterruptible The number of running/uninterruptible tasks in the container container
cpu load 1, 5, 15 System load avg over the last x minute container
cpu softirq_latency The number of NET_RX/NET_TX irq latency happened host
cpu runqlat_nlat The number of times when schedule latency of processes in host/container is within x~xms host, container
cpu reschedipi_oversell_probability The possibility of cpu overselling exists on the host where the vm is located host
memory direct_reclaim Time speed in page allocation in memory cgroup container
memory asyncreclaim Memory cgroup's direct reclaim time in cgroup async memory reclaim container
memory vmstat, memory_stat Memory statistics host, container
memory hungtask, oom, softlockup Count of event happened host, container
IO d2c Statistics of io latency when accessing the disk, including the time consumed by the driver and hardware components host, container
IO q2c Statistics of io latency for the entire io lifecycle when accessing the disk host, container
IO disk_freeze Statistics of disk freeze events host
IO disk_flush Statistics of delay for flush operations on disk raid device host, container
network arp ARP entries system, host, container
network tcp, udp mem Socket memory system
network qdisc Qdisc statistics host
network netdev Network device metrics host, container
network netstat Network statistics host, container
network sockstat Socket statistics host, container

Contact Us

You can report bugs, provide suggestions, or engage in discussions via Github Issues and Github Discussions. Alternatively, you can contact us using the following ways: