forked from ccfos/huatuo
136 lines
6.7 KiB
Markdown
136 lines
6.7 KiB
Markdown
[简体中文](./CUSTOM_CN.md) | English
|
|
|
|
HuaTuo framework provides three data collection modes: `autotracing`, `event`, and `metrics`, covering different monitoring scenarios, helping users gain comprehensive insights into system performance.
|
|
|
|
## Collection Mode Comparison
|
|
| Mode | Type | Trigger Condition | Data Output | Use Case |
|
|
|-----------------|----------------|-------------------|------------------|----------------|
|
|
| **Autotracing** | Event-driven | Triggered on system anomalies | ES + Local Storage, Prometheus (optional) | Non-routine operations, triggered on anomalies |
|
|
| **Event** | Event-driven | Continuously running, triggered on preset thresholds | ES + Local Storage, Prometheus (optional) | Continuous operations, directly dump context |
|
|
| **Metrics** | Metric collection | Passive collection | Prometheus format | Monitoring system metrics |
|
|
|
|
- **Autotracing**
|
|
- **Type**: Event-driven (tracing).
|
|
- **Function**: Automatically tracks system anomalies and dump context when anomalies occur.
|
|
- **Features**:
|
|
- When a system anomaly occurs, `autotracing` is triggered automatically to dump relevant context.
|
|
- Data is stored to ES in real-time and stored locally for subsequent analysis and troubleshooting. It can also be monitored in Prometheus format for statistics and alerts.
|
|
- Suitable for scenarios with high performance overhead, such as triggering captures when metrics exceed a threshold or rise too quickly.
|
|
- **Integrated Features**: CPU anomaly tracking (cpu idle), D-state tracking (dload), container contention (waitrate), memory burst allocation (memburst), disk anomaly tracking (iotracer).
|
|
|
|
- **Event**
|
|
- **Type**: Event-driven (tracing).
|
|
- **Function**: Continuously operates within the system context, directly dump context when preset thresholds are met.
|
|
- **Features**:
|
|
- Unlike `autotracing`, `event` continuously operates within the system context, rather than being triggered by anomalies.
|
|
- Data is also stored to ES and locally, and can be monitored in Prometheus format.
|
|
- Suitable for continuous monitoring and real-time analysis, enabling timely detection of abnormal behaviors. The performance impact of `event` collection is negligible.
|
|
- **Integrated Features**: Soft interrupt anomalies (softirq), memory allocation anomalies (oom), soft lockups (softlockup), D-state processes (hungtask), memory reclamation (memreclaim), packet droped abnormal (dropwatch), network ingress latency (netrecvlat).
|
|
|
|
- **Metrics**
|
|
- **Type**: Metric collection.
|
|
- **Function**: Collects performance metrics from subsystems.
|
|
- **Features**:
|
|
- Metric data can be sourced from regular procfs collection or derived from `tracing` (autotracing, event) data.
|
|
- Outputs in Prometheus format for easy integration into Prometheus monitoring systems.
|
|
- Unlike `tracing` data, `metrics` primarily focus on system performance metrics such as CPU usage, memory usage, and network traffic, etc.
|
|
- Suitable for monitoring system performance metrics, supporting real-time analysis and long-term trend observation.
|
|
- **Integrated Features**: CPU (sys, usr, util, load, nr_running, etc.), memory (vmstat, memory_stat, directreclaim, asyncreclaim, etc.), IO (d2c, q2c, freeze, flush, etc.), network (arp, socket mem, qdisc, netstat, netdev, sockstat, etc.).
|
|
|
|
## Multiple Purpose of Tracing Mode
|
|
Both `autotracing` and `event` belong to the **tracing** collection mode, offering the following dual purposes:
|
|
1. **Real-time storage to ES and local storage**: For tracing and analyzing anomalies, helping users quickly identify root causes.
|
|
2. **Output in Prometheus format**: As metric data integrated into Prometheus monitoring systems, providing comprehensive system monitoring capabilities.
|
|
|
|
By flexibly combining these three modes, users can comprehensively monitor system performance, capturing both contextual information during anomalies and continuous performance metrics to meet various monitoring needs.
|
|
|
|
# How to Add Custom Collection
|
|
The framework provides convenient APIs, including module startup, data storage, container information, BPF-related (load, attach, read, detach, unload), etc. You can implement custom collection logic and flexibly choose the appropriate collection mode and storage method.
|
|
|
|
## Tracing Type
|
|
Based on your scenarios, you can implement the `ITracingEvent` interface in the `core/autotracing` or `core/events` directory to complete tracing-type collection.
|
|
```go
|
|
// ITracingEvent represents a tracing/event
|
|
type ITracingEvent interface {
|
|
Start(ctx context.Context) error
|
|
}
|
|
```
|
|
|
|
example:
|
|
```go
|
|
type exampleTracing struct{}
|
|
|
|
// Register callback
|
|
func init() {
|
|
tracing.RegisterEventTracing("example", newExample)
|
|
}
|
|
|
|
// Create tracing
|
|
func newExample() (*tracing.EventTracingAttr, error) {
|
|
return &tracing.EventTracingAttr{
|
|
TracingData: &exampleTracing{},
|
|
Internal: 10, // Interval for enable tracing again (in seconds)
|
|
Flag: tracing.FlagTracing, // mark as tracing type
|
|
}, nil
|
|
}
|
|
|
|
// Implement ITracingEvent
|
|
func (t *exampleTracing) Start(ctx context.Context) error {
|
|
// do something
|
|
...
|
|
|
|
// Save data to ES and local file
|
|
storage.Save("example", ccontainerID, time.Now(), tracerData)
|
|
}
|
|
|
|
// Implement Collector interface for Prometheus format output (optional)
|
|
func (c *exampleTracing) Update() ([]*metric.Data, error) {
|
|
// from tracerData to prometheus.Metric
|
|
...
|
|
|
|
return data, nil
|
|
}
|
|
```
|
|
|
|
## Metric Type
|
|
Implement the `Collector` interface in the path `core/metrics` to complete metric-type collection.
|
|
|
|
```go
|
|
type Collector interface {
|
|
// Get new metrics and expose them via prometheus registry.
|
|
Update() ([]*Data, error)
|
|
}
|
|
```
|
|
|
|
example:
|
|
```go
|
|
type exampleMetric struct{}
|
|
|
|
// Register callback
|
|
func init() {
|
|
tracing.RegisterEventTracing("example", newExample)
|
|
}
|
|
|
|
// Create Metric
|
|
func newExample() (*tracing.EventTracingAttr, error) {
|
|
return &tracing.EventTracingAttr{
|
|
TracingData: &filenrCollector{
|
|
metric: []*metric.Data{
|
|
metric.NewGaugeData("name1", 0, "description of example_name1", nil),
|
|
metric.NewGaugeData("name2", 0, "description of example_name2", nil),
|
|
},
|
|
},
|
|
Flag: tracing.FlagMetric, // mark as Metric type
|
|
}, nil
|
|
}
|
|
|
|
// Implement Collector interface for Prometheus format output
|
|
func (c *exampleMetric) Update() ([]*metric.Data, error) {
|
|
// do something
|
|
...
|
|
|
|
return data, nil
|
|
}
|
|
```
|
|
|
|
The path `core` of the project includes multiple useful examples of the three collection modules, covering BPF code, map data interaction, container information, and more. For further details, refer to the corresponding code implementations. |