# BCC-Style eBPF Tracing Implementation

## Overview

This implementation adds BCC-style tracing capabilities to the diagnostic agent, similar to the `trace.py` tool from the iovisor BCC (BPF Compiler Collection) project. Instead of only filtering events, the system counts and traces real system calls and parses their arguments in detail.

## Key Features

### 1. Real System Call Tracing

- **Actual event counting**: Unlike the previous implementation, which only simulated events, this one captures real system calls
- **Argument extraction**: Extracts function arguments (arg1, arg2, etc.) and return values
- **Multiple probe types**: Supports kprobes, kretprobes, tracepoints, and uprobes
- **Filtering capabilities**: Filter by process name, PID, UID, and argument values

### 2. BCC-Style Syntax

Supports familiar BCC `trace.py` syntax patterns:

```bash
# Simple syscall tracing
"sys_open"                    # Trace open syscalls
"sys_read (arg3 > 1024)"      # Trace reads larger than 1024 bytes
"r::sys_open"                 # Return probe on open

# With format strings
"sys_write \"wrote %d bytes\", arg3"
"sys_open \"opening %s\", arg2@user"
```

### 3. Comprehensive Event Data

Each trace event captures:

```json
{
  "timestamp": 1234567890,
  "pid": 1234,
  "tid": 1234,
  "process_name": "nginx",
  "function": "__x64_sys_openat",
  "message": "opening file: /var/log/access.log",
  "raw_args": {
    "arg1": "3",
    "arg2": "/var/log/access.log",
    "arg3": "577"
  }
}
```

## Architecture

### Core Components

1. **BCCTraceManager** (`ebpf_trace_manager.go`)
   - Main orchestrator for BCC-style tracing
   - Generates bpftrace scripts dynamically
   - Manages trace sessions and event collection

2. **TraceSpec** - Trace specification format
   ```go
   type TraceSpec struct {
   	ProbeType   string   // "p", "r", "t", "u"
   	Target      string   // Function/syscall to trace
   	Format      string   // Output format string
   	Arguments   []string // Arguments to extract
   	Filter      string   // Filter conditions
   	Duration    int      // Trace duration in seconds
   	ProcessName string   // Process filter
   	PID         int      // Process ID filter
   	UID         int      // User ID filter
   }
   ```

3. **EventScanner** (`ebpf_event_parser.go`)
   - Parses bpftrace output in real-time
   - Converts raw trace data to structured events
   - Handles argument extraction and enrichment

4. **TraceSpecBuilder** - Fluent API for building specs
   ```go
   spec := NewTraceSpecBuilder().
   	Kprobe("__x64_sys_write").
   	Format("write %d bytes to fd %d", "arg3", "arg1").
   	Filter("arg1 == 1").
   	Duration(30).
   	Build()
   ```

## Usage Examples

### 1. Basic System Call Tracing

```go
// Trace file open operations
spec := TraceSpec{
	ProbeType: "p",
	Target:    "__x64_sys_openat",
	Format:    "opening file: %s",
	Arguments: []string{"arg2@user"},
	Duration:  30,
}

traceID, err := manager.StartTrace(spec)
```

### 2. Filtered Tracing

```go
// Trace only large reads
spec := TraceSpec{
	ProbeType: "p",
	Target:    "__x64_sys_read",
	Format:    "read %d bytes from fd %d",
	Arguments: []string{"arg3", "arg1"},
	Filter:    "arg3 > 1024",
	Duration:  30,
}
```

### 3. Process-Specific Tracing

```go
// Trace only nginx processes
spec := TraceSpec{
	ProbeType:   "p",
	Target:      "__x64_sys_write",
	ProcessName: "nginx",
	Duration:    60,
}
```

### 4. Return Value Tracing

```go
// Trace return values from file operations
spec := TraceSpec{
	ProbeType: "r",
	Target:    "__x64_sys_openat",
	Format:    "open returned: %d",
	Arguments: []string{"retval"},
	Duration:  30,
}
```

## Integration with Agent

### API Request Format

The remote API can send trace specifications in the `ebpf_programs` field:

```json
{
  "commands": [
    {"id": "cmd1", "command": "ps aux"}
  ],
  "ebpf_programs": [
    {
      "name": "file_monitoring",
      "type": "kprobe",
      "target": "sys_open",
      "duration": 30,
      "filters": {"process": "nginx"},
      "description": "Monitor file access by nginx"
    }
  ]
}
```

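
As an illustration of how the agent side might consume such a request, the sketch below decodes the `ebpf_programs` field with `encoding/json`. The struct and function names (`ebpfProgram`, `apiRequest`, `decodeRequest`) are assumptions made for this example; only the JSON field names come from the request format above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ebpfProgram mirrors one entry of "ebpf_programs" (hypothetical type).
type ebpfProgram struct {
	Name        string            `json:"name"`
	Type        string            `json:"type"`
	Target      string            `json:"target"`
	Duration    int               `json:"duration"`
	Filters     map[string]string `json:"filters"`
	Description string            `json:"description"`
}

// apiRequest keeps only the part of the request relevant to tracing;
// unknown fields such as "commands" are ignored by json.Unmarshal.
type apiRequest struct {
	EBPFPrograms []ebpfProgram `json:"ebpf_programs"`
}

// decodeRequest parses the payload and rejects programs without a target.
func decodeRequest(data []byte) (apiRequest, error) {
	var req apiRequest
	if err := json.Unmarshal(data, &req); err != nil {
		return apiRequest{}, fmt.Errorf("decode request: %w", err)
	}
	for i, p := range req.EBPFPrograms {
		if p.Target == "" {
			return apiRequest{}, fmt.Errorf("ebpf_programs[%d]: missing target", i)
		}
	}
	return req, nil
}

func main() {
	payload := []byte(`{
		"commands": [{"id": "cmd1", "command": "ps aux"}],
		"ebpf_programs": [{
			"name": "file_monitoring", "type": "kprobe",
			"target": "sys_open", "duration": 30,
			"filters": {"process": "nginx"}
		}]
	}`)
	req, err := decodeRequest(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range req.EBPFPrograms {
		fmt.Printf("%s: %s on %s for %ds (process=%s)\n",
			p.Name, p.Type, p.Target, p.Duration, p.Filters["process"])
	}
}
```
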
### Agent Response Format

The agent returns detailed trace results:

```json
{
  "name": "__x64_sys_openat",
  "type": "bcc_trace",
  "target": "__x64_sys_openat",
  "duration": 30,
  "status": "completed",
  "success": true,
  "event_count": 45,
  "events": [
    {
      "timestamp": 1234567890,
      "pid": 1234,
      "process_name": "nginx",
      "function": "__x64_sys_openat",
      "message": "opening file: /var/log/access.log",
      "raw_args": {"arg1": "3", "arg2": "/var/log/access.log"}
    }
  ],
  "statistics": {
    "total_events": 45,
    "events_per_second": 1.5,
    "top_processes": [
      {"process_name": "nginx", "event_count": 30},
      {"process_name": "apache", "event_count": 15}
    ]
  }
}
```

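
The `statistics` block in the response is consistent with simple arithmetic: 45 events over a 30-second trace gives 1.5 events per second, and the per-process counts sum to the total. A minimal sketch of how such figures could be derived follows; `topProcesses` and `eventsPerSecond` are hypothetical helpers written for this example, not the agent's own functions.

```go
package main

import (
	"fmt"
	"sort"
)

// processCount pairs a process name with its number of events.
type processCount struct {
	ProcessName string
	EventCount  int
}

// topProcesses counts events per process name and returns the n busiest,
// breaking ties alphabetically so the result is deterministic.
func topProcesses(names []string, n int) []processCount {
	counts := make(map[string]int)
	for _, name := range names {
		counts[name]++
	}
	out := make([]processCount, 0, len(counts))
	for name, c := range counts {
		out = append(out, processCount{ProcessName: name, EventCount: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].EventCount != out[j].EventCount {
			return out[i].EventCount > out[j].EventCount
		}
		return out[i].ProcessName < out[j].ProcessName
	})
	if n >= 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

// eventsPerSecond divides the event total by the trace duration.
func eventsPerSecond(total, durationSeconds int) float64 {
	if durationSeconds <= 0 {
		return 0
	}
	return float64(total) / float64(durationSeconds)
}

func main() {
	// One entry per event, identified here only by process name.
	var names []string
	for i := 0; i < 30; i++ {
		names = append(names, "nginx")
	}
	for i := 0; i < 15; i++ {
		names = append(names, "apache")
	}
	fmt.Println(eventsPerSecond(len(names), 30)) // 45 / 30 = 1.5
	for _, p := range topProcesses(names, 5) {
		fmt.Println(p.ProcessName, p.EventCount)
	}
}
```
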
## Test Specifications

The implementation includes test specifications for unit testing:

- **test_sys_open**: File open operations
- **test_sys_read**: Read operations with filters
- **test_sys_write**: Write operations
- **test_process_creation**: Process execution
- **test_kretprobe**: Return value tracing
- **test_with_filter**: Filtered tracing

## Running Tests

```bash
# Run all BCC tracing tests
go test -v -run TestBCCTracing

# Test trace manager capabilities
go test -v -run TestTraceManagerCapabilities

# Test syscall suggestions
go test -v -run TestSyscallSuggestions

# Run all tests
go test -v
```

## Requirements

### System Requirements

- **Linux kernel 4.4+** with eBPF support
- **bpftrace** installed (for example, `apt install bpftrace` on Debian or Ubuntu)
- **Root privileges** for actual tracing

### Checking Capabilities

The trace manager automatically detects capabilities:

```bash
$ go test -run TestTraceManagerCapabilities
🔧 Trace Manager Capabilities:
✅ kernel_ebpf: Available
✅ bpftrace: Available
❌ root_access: Not Available
❌ debugfs_access: Not Available
```

## Advanced Features

### 1. Syscall Suggestions

The system can suggest appropriate syscalls based on issue descriptions:

```go
suggestions := SuggestSyscallTargets("file not found error")
// Returns: ["test_sys_open", "test_sys_read", "test_sys_write", "test_sys_unlink"]
```

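
How the suggestions are computed is not described here. One simple approach that would produce the result shown above is keyword matching against the issue description. The sketch below illustrates that idea only: `suggestTargets` and its keyword table are hypothetical and do not reflect how `SuggestSyscallTargets` is actually implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// rule maps a keyword found in an issue description to trace spec names.
type rule struct {
	keyword string
	targets []string
}

// rules is an illustrative table; the real mapping may differ.
var rules = []rule{
	{"file", []string{"test_sys_open", "test_sys_read", "test_sys_write", "test_sys_unlink"}},
	{"process", []string{"test_process_creation"}},
}

// suggestTargets returns the targets of every rule whose keyword occurs
// in the description, in table order and without duplicates.
func suggestTargets(issue string) []string {
	lower := strings.ToLower(issue)
	seen := make(map[string]bool)
	var out []string
	for _, r := range rules {
		if !strings.Contains(lower, r.keyword) {
			continue
		}
		for _, t := range r.targets {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	fmt.Println(suggestTargets("file not found error"))
}
```
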
### 2. BCC-Style Parsing

Parse BCC `trace.py`-style specifications:

```go
parser := NewTraceSpecParser()
spec, err := parser.ParseFromBCCStyle("sys_write (arg1 == 1) \"stdout: %d bytes\", arg3")
```

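
To make the shape of such a specification concrete, here is a minimal sketch of a parser for the pattern `[type:[lib]:]target [(filter)] ["format", args...]`. It is an illustration under simplifying assumptions (no nested parentheses in filters, no escaped quotes in format strings), and `parsedSpec` and `parseBCCStyle` are hypothetical names rather than the parser's real internals.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parsedSpec holds the pieces of a BCC-style trace specification.
type parsedSpec struct {
	ProbeType string
	Target    string
	Filter    string
	Format    string
	Arguments []string
}

func parseBCCStyle(input string) (parsedSpec, error) {
	spec := parsedSpec{ProbeType: "p"} // kprobe is the default probe type
	head, rest, _ := strings.Cut(strings.TrimSpace(input), " ")

	// Probe part: "target", "r::target" or "t:category:event".
	parts := strings.Split(head, ":")
	if len(parts) > 1 {
		if parts[0] != "" {
			spec.ProbeType = parts[0]
		}
		var names []string
		for _, p := range parts[1:] {
			if p != "" {
				names = append(names, p)
			}
		}
		spec.Target = strings.Join(names, ":")
	} else {
		spec.Target = head
	}
	if spec.Target == "" {
		return spec, errors.New("missing trace target")
	}

	// Optional "(filter)".
	rest = strings.TrimSpace(rest)
	if strings.HasPrefix(rest, "(") {
		end := strings.Index(rest, ")")
		if end < 0 {
			return spec, errors.New("unterminated filter")
		}
		spec.Filter = strings.TrimSpace(rest[1:end])
		rest = strings.TrimSpace(rest[end+1:])
	}

	// Optional quoted format string followed by comma-separated arguments.
	if strings.HasPrefix(rest, `"`) {
		end := strings.Index(rest[1:], `"`)
		if end < 0 {
			return spec, errors.New("unterminated format string")
		}
		spec.Format = rest[1 : end+1]
		for _, arg := range strings.Split(rest[end+2:], ",") {
			if arg = strings.TrimSpace(arg); arg != "" {
				spec.Arguments = append(spec.Arguments, arg)
			}
		}
		rest = ""
	}
	if rest != "" {
		return spec, fmt.Errorf("unexpected trailing text %q", rest)
	}
	return spec, nil
}

func main() {
	spec, err := parseBCCStyle(`sys_write (arg1 == 1) "stdout: %d bytes", arg3`)
	fmt.Printf("%+v %v\n", spec, err)
}
```
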
### 3. Event Filtering and Aggregation

Post-processing capabilities for trace events:

```go
filter := &TraceEventFilter{
	ProcessNames: []string{"nginx", "apache"},
	MinTimestamp: startTime,
}
filteredEvents := filter.ApplyFilter(events)

aggregator := NewTraceEventAggregator(events)
topProcesses := aggregator.GetTopProcesses(5)
eventRate := aggregator.GetEventRate()
```

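
The filter semantics are not spelled out, so the sketch below shows one plausible reading: an event passes if its process name is in the allowed list and its timestamp is not earlier than the minimum. The types `event` and `eventFilter` are hypothetical stand-ins, and treating an empty name list as "match every process" is an assumption.

```go
package main

import "fmt"

// event is a reduced, hypothetical trace event.
type event struct {
	Timestamp   int64
	ProcessName string
}

// eventFilter mirrors the two filter fields used in the example above.
type eventFilter struct {
	ProcessNames []string
	MinTimestamp int64
}

// apply returns the events that satisfy every configured condition.
func (f eventFilter) apply(events []event) []event {
	allowed := make(map[string]bool, len(f.ProcessNames))
	for _, name := range f.ProcessNames {
		allowed[name] = true
	}
	var out []event
	for _, e := range events {
		// Assumption: an empty name list matches every process.
		if len(allowed) > 0 && !allowed[e.ProcessName] {
			continue
		}
		if e.Timestamp < f.MinTimestamp {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	events := []event{
		{Timestamp: 100, ProcessName: "nginx"},
		{Timestamp: 200, ProcessName: "sshd"},
		{Timestamp: 300, ProcessName: "apache"},
	}
	f := eventFilter{ProcessNames: []string{"nginx", "apache"}, MinTimestamp: 150}
	fmt.Println(f.apply(events))
}
```
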
## Performance Considerations

- **Short durations**: Test specs use 5-second durations for quick testing
- **Efficient parsing**: Event scanner processes bpftrace output in real-time
- **Memory management**: Events are processed and aggregated efficiently
- **Timeout handling**: Automatic cleanup of hanging trace sessions

## Security Considerations

- **Root privileges required**: eBPF tracing requires root access
- **Resource limits**: Maximum trace duration of 10 minutes
- **Process isolation**: Each trace runs in its own context
- **Automatic cleanup**: Traces are automatically stopped and cleaned up

## Future Enhancements

1. **USDT probe support**: Add support for user-level statically defined tracing (USDT) probes
2. **BTF integration**: Use BPF Type Format for better type information
3. **Flame graph generation**: Generate performance flame graphs
4. **Custom eBPF programs**: Allow uploading custom eBPF bytecode
5. **Distributed tracing**: Correlation across multiple hosts

This implementation provides a solid foundation for advanced system introspection and debugging, bringing the power of BCC-style tracing to the diagnostic agent.