BCC-Style eBPF Tracing Implementation
Overview
This implementation adds BCC-style (BPF Compiler Collection) tracing capabilities to the diagnostic agent, modeled on the trace.py tool from the iovisor BCC project. Instead of only filtering events, it counts and traces real system calls and parses their arguments in detail.
Key Features
1. Real System Call Tracing
- Actual event counting: Unlike the previous implementation, which only simulated events, this one captures real system calls
- Argument extraction: Extracts function arguments (arg1, arg2, etc.) and return values
- Multiple probe types: Supports kprobes, kretprobes, tracepoints, and uprobes
- Filtering capabilities: Filter by process name, PID, UID, argument values
2. BCC-Style Syntax
Supports familiar BCC trace.py syntax patterns:
# Simple syscall tracing
"sys_open" # Trace open syscalls
"sys_read (arg3 > 1024)" # Trace reads >1024 bytes
"r::sys_open" # Return probe on open
# With format strings
"sys_write \"wrote %d bytes\", arg3"
"sys_open \"opening %s\", arg2@user"
3. Comprehensive Event Data
Each captured event contains:
{
"timestamp": 1234567890,
"pid": 1234,
"tid": 1234,
"process_name": "nginx",
"function": "__x64_sys_openat",
"message": "opening file: /var/log/access.log",
"raw_args": {
"arg1": "3",
"arg2": "/var/log/access.log",
"arg3": "577"
}
}
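A consumer of these events could model the payload as a Go struct with JSON tags. The following is a minimal sketch. The type name `TraceEvent`, the field types, and the helper `decodeEvent` are assumptions inferred from the example above, not the agent's actual definitions. Note that `raw_args` values are strings even when the argument is numeric (for example, `"arg1": "3"`), so a `map[string]string` fits the example.

```go
package main

import "encoding/json"

// TraceEvent mirrors the per-event JSON shown above.
// Hypothetical: field names and types are inferred from the example payload.
type TraceEvent struct {
	Timestamp   int64             `json:"timestamp"`
	PID         int               `json:"pid"`
	TID         int               `json:"tid"`
	ProcessName string            `json:"process_name"`
	Function    string            `json:"function"`
	Message     string            `json:"message"`
	RawArgs     map[string]string `json:"raw_args"` // values stay strings, even for integers
}

// decodeEvent parses one JSON-encoded event (hypothetical helper).
func decodeEvent(data []byte) (TraceEvent, error) {
	var ev TraceEvent
	err := json.Unmarshal(data, &ev)
	return ev, err
}
```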
Architecture
Core Components
- BCCTraceManager (ebpf_trace_manager.go)
  - Main orchestrator for BCC-style tracing
  - Generates bpftrace scripts dynamically
  - Manages trace sessions and event collection
- TraceSpec: trace specification format
type TraceSpec struct {
	ProbeType   string   // "p", "r", "t", "u"
	Target      string   // Function/syscall to trace
	Format      string   // Output format string
	Arguments   []string // Arguments to extract
	Filter      string   // Filter conditions
	Duration    int      // Trace duration in seconds
	ProcessName string   // Process filter
	PID         int      // Process ID filter
	UID         int      // User ID filter
}
- EventScanner (ebpf_event_parser.go)
  - Parses bpftrace output in real time
  - Converts raw trace data to structured events
  - Handles argument extraction and enrichment
- TraceSpecBuilder: fluent API for building specs
spec := NewTraceSpecBuilder().
	Kprobe("__x64_sys_write").
	Format("write %d bytes to fd %d", "arg3", "arg1").
	Filter("arg1 == 1").
	Duration(30).
	Build()
Usage Examples
1. Basic System Call Tracing
// Trace file open operations
spec := TraceSpec{
ProbeType: "p",
Target: "__x64_sys_openat",
Format: "opening file: %s",
Arguments: []string{"arg2@user"},
Duration: 30,
}
traceID, err := manager.StartTrace(spec)
2. Filtered Tracing
// Trace only large reads
spec := TraceSpec{
ProbeType: "p",
Target: "__x64_sys_read",
Format: "read %d bytes from fd %d",
Arguments: []string{"arg3", "arg1"},
Filter: "arg3 > 1024",
Duration: 30,
}
3. Process-Specific Tracing
// Trace only nginx processes
spec := TraceSpec{
ProbeType: "p",
Target: "__x64_sys_write",
ProcessName: "nginx",
Duration: 60,
}
4. Return Value Tracing
// Trace return values from file operations
spec := TraceSpec{
ProbeType: "r",
Target: "__x64_sys_openat",
Format: "open returned: %d",
Arguments: []string{"retval"},
Duration: 30,
}
Integration with Agent
API Request Format
The remote API can send trace specifications in the ebpf_programs field:
{
"commands": [
{"id": "cmd1", "command": "ps aux"}
],
"ebpf_programs": [
{
"name": "file_monitoring",
"type": "kprobe",
"target": "sys_open",
"duration": 30,
"filters": {"process": "nginx"},
"description": "Monitor file access by nginx"
}
]
}
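Each `ebpf_programs` entry has to be turned into a trace specification before tracing starts. The following is a minimal sketch of that conversion. The type names `EBPFProgram` and `APIRequest`, the `type`-to-letter mapping, and the assumption that `filters` holds only string values are all hypothetical.

The sketch copies `target` as-is. The response below reports `__x64_sys_openat` for a request that asked for `sys_open`, so the real agent apparently also resolves syscall names; this sketch does not attempt that.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EBPFProgram mirrors one entry of "ebpf_programs" (hypothetical type name).
// Filters is assumed to hold string values only.
type EBPFProgram struct {
	Name        string            `json:"name"`
	Type        string            `json:"type"`
	Target      string            `json:"target"`
	Duration    int               `json:"duration"`
	Filters     map[string]string `json:"filters"`
	Description string            `json:"description"`
}

// APIRequest keeps only the field this sketch needs; "commands" is ignored.
type APIRequest struct {
	EBPFPrograms []EBPFProgram `json:"ebpf_programs"`
}

// TraceSpec is a subset of the struct under Core Components.
type TraceSpec struct {
	ProbeType   string
	Target      string
	Duration    int
	ProcessName string
}

// probeTypes maps API program types to TraceSpec probe letters (assumed mapping).
var probeTypes = map[string]string{
	"kprobe":     "p",
	"kretprobe":  "r",
	"tracepoint": "t",
	"uprobe":     "u",
}

// toTraceSpec converts one API entry; unknown types are rejected.
func toTraceSpec(p EBPFProgram) (TraceSpec, error) {
	pt, ok := probeTypes[p.Type]
	if !ok {
		return TraceSpec{}, fmt.Errorf("program %q: unsupported type %q", p.Name, p.Type)
	}
	return TraceSpec{
		ProbeType:   pt,
		Target:      p.Target,
		Duration:    p.Duration,
		ProcessName: p.Filters["process"], // reading a nil map yields ""
	}, nil
}

// specsFromRequest decodes a request body and converts every program.
func specsFromRequest(body []byte) ([]TraceSpec, error) {
	var req APIRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	specs := make([]TraceSpec, 0, len(req.EBPFPrograms))
	for _, p := range req.EBPFPrograms {
		s, err := toTraceSpec(p)
		if err != nil {
			return nil, err
		}
		specs = append(specs, s)
	}
	return specs, nil
}
```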
Agent Response Format
The agent returns detailed trace results:
{
"name": "__x64_sys_openat",
"type": "bcc_trace",
"target": "__x64_sys_openat",
"duration": 30,
"status": "completed",
"success": true,
"event_count": 45,
"events": [
{
"timestamp": 1234567890,
"pid": 1234,
"process_name": "nginx",
"function": "__x64_sys_openat",
"message": "opening file: /var/log/access.log",
"raw_args": {"arg1": "3", "arg2": "/var/log/access.log"}
}
],
"statistics": {
"total_events": 45,
"events_per_second": 1.5,
"top_processes": [
{"process_name": "nginx", "event_count": 30},
{"process_name": "apache", "event_count": 15}
]
}
}
Test Specifications
The implementation includes test specifications for unit testing:
- test_sys_open: File open operations
- test_sys_read: Read operations with filters
- test_sys_write: Write operations
- test_process_creation: Process execution
- test_kretprobe: Return value tracing
- test_with_filter: Filtered tracing
Running Tests
# Run all BCC tracing tests
go test -v -run TestBCCTracing
# Test trace manager capabilities
go test -v -run TestTraceManagerCapabilities
# Test syscall suggestions
go test -v -run TestSyscallSuggestions
# Run all tests
go test -v
Requirements
System Requirements
- Linux kernel 4.4+ with eBPF support
- bpftrace installed (apt install bpftrace)
- Root privileges for actual tracing
Checking Capabilities
The trace manager automatically detects capabilities:
$ go test -run TestTraceManagerCapabilities
🔧 Trace Manager Capabilities:
✅ kernel_ebpf: Available
✅ bpftrace: Available
❌ root_access: Not Available
❌ debugfs_access: Not Available
Advanced Features
1. Syscall Suggestions
The system can suggest relevant trace targets (test specifications for specific syscalls) from a free-text issue description:
suggestions := SuggestSyscallTargets("file not found error")
// Returns: ["test_sys_open", "test_sys_read", "test_sys_write", "test_sys_unlink"]
2. BCC-Style Parsing
Parse BCC trace.py style specifications:
parser := NewTraceSpecParser()
spec, err := parser.ParseFromBCCStyle("sys_write (arg1 == 1) \"stdout: %d bytes\", arg3")
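The trace.py-style string has up to four parts: an optional `p::` or `r::` prefix, a target, an optional parenthesized predicate, and an optional quoted format string followed by arguments. A minimal hand-written parser for that subset is sketched below. `parseBCCStyle` is a hypothetical stand-in for `ParseFromBCCStyle`, not its actual implementation. It does not handle trace.py's library, signature, or tracepoint forms. It also splits arguments on commas, so arguments that themselves contain commas are not supported.

```go
package main

import (
	"errors"
	"strings"
)

// TraceSpec holds the subset of fields this parser fills in.
type TraceSpec struct {
	ProbeType string
	Target    string
	Format    string
	Arguments []string
	Filter    string
}

// parseBCCStyle parses `[p::|r::]target [(filter)] ["format"[, arg, ...]]`.
func parseBCCStyle(input string) (TraceSpec, error) {
	spec := TraceSpec{ProbeType: "p"}
	s := strings.TrimSpace(input)

	// Optional probe prefix: "p::" (entry) or "r::" (return).
	for _, pt := range []string{"p", "r"} {
		if strings.HasPrefix(s, pt+"::") {
			spec.ProbeType = pt
			s = s[len(pt)+2:]
			break
		}
	}

	// The target runs until the first whitespace.
	if i := strings.IndexAny(s, " \t"); i >= 0 {
		spec.Target, s = s[:i], strings.TrimSpace(s[i:])
	} else {
		spec.Target, s = s, ""
	}
	if spec.Target == "" {
		return spec, errors.New("missing probe target")
	}

	// Optional predicate in balanced parentheses.
	if strings.HasPrefix(s, "(") {
		depth, end := 0, -1
		for i, c := range s {
			if c == '(' {
				depth++
			} else if c == ')' {
				depth--
				if depth == 0 {
					end = i
					break
				}
			}
		}
		if end < 0 {
			return spec, errors.New("unbalanced parentheses in filter")
		}
		spec.Filter = strings.TrimSpace(s[1:end])
		s = strings.TrimSpace(s[end+1:])
	}

	// Optional quoted format string (\" escapes allowed), then arguments.
	if strings.HasPrefix(s, `"`) {
		end := -1
		for i := 1; i < len(s); i++ {
			if s[i] == '\\' {
				i++ // skip the escaped character
				continue
			}
			if s[i] == '"' {
				end = i
				break
			}
		}
		if end < 0 {
			return spec, errors.New("unterminated format string")
		}
		spec.Format = strings.ReplaceAll(s[1:end], `\"`, `"`)
		s = strings.TrimSpace(s[end+1:])
		if strings.HasPrefix(s, ",") {
			for _, a := range strings.Split(s[1:], ",") {
				if a = strings.TrimSpace(a); a != "" {
					spec.Arguments = append(spec.Arguments, a)
				}
			}
			s = ""
		}
	}
	if s != "" {
		return spec, errors.New("unexpected trailing input: " + s)
	}
	return spec, nil
}
```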
3. Event Filtering and Aggregation
Post-processing capabilities for trace events:
filter := &TraceEventFilter{
ProcessNames: []string{"nginx", "apache"},
MinTimestamp: startTime,
}
filteredEvents := filter.ApplyFilter(events)
aggregator := NewTraceEventAggregator(events)
topProcesses := aggregator.GetTopProcesses(5)
eventRate := aggregator.GetEventRate()
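The post-processing types above can be approximated as shown in this sketch. The method names follow the usage snippet, but the field types, the `ProcessCount` result type, and the semantics are assumptions. In particular:

- An empty `ProcessNames` list matches every process.
- Ties in `GetTopProcesses` are broken by name.
- `GetEventRate` divides the event count by the span between the first and last timestamps.

The `events_per_second` in the agent response appears to be computed differently, as total events divided by the trace duration (45 / 30 = 1.5).

```go
package main

import "sort"

// TraceEvent holds only the fields this sketch needs (hypothetical subset).
type TraceEvent struct {
	Timestamp   int64 // units assumed to be seconds, as in the JSON examples
	PID         int
	ProcessName string
}

// TraceEventFilter keeps events from the listed processes at or after MinTimestamp.
type TraceEventFilter struct {
	ProcessNames []string // empty means "any process"
	MinTimestamp int64
}

func (f *TraceEventFilter) ApplyFilter(events []TraceEvent) []TraceEvent {
	allowed := make(map[string]bool, len(f.ProcessNames))
	for _, n := range f.ProcessNames {
		allowed[n] = true
	}
	var out []TraceEvent
	for _, ev := range events {
		if len(allowed) > 0 && !allowed[ev.ProcessName] {
			continue
		}
		if ev.Timestamp < f.MinTimestamp {
			continue
		}
		out = append(out, ev)
	}
	return out
}

// ProcessCount is one row of the top-processes summary.
type ProcessCount struct {
	ProcessName string
	EventCount  int
}

type TraceEventAggregator struct{ events []TraceEvent }

func NewTraceEventAggregator(events []TraceEvent) *TraceEventAggregator {
	return &TraceEventAggregator{events: events}
}

// GetTopProcesses returns up to n processes by descending event count.
func (a *TraceEventAggregator) GetTopProcesses(n int) []ProcessCount {
	counts := make(map[string]int)
	for _, ev := range a.events {
		counts[ev.ProcessName]++
	}
	top := make([]ProcessCount, 0, len(counts))
	for name, c := range counts {
		top = append(top, ProcessCount{ProcessName: name, EventCount: c})
	}
	sort.Slice(top, func(i, j int) bool {
		if top[i].EventCount != top[j].EventCount {
			return top[i].EventCount > top[j].EventCount
		}
		return top[i].ProcessName < top[j].ProcessName
	})
	if n >= 0 && len(top) > n {
		top = top[:n]
	}
	return top
}

// GetEventRate returns events per time unit over the observed timestamp span,
// or 0 when the span is empty.
func (a *TraceEventAggregator) GetEventRate() float64 {
	if len(a.events) < 2 {
		return 0
	}
	minTS, maxTS := a.events[0].Timestamp, a.events[0].Timestamp
	for _, ev := range a.events[1:] {
		if ev.Timestamp < minTS {
			minTS = ev.Timestamp
		}
		if ev.Timestamp > maxTS {
			maxTS = ev.Timestamp
		}
	}
	if maxTS == minTS {
		return 0
	}
	return float64(len(a.events)) / float64(maxTS-minTS)
}
```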
Performance Considerations
- Short durations: Test specs use 5-second durations for quick testing
- Efficient parsing: The event scanner processes bpftrace output in real time
- Memory management: Events are processed and aggregated efficiently
- Timeout handling: Automatic cleanup of hanging trace sessions
Security Considerations
- Root privileges required: eBPF tracing requires root access
- Resource limits: Maximum trace duration of 10 minutes
- Process isolation: Each trace runs in its own context
- Automatic cleanup: Traces are automatically stopped and cleaned up
Future Enhancements
- USDT probe support: Trace user statically defined tracepoints (USDT) compiled into user-space applications
- BTF integration: Use BPF Type Format for better type information
- Flame graph generation: Generate performance flame graphs
- Custom eBPF programs: Allow uploading custom eBPF bytecode
- Distributed tracing: Correlation across multiple hosts
This implementation provides a solid foundation for advanced system introspection and debugging, bringing the power of BCC-style tracing to the diagnostic agent.