somewhat working ebpf bpftrace
298
BCC_TRACING.md
Normal file
@@ -0,0 +1,298 @@
# BCC-Style eBPF Tracing Implementation

## Overview

This implementation adds BCC-style (BPF Compiler Collection) tracing to the diagnostic agent, similar to the `trace.py` tool from the iovisor BCC project. Instead of only filtering events, the system traces and counts real system calls and parses their arguments.

## Key Features

### 1. Real System Call Tracing
- **Actual event counting**: Unlike the previous implementation, which only simulated events, this captures real system calls
- **Argument extraction**: Extracts function arguments (arg1, arg2, etc.) and return values
- **Multiple probe types**: Supports kprobes, kretprobes, tracepoints, and uprobes
- **Filtering capabilities**: Filter by process name, PID, UID, or argument values

### 2. BCC-Style Syntax
Supports familiar BCC `trace.py` syntax patterns:
```bash
# Simple syscall tracing
"sys_open"                    # Trace open syscalls
"sys_read (arg3 > 1024)"      # Trace reads >1024 bytes
"r::sys_open"                 # Return probe on open

# With format strings
"sys_write \"wrote %d bytes\", arg3"
"sys_open \"opening %s\", arg2@user"
```

### 3. Comprehensive Event Data
Each trace captures:
```json
{
  "timestamp": 1234567890,
  "pid": 1234,
  "tid": 1234,
  "process_name": "nginx",
  "function": "__x64_sys_openat",
  "message": "opening file: /var/log/access.log",
  "raw_args": {
    "arg1": "3",
    "arg2": "/var/log/access.log",
    "arg3": "577"
  }
}
```
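
As a minimal sketch of how a consumer might decode the event shown above, the following Go program unmarshals it into a struct. The type name `TraceEvent`, its Go field names, and the helper `decodeEvent` are assumptions made for illustration; only the JSON keys come from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TraceEvent mirrors the JSON event above. The Go names are assumptions;
// only the JSON keys are taken from the document.
type TraceEvent struct {
	Timestamp   int64             `json:"timestamp"`
	PID         int               `json:"pid"`
	TID         int               `json:"tid"`
	ProcessName string            `json:"process_name"`
	Function    string            `json:"function"`
	Message     string            `json:"message"`
	RawArgs     map[string]string `json:"raw_args"`
}

// decodeEvent parses one JSON-encoded event (hypothetical helper).
func decodeEvent(data []byte) (TraceEvent, error) {
	var ev TraceEvent
	err := json.Unmarshal(data, &ev)
	return ev, err
}

func main() {
	sample := []byte(`{"timestamp":1234567890,"pid":1234,"tid":1234,` +
		`"process_name":"nginx","function":"__x64_sys_openat",` +
		`"message":"opening file: /var/log/access.log",` +
		`"raw_args":{"arg1":"3","arg2":"/var/log/access.log","arg3":"577"}}`)
	ev, err := decodeEvent(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s (pid %d) called %s with arg2=%s\n",
		ev.ProcessName, ev.PID, ev.Function, ev.RawArgs["arg2"])
}
```

Keeping `raw_args` as a string map matches the sample, where even numeric arguments are serialized as strings.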

## Architecture

### Core Components

1. **BCCTraceManager** (`ebpf_trace_manager.go`)
   - Main orchestrator for BCC-style tracing
   - Generates bpftrace scripts dynamically
   - Manages trace sessions and event collection

2. **TraceSpec** - Trace specification format
   ```go
   type TraceSpec struct {
       ProbeType   string   // "p", "r", "t", "u"
       Target      string   // Function/syscall to trace
       Format      string   // Output format string
       Arguments   []string // Arguments to extract
       Filter      string   // Filter conditions
       Duration    int      // Trace duration in seconds
       ProcessName string   // Process filter
       PID         int      // Process ID filter
       UID         int      // User ID filter
   }
   ```

3. **EventScanner** (`ebpf_event_parser.go`)
   - Parses bpftrace output in real-time
   - Converts raw trace data to structured events
   - Handles argument extraction and enrichment

4. **TraceSpecBuilder** - Fluent API for building specs
   ```go
   spec := NewTraceSpecBuilder().
       Kprobe("__x64_sys_write").
       Format("write %d bytes to fd %d", "arg3", "arg1").
       Filter("arg1 == 1").
       Duration(30).
       Build()
   ```
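
To make the fluent style of the builder concrete, here is one way such a builder could be written. This is a sketch, not the code in the commit: `specBuilder` and `newSpecBuilder` are hypothetical stand-ins for `TraceSpecBuilder`, and `TraceSpec` is reduced to the fields the example touches.

```go
package main

import "fmt"

// TraceSpec repeats only the fields from the document that this sketch uses.
type TraceSpec struct {
	ProbeType string
	Target    string
	Format    string
	Arguments []string
	Filter    string
	Duration  int
}

// specBuilder is a hypothetical stand-in for TraceSpecBuilder.
type specBuilder struct{ spec TraceSpec }

func newSpecBuilder() *specBuilder { return &specBuilder{} }

// Kprobe selects an entry probe ("p") on the given kernel function.
func (b *specBuilder) Kprobe(target string) *specBuilder {
	b.spec.ProbeType, b.spec.Target = "p", target
	return b
}

// Format stores the output format and the arguments it refers to.
func (b *specBuilder) Format(format string, args ...string) *specBuilder {
	b.spec.Format, b.spec.Arguments = format, args
	return b
}

// Filter stores a filter expression such as "arg1 == 1".
func (b *specBuilder) Filter(cond string) *specBuilder {
	b.spec.Filter = cond
	return b
}

// Duration sets the trace duration in seconds.
func (b *specBuilder) Duration(seconds int) *specBuilder {
	b.spec.Duration = seconds
	return b
}

// Build returns the accumulated spec by value.
func (b *specBuilder) Build() TraceSpec { return b.spec }

func main() {
	spec := newSpecBuilder().
		Kprobe("__x64_sys_write").
		Format("write %d bytes to fd %d", "arg3", "arg1").
		Filter("arg1 == 1").
		Duration(30).
		Build()
	fmt.Printf("%+v\n", spec)
}
```

Each setter returns the builder itself, which is what allows the chained call style shown in item 4.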

## Usage Examples

### 1. Basic System Call Tracing

```go
// Trace file open operations
spec := TraceSpec{
    ProbeType: "p",
    Target:    "__x64_sys_openat",
    Format:    "opening file: %s",
    Arguments: []string{"arg2@user"},
    Duration:  30,
}

traceID, err := manager.StartTrace(spec)
```

### 2. Filtered Tracing

```go
// Trace only large reads
spec := TraceSpec{
    ProbeType: "p",
    Target:    "__x64_sys_read",
    Format:    "read %d bytes from fd %d",
    Arguments: []string{"arg3", "arg1"},
    Filter:    "arg3 > 1024",
    Duration:  30,
}
```
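
One detail worth illustrating: the specs in this document number arguments from 1 (`arg1`, `arg3`), as BCC's `trace.py` does, while bpftrace numbers kprobe arguments from 0 (`arg0`, `arg2`). A generator that targets bpftrace therefore has to shift the indices. The helper below, `toBpftraceArgs`, is a hypothetical sketch of that rewrite, not the commit's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// bccArg matches BCC-style 1-based argument names such as arg1 or arg3.
var bccArg = regexp.MustCompile(`\barg([1-9][0-9]*)\b`)

// toBpftraceArgs rewrites 1-based argN references to bpftrace's 0-based
// arg(N-1). Hypothetical helper; it does not touch retval or other names.
func toBpftraceArgs(expr string) string {
	return bccArg.ReplaceAllStringFunc(expr, func(m string) string {
		n, err := strconv.Atoi(m[len("arg"):])
		if err != nil {
			return m // leave anything unparseable untouched
		}
		return "arg" + strconv.Itoa(n-1)
	})
}

func main() {
	fmt.Println(toBpftraceArgs("arg3 > 1024")) // prints: arg2 > 1024
}
```

The rewritten expression can then be placed between slashes as a bpftrace predicate, for example `/arg2 > 1024/`.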

### 3. Process-Specific Tracing

```go
// Trace only nginx processes
spec := TraceSpec{
    ProbeType:   "p",
    Target:      "__x64_sys_write",
    ProcessName: "nginx",
    Duration:    60,
}
```
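
In bpftrace, a process filter like the one above is naturally expressed as a predicate on the `comm` and `pid` builtins. The following sketch shows how such a predicate might be assembled; `buildPredicate` is a hypothetical helper, and treating a PID of 0 as "no filter" is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPredicate turns process filters into a bpftrace predicate such as
// /comm == "nginx" && pid == 42/. An empty name or a non-positive PID is
// treated as "no filter" (an assumption of this sketch).
func buildPredicate(processName string, pid int) string {
	var conds []string
	if processName != "" {
		conds = append(conds, fmt.Sprintf("comm == %q", processName))
	}
	if pid > 0 {
		conds = append(conds, fmt.Sprintf("pid == %d", pid))
	}
	if len(conds) == 0 {
		return ""
	}
	return "/" + strings.Join(conds, " && ") + "/"
}

func main() {
	fmt.Println(buildPredicate("nginx", 0))
	fmt.Println(buildPredicate("nginx", 42))
}
```

A UID filter would need a different "unset" marker, because 0 is a valid UID (root); a pointer or a separate boolean are common choices.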

### 4. Return Value Tracing

```go
// Trace return values from file operations
spec := TraceSpec{
    ProbeType: "r",
    Target:    "__x64_sys_openat",
    Format:    "open returned: %d",
    Arguments: []string{"retval"},
    Duration:  30,
}
```

## Integration with Agent

### API Request Format
The remote API can send trace specifications in the `ebpf_programs` field:

```json
{
  "commands": [
    {"id": "cmd1", "command": "ps aux"}
  ],
  "ebpf_programs": [
    {
      "name": "file_monitoring",
      "type": "kprobe",
      "target": "sys_open",
      "duration": 30,
      "filters": {"process": "nginx"},
      "description": "Monitor file access by nginx"
    }
  ]
}
```
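
The request above could be decoded on the agent side roughly as follows. This is a sketch under assumptions: the Go type names, `decodeRequest`, and the mapping in `probeTypeFor` from the API's `type` string to the single-letter `ProbeType` are all hypothetical; only the JSON keys are taken from the document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// programRequest mirrors one entry of "ebpf_programs" (Go names assumed).
type programRequest struct {
	Name        string            `json:"name"`
	Type        string            `json:"type"`
	Target      string            `json:"target"`
	Duration    int               `json:"duration"`
	Filters     map[string]string `json:"filters"`
	Description string            `json:"description"`
}

// agentRequest mirrors the top-level request body (Go names assumed).
type agentRequest struct {
	Commands []struct {
		ID      string `json:"id"`
		Command string `json:"command"`
	} `json:"commands"`
	EBPFPrograms []programRequest `json:"ebpf_programs"`
}

// probeTypeFor maps an API type string to a TraceSpec.ProbeType letter.
// The mapping is an assumption of this sketch.
func probeTypeFor(apiType string) (string, bool) {
	m := map[string]string{"kprobe": "p", "kretprobe": "r", "tracepoint": "t", "uprobe": "u"}
	p, ok := m[apiType]
	return p, ok
}

func decodeRequest(data []byte) (agentRequest, error) {
	var req agentRequest
	err := json.Unmarshal(data, &req)
	return req, err
}

func main() {
	body := []byte(`{"commands":[{"id":"cmd1","command":"ps aux"}],` +
		`"ebpf_programs":[{"name":"file_monitoring","type":"kprobe",` +
		`"target":"sys_open","duration":30,"filters":{"process":"nginx"}}]}`)
	req, err := decodeRequest(body)
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	for _, p := range req.EBPFPrograms {
		probe, ok := probeTypeFor(p.Type)
		if !ok {
			fmt.Printf("skipping %s: unsupported type %q\n", p.Name, p.Type)
			continue
		}
		fmt.Printf("%s: probe=%s target=%s process=%s duration=%ds\n",
			p.Name, probe, p.Target, p.Filters["process"], p.Duration)
	}
}
```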

### Agent Response Format
The agent returns detailed trace results:

```json
{
  "name": "__x64_sys_openat",
  "type": "bcc_trace",
  "target": "__x64_sys_openat",
  "duration": 30,
  "status": "completed",
  "success": true,
  "event_count": 45,
  "events": [
    {
      "timestamp": 1234567890,
      "pid": 1234,
      "process_name": "nginx",
      "function": "__x64_sys_openat",
      "message": "opening file: /var/log/access.log",
      "raw_args": {"arg1": "3", "arg2": "/var/log/access.log"}
    }
  ],
  "statistics": {
    "total_events": 45,
    "events_per_second": 1.5,
    "top_processes": [
      {"process_name": "nginx", "event_count": 30},
      {"process_name": "apache", "event_count": 15}
    ]
  }
}
```
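
The `statistics` block in the response above is consistent with simple arithmetic: 45 events over 30 seconds gives 1.5 events per second, and `top_processes` is a per-process count sorted in descending order. A minimal sketch of that computation follows; the function and type names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// processCount mirrors one entry of "top_processes" (Go names assumed).
type processCount struct {
	ProcessName string
	EventCount  int
}

// eventsPerSecond divides the event total by the trace duration.
func eventsPerSecond(total, durationSec int) float64 {
	if durationSec <= 0 {
		return 0
	}
	return float64(total) / float64(durationSec)
}

// topProcesses counts events per process name and returns the n busiest,
// breaking ties alphabetically so the result is deterministic.
func topProcesses(names []string, n int) []processCount {
	counts := map[string]int{}
	for _, name := range names {
		counts[name]++
	}
	out := make([]processCount, 0, len(counts))
	for name, c := range counts {
		out = append(out, processCount{ProcessName: name, EventCount: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].EventCount != out[j].EventCount {
			return out[i].EventCount > out[j].EventCount
		}
		return out[i].ProcessName < out[j].ProcessName
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

func main() {
	fmt.Println(eventsPerSecond(45, 30)) // prints: 1.5
	fmt.Println(topProcesses([]string{"nginx", "apache", "nginx"}, 5))
}
```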

## Test Specifications

The implementation includes test specifications for unit testing:

- **test_sys_open**: File open operations
- **test_sys_read**: Read operations with filters
- **test_sys_write**: Write operations
- **test_process_creation**: Process execution
- **test_kretprobe**: Return value tracing
- **test_with_filter**: Filtered tracing

## Running Tests

```bash
# Run all BCC tracing tests
go test -v -run TestBCCTracing

# Test trace manager capabilities
go test -v -run TestTraceManagerCapabilities

# Test syscall suggestions
go test -v -run TestSyscallSuggestions

# Run all tests
go test -v
```

## Requirements

### System Requirements
- **Linux kernel 4.9+** with eBPF support (the minimum version bpftrace recommends)
- **bpftrace** installed (`apt install bpftrace` on Debian/Ubuntu)
- **Root privileges** for actual tracing

### Checking Capabilities
The trace manager automatically detects capabilities:

```bash
$ go test -run TestTraceManagerCapabilities
🔧 Trace Manager Capabilities:
✅ kernel_ebpf: Available
✅ bpftrace: Available
❌ root_access: Not Available
❌ debugfs_access: Not Available
```

## Advanced Features

### 1. Syscall Suggestions
The system can suggest appropriate syscalls based on issue descriptions:

```go
suggestions := SuggestSyscallTargets("file not found error")
// Returns: ["test_sys_open", "test_sys_read", "test_sys_write", "test_sys_unlink"]
```
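
The matching logic behind the suggestions is not described here. A simple keyword table is one way it could work, sketched below. The rules, the keywords, and the name `suggestTargets` are invented for illustration and use only spec names from the Test Specifications list, so the output differs from the real `SuggestSyscallTargets`.

```go
package main

import (
	"fmt"
	"strings"
)

// suggestionRules is a hypothetical keyword table; the real mapping used by
// SuggestSyscallTargets is not shown in the document.
var suggestionRules = []struct {
	keywords []string
	targets  []string
}{
	{[]string{"file", "open", "not found"}, []string{"test_sys_open", "test_sys_read", "test_sys_write"}},
	{[]string{"process", "exec", "spawn"}, []string{"test_process_creation"}},
}

// suggestTargets returns the targets of every rule with at least one keyword
// present in the description, without duplicates and in rule order.
func suggestTargets(description string) []string {
	desc := strings.ToLower(description)
	seen := map[string]bool{}
	var out []string
	for _, rule := range suggestionRules {
		matched := false
		for _, kw := range rule.keywords {
			if strings.Contains(desc, kw) {
				matched = true
				break
			}
		}
		if !matched {
			continue
		}
		for _, t := range rule.targets {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	fmt.Println(suggestTargets("file not found error"))
}
```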

### 2. BCC-Style Parsing
Parse BCC `trace.py` style specifications:

```go
parser := NewTraceSpecParser()
spec, err := parser.ParseFromBCCStyle("sys_write (arg1 == 1) \"stdout: %d bytes\", arg3")
```
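
To illustrate what parsing such a string involves, here is a small hand-written parser for the subset of the syntax used in this document: an optional `p::` or `r::` style prefix, a target, an optional parenthesized filter, and an optional quoted format followed by arguments. It is a sketch only; `parsedSpec` and `parseBCCStyle` are hypothetical names, and nested parentheses or escaped quotes are not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parsedSpec holds the pieces of a BCC-style spec (hypothetical type).
type parsedSpec struct {
	ProbeType, Target, Filter, Format string
	Arguments                         []string
}

// parseBCCStyle parses `[type:lib:]target [(filter)] ["format"[, args...]]`.
func parseBCCStyle(s string) (parsedSpec, error) {
	spec := parsedSpec{ProbeType: "p"} // entry probe is the default
	probe, rest, _ := strings.Cut(strings.TrimSpace(s), " ")
	switch parts := strings.Split(probe, ":"); len(parts) {
	case 1:
		spec.Target = parts[0]
	case 3: // type:library:target, e.g. "r::sys_open"
		spec.ProbeType, spec.Target = parts[0], parts[2]
	default:
		return spec, fmt.Errorf("malformed probe %q", probe)
	}
	if spec.Target == "" {
		return spec, errors.New("missing target")
	}
	rest = strings.TrimSpace(rest)
	if strings.HasPrefix(rest, "(") {
		end := strings.Index(rest, ")")
		if end < 0 {
			return spec, errors.New("unterminated filter")
		}
		spec.Filter = strings.TrimSpace(rest[1:end])
		rest = strings.TrimSpace(rest[end+1:])
	}
	if strings.HasPrefix(rest, `"`) {
		end := strings.Index(rest[1:], `"`)
		if end < 0 {
			return spec, errors.New("unterminated format string")
		}
		spec.Format = rest[1 : 1+end]
		rest = strings.TrimSpace(rest[2+end:])
		if strings.HasPrefix(rest, ",") {
			for _, a := range strings.Split(rest[1:], ",") {
				spec.Arguments = append(spec.Arguments, strings.TrimSpace(a))
			}
			rest = ""
		}
	}
	if rest != "" {
		return spec, fmt.Errorf("unexpected trailing input %q", rest)
	}
	return spec, nil
}

func main() {
	spec, err := parseBCCStyle(`sys_write (arg1 == 1) "stdout: %d bytes", arg3`)
	fmt.Printf("%+v err=%v\n", spec, err)
}
```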

### 3. Event Filtering and Aggregation
Post-processing capabilities for trace events:

```go
filter := &TraceEventFilter{
    ProcessNames: []string{"nginx", "apache"},
    MinTimestamp: startTime,
}
filteredEvents := filter.ApplyFilter(events)

aggregator := NewTraceEventAggregator(events)
topProcesses := aggregator.GetTopProcesses(5)
eventRate := aggregator.GetEventRate()
```
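
The filtering step above can be pictured as a single pass over the event slice. The sketch below assumes that `MinTimestamp` uses the same integer unit as the event timestamps and that an empty `ProcessNames` list means "all processes"; `eventFilter` and its `apply` method are hypothetical stand-ins for `TraceEventFilter` and `ApplyFilter`.

```go
package main

import "fmt"

// TraceEvent carries only the fields this sketch filters on.
type TraceEvent struct {
	Timestamp   int64
	ProcessName string
}

// eventFilter is a hypothetical stand-in for TraceEventFilter.
type eventFilter struct {
	ProcessNames []string // empty means "any process" (assumption)
	MinTimestamp int64    // same unit as TraceEvent.Timestamp (assumption)
}

// apply returns the events that match every configured condition.
func (f *eventFilter) apply(events []TraceEvent) []TraceEvent {
	allowed := make(map[string]bool, len(f.ProcessNames))
	for _, n := range f.ProcessNames {
		allowed[n] = true
	}
	var out []TraceEvent
	for _, ev := range events {
		if len(allowed) > 0 && !allowed[ev.ProcessName] {
			continue
		}
		if ev.Timestamp < f.MinTimestamp {
			continue
		}
		out = append(out, ev)
	}
	return out
}

func main() {
	events := []TraceEvent{
		{Timestamp: 100, ProcessName: "nginx"},
		{Timestamp: 50, ProcessName: "nginx"},
		{Timestamp: 200, ProcessName: "sshd"},
		{Timestamp: 300, ProcessName: "apache"},
	}
	f := &eventFilter{ProcessNames: []string{"nginx", "apache"}, MinTimestamp: 100}
	fmt.Println(f.apply(events))
}
```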

## Performance Considerations

- **Short durations**: Test specs use 5-second durations for quick testing
- **Efficient parsing**: Event scanner processes bpftrace output in real-time
- **Memory management**: Events are processed and aggregated efficiently
- **Timeout handling**: Automatic cleanup of hanging trace sessions

## Security Considerations

- **Root privileges required**: eBPF tracing requires root access
- **Resource limits**: Maximum trace duration of 10 minutes
- **Process isolation**: Each trace runs in its own context
- **Automatic cleanup**: Traces are automatically stopped and cleaned up

## Future Enhancements

1. **USDT probe support**: Add support for user-level statically defined tracepoints
2. **BTF integration**: Use BPF Type Format for better type information
3. **Flame graph generation**: Generate performance flame graphs
4. **Custom eBPF programs**: Allow uploading custom eBPF bytecode
5. **Distributed tracing**: Correlation across multiple hosts

This implementation provides a foundation for system introspection and debugging by bringing BCC-style tracing to the diagnostic agent.