somewhat working ebpf bpftrace

Harshavardhan Musanalli
2025-11-08 20:42:07 +01:00
parent 190e54dd38
commit 794111cb44
16 changed files with 2834 additions and 216 deletions

BCC_TRACING.md

@@ -0,0 +1,298 @@
# BCC-Style eBPF Tracing Implementation
## Overview
This implementation adds BCC-style tracing capabilities to the diagnostic agent (BCC being the BPF Compiler Collection), similar to the `trace.py` tool from the iovisor BCC project. Rather than only filtering events, the system counts and traces real system calls and parses their arguments in detail.
## Key Features
### 1. Real System Call Tracing
- **Actual event counting**: Unlike the previous implementation, which only simulated events, this captures real system calls
- **Argument extraction**: Extracts function arguments (arg1, arg2, etc.) and return values
- **Multiple probe types**: Supports kprobes, kretprobes, tracepoints, and uprobes
- **Filtering capabilities**: Filter by process name, PID, UID, argument values
### 2. BCC-Style Syntax
Supports familiar BCC trace.py syntax patterns:
```bash
# Simple syscall tracing
"sys_open" # Trace open syscalls
"sys_read (arg3 > 1024)" # Trace reads >1024 bytes
"r::sys_open" # Return probe on open
# With format strings
"sys_write \"wrote %d bytes\", arg3"
"sys_open \"opening %s\", arg2@user"
```
### 3. Comprehensive Event Data
Each trace captures:
```json
{
  "timestamp": 1234567890,
  "pid": 1234,
  "tid": 1234,
  "process_name": "nginx",
  "function": "__x64_sys_openat",
  "message": "opening file: /var/log/access.log",
  "raw_args": {
    "arg1": "3",
    "arg2": "/var/log/access.log",
    "arg3": "577"
  }
}
```
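For consumers written in Go, the event shape above maps directly onto a struct with JSON tags. The sketch below is hypothetical: the `TraceEvent` type and the `decodeEvent` helper are illustrative names, and only the JSON keys come from the example.

```go
package main

import "encoding/json"

// TraceEvent is a hypothetical Go view of one trace event; only the JSON
// keys are taken from the documented example.
type TraceEvent struct {
	Timestamp   int64             `json:"timestamp"`
	PID         int               `json:"pid"`
	TID         int               `json:"tid"`
	ProcessName string            `json:"process_name"`
	Function    string            `json:"function"`
	Message     string            `json:"message"`
	RawArgs     map[string]string `json:"raw_args"` // argument values arrive as strings
}

// decodeEvent parses a single JSON event object.
func decodeEvent(data []byte) (TraceEvent, error) {
	var ev TraceEvent
	err := json.Unmarshal(data, &ev)
	return ev, err
}
```

Fields missing from a given payload (for example `tid` in the agent response further down) are simply left at their zero value.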
## Architecture
### Core Components
1. **BCCTraceManager** (`ebpf_trace_manager.go`)
   - Main orchestrator for BCC-style tracing
   - Generates bpftrace scripts dynamically
   - Manages trace sessions and event collection
2. **TraceSpec** - Trace specification format
   ```go
   type TraceSpec struct {
   	ProbeType   string   // "p", "r", "t", "u"
   	Target      string   // Function/syscall to trace
   	Format      string   // Output format string
   	Arguments   []string // Arguments to extract
   	Filter      string   // Filter conditions
   	Duration    int      // Trace duration in seconds
   	ProcessName string   // Process filter
   	PID         int      // Process ID filter
   	UID         int      // User ID filter
   }
   ```
3. **EventScanner** (`ebpf_event_parser.go`)
   - Parses bpftrace output in real-time
   - Converts raw trace data to structured events
   - Handles argument extraction and enrichment
4. **TraceSpecBuilder** - Fluent API for building specs
   ```go
   spec := NewTraceSpecBuilder().
   	Kprobe("__x64_sys_write").
   	Format("write %d bytes to fd %d", "arg3", "arg1").
   	Filter("arg1 == 1").
   	Duration(30).
   	Build()
   ```
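The BCCTraceManager is described as generating bpftrace scripts dynamically. What follows is a minimal sketch of that step, built on a trimmed, hypothetical `spec` type. It maps the probe types "p", "r", "t" and "u" to bpftrace's `kprobe`, `kretprobe`, `tracepoint` and `uprobe`, and it renumbers trace.py's 1-based `arg1` to `arg9` to bpftrace's 0-based `arg0` to `arg8`. It also wraps `@user` arguments in `str()` and prefixes every output line with `pid` and `comm`.

Several things are left out. Filters are omitted, and tracepoint and uprobe targets are assumed to already use bpftrace's `category:name` and `binary:function` forms. Whether `str()` can read a user-space pointer inside a kprobe depends on the kernel and bpftrace versions. Finally, on recent x86_64 kernels the `__x64_sys_*` entry points receive a single `struct pt_regs *`, so real code may need to read syscall arguments through it or target the `syscalls:sys_enter_*` tracepoints instead.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// spec is a hypothetical, trimmed-down stand-in for TraceSpec.
type spec struct {
	ProbeType string   // "p", "r", "t" or "u"
	Target    string   // function, "category:name" or "binary:function"
	Format    string   // printf-style format for the event message
	Arguments []string // trace.py-style: arg1..arg9, retval, argN@user
}

// argRe matches trace.py's 1-based argument names arg1 to arg9.
var argRe = regexp.MustCompile(`\barg([1-9])\b`)

// probePrefix maps the spec's probe type onto a bpftrace probe type.
var probePrefix = map[string]string{
	"p": "kprobe:", "r": "kretprobe:", "t": "tracepoint:", "u": "uprobe:",
}

// toBpftraceArg renumbers argN to arg(N-1) and wraps "@user" arguments in str().
func toBpftraceArg(a string) string {
	user := strings.HasSuffix(a, "@user")
	a = argRe.ReplaceAllStringFunc(strings.TrimSuffix(a, "@user"), func(m string) string {
		return fmt.Sprintf("arg%d", m[3]-'1') // m[3] is the digit after "arg"
	})
	if user {
		return "str(" + a + ")"
	}
	return a
}

// generateScript emits a one-probe bpftrace program whose output lines start
// with "<pid> <comm> " followed by the formatted message.
func generateScript(s spec) (string, error) {
	prefix, ok := probePrefix[s.ProbeType]
	if !ok {
		return "", fmt.Errorf("unsupported probe type %q", s.ProbeType)
	}
	printfArgs := []string{"pid", "comm"}
	for _, a := range s.Arguments {
		printfArgs = append(printfArgs, toBpftraceArg(a))
	}
	format := "%d %s " + s.Format + `\n` // literal \n, interpreted by bpftrace
	return fmt.Sprintf("%s%s { printf(\"%s\", %s); }",
		prefix, s.Target, format, strings.Join(printfArgs, ", ")), nil
}
```

For the file-open example in the usage section, `generateScript` returns `kprobe:__x64_sys_openat { printf("%d %s opening file: %s\n", pid, comm, str(arg1)); }`.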
## Usage Examples
### 1. Basic System Call Tracing
```go
// Trace file open operations
spec := TraceSpec{
	ProbeType: "p",
	Target:    "__x64_sys_openat",
	Format:    "opening file: %s",
	Arguments: []string{"arg2@user"},
	Duration:  30,
}
traceID, err := manager.StartTrace(spec)
```
### 2. Filtered Tracing
```go
// Trace only large reads
spec := TraceSpec{
	ProbeType: "p",
	Target:    "__x64_sys_read",
	Format:    "read %d bytes from fd %d",
	Arguments: []string{"arg3", "arg1"},
	Filter:    "arg3 > 1024",
	Duration:  30,
}
```
### 3. Process-Specific Tracing
```go
// Trace only nginx processes
spec := TraceSpec{
	ProbeType:   "p",
	Target:      "__x64_sys_write",
	ProcessName: "nginx",
	Duration:    60,
}
```
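The filter fields of a spec (process name, PID, UID and the free-form `Filter` expression) can be combined into a single bpftrace predicate placed between the probe and its action block. This is a minimal sketch with a hypothetical `filterSpec` type and `buildPredicate` helper. It assumes `Filter` is already valid bpftrace syntax. Because a zero PID or UID is treated as "unset", it cannot express a `uid == 0` filter.

```go
package main

import (
	"fmt"
	"strings"
)

// filterSpec holds the filter-related fields of a spec (hypothetical subset).
type filterSpec struct {
	ProcessName string
	PID         int // 0 means "any process"
	UID         int // 0 means "any user", so uid == 0 cannot be expressed
	Filter      string
}

// buildPredicate joins all set filters into one bpftrace predicate such as
// /comm == "nginx" && (arg2 > 1024)/, or returns "" if none are set.
func buildPredicate(f filterSpec) string {
	var conds []string
	if f.ProcessName != "" {
		conds = append(conds, fmt.Sprintf("comm == %q", f.ProcessName))
	}
	if f.PID > 0 {
		conds = append(conds, fmt.Sprintf("pid == %d", f.PID))
	}
	if f.UID > 0 {
		conds = append(conds, fmt.Sprintf("uid == %d", f.UID))
	}
	if f.Filter != "" {
		// Parenthesize so operators in the user filter cannot bind across &&.
		conds = append(conds, "("+f.Filter+")")
	}
	if len(conds) == 0 {
		return ""
	}
	return "/" + strings.Join(conds, " && ") + "/"
}
```

The result sits between probe and action, for example `kprobe:__x64_sys_write /comm == "nginx"/ { ... }`.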
### 4. Return Value Tracing
```go
// Trace return values from file operations
spec := TraceSpec{
	ProbeType: "r",
	Target:    "__x64_sys_openat",
	Format:    "open returned: %d",
	Arguments: []string{"retval"},
	Duration:  30,
}
```
## Integration with Agent
### API Request Format
The remote API can send trace specifications in the `ebpf_programs` field:
```json
{
  "commands": [
    {"id": "cmd1", "command": "ps aux"}
  ],
  "ebpf_programs": [
    {
      "name": "file_monitoring",
      "type": "kprobe",
      "target": "sys_open",
      "duration": 30,
      "filters": {"process": "nginx"},
      "description": "Monitor file access by nginx"
    }
  ]
}
```
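On the agent side, each `ebpf_programs` entry has to be translated into a trace specification. This sketch rests on several assumptions. The Go type names are hypothetical, `filters` is assumed to hold only string values, and only the `process` filter key shown above is mapped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// programRequest mirrors one "ebpf_programs" entry (hypothetical Go names).
type programRequest struct {
	Name        string            `json:"name"`
	Type        string            `json:"type"`
	Target      string            `json:"target"`
	Duration    int               `json:"duration"`
	Filters     map[string]string `json:"filters"` // assumes string values only
	Description string            `json:"description"`
}

// traceSpec is a trimmed, hypothetical stand-in for TraceSpec.
type traceSpec struct {
	ProbeType   string
	Target      string
	Duration    int
	ProcessName string
}

// probeTypes maps the request's "type" values onto TraceSpec probe types.
var probeTypes = map[string]string{
	"kprobe": "p", "kretprobe": "r", "tracepoint": "t", "uprobe": "u",
}

// parsePrograms decodes a request body and converts each ebpf_programs entry;
// other top-level fields such as "commands" are ignored.
func parsePrograms(body []byte) ([]traceSpec, error) {
	var req struct {
		Programs []programRequest `json:"ebpf_programs"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	specs := make([]traceSpec, 0, len(req.Programs))
	for _, p := range req.Programs {
		pt, ok := probeTypes[p.Type]
		if !ok {
			return nil, fmt.Errorf("program %q: unsupported type %q", p.Name, p.Type)
		}
		specs = append(specs, traceSpec{
			ProbeType:   pt,
			Target:      p.Target,
			Duration:    p.Duration,
			ProcessName: p.Filters["process"], // "" when absent
		})
	}
	return specs, nil
}
```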
### Agent Response Format
The agent returns detailed trace results:
```json
{
  "name": "__x64_sys_openat",
  "type": "bcc_trace",
  "target": "__x64_sys_openat",
  "duration": 30,
  "status": "completed",
  "success": true,
  "event_count": 45,
  "events": [
    {
      "timestamp": 1234567890,
      "pid": 1234,
      "process_name": "nginx",
      "function": "__x64_sys_openat",
      "message": "opening file: /var/log/access.log",
      "raw_args": {"arg1": "3", "arg2": "/var/log/access.log"}
    }
  ],
  "statistics": {
    "total_events": 45,
    "events_per_second": 1.5,
    "top_processes": [
      {"process_name": "nginx", "event_count": 30},
      {"process_name": "apache", "event_count": 15}
    ]
  }
}
```
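The `statistics` block can be derived from the collected events alone. `events_per_second` is the total divided by the trace duration (45 events over 30 seconds gives 1.5), and `top_processes` is a per-process count sorted in descending order. The function and type names below are hypothetical.

```go
package main

import "sort"

// event keeps only the field the statistics need (hypothetical type).
type event struct {
	ProcessName string
}

type processCount struct {
	ProcessName string `json:"process_name"`
	EventCount  int    `json:"event_count"`
}

type statistics struct {
	TotalEvents     int            `json:"total_events"`
	EventsPerSecond float64        `json:"events_per_second"`
	TopProcesses    []processCount `json:"top_processes"`
}

// computeStatistics counts events per process and keeps the topN busiest.
func computeStatistics(events []event, durationSec, topN int) statistics {
	counts := map[string]int{}
	for _, e := range events {
		counts[e.ProcessName]++
	}
	top := make([]processCount, 0, len(counts))
	for name, n := range counts {
		top = append(top, processCount{ProcessName: name, EventCount: n})
	}
	// Highest count first; ties broken by name so the order is deterministic.
	sort.Slice(top, func(i, j int) bool {
		if top[i].EventCount != top[j].EventCount {
			return top[i].EventCount > top[j].EventCount
		}
		return top[i].ProcessName < top[j].ProcessName
	})
	if topN >= 0 && len(top) > topN {
		top = top[:topN]
	}
	stats := statistics{TotalEvents: len(events), TopProcesses: top}
	if durationSec > 0 {
		stats.EventsPerSecond = float64(len(events)) / float64(durationSec)
	}
	return stats
}
```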
## Test Specifications
The implementation includes test specifications for unit testing:
- **test_sys_open**: File open operations
- **test_sys_read**: Read operations with filters
- **test_sys_write**: Write operations
- **test_process_creation**: Process execution
- **test_kretprobe**: Return value tracing
- **test_with_filter**: Filtered tracing
## Running Tests
```bash
# Run all BCC tracing tests
go test -v -run TestBCCTracing
# Test trace manager capabilities
go test -v -run TestTraceManagerCapabilities
# Test syscall suggestions
go test -v -run TestSyscallSuggestions
# Run all tests
go test -v
```
## Requirements
### System Requirements
- **Linux kernel 4.4+** with eBPF support (bpftrace itself recommends 4.9 or newer)
- **bpftrace** installed (e.g. `apt install bpftrace` on Debian/Ubuntu)
- **Root privileges** for actual tracing
### Checking Capabilities
The trace manager automatically detects capabilities:
```bash
$ go test -run TestTraceManagerCapabilities
🔧 Trace Manager Capabilities:
✅ kernel_ebpf: Available
✅ bpftrace: Available
❌ root_access: Not Available
❌ debugfs_access: Not Available
```
## Advanced Features
### 1. Syscall Suggestions
The system can suggest relevant trace specifications based on an issue description:
```go
suggestions := SuggestSyscallTargets("file not found error")
// Returns: ["test_sys_open", "test_sys_read", "test_sys_write", "test_sys_unlink"]
```
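One simple way to implement such suggestions is a keyword lookup over the issue text. The keyword table and the `suggestTargets` function below are hypothetical and use only the test specification names listed earlier. The real mapping behind `SuggestSyscallTargets` is not documented here.

```go
package main

import (
	"strings"
	"unicode"
)

// keywordTargets is a hypothetical keyword table pointing at test spec names.
var keywordTargets = map[string][]string{
	"file":    {"test_sys_open", "test_sys_read", "test_sys_write"},
	"read":    {"test_sys_read"},
	"write":   {"test_sys_write"},
	"process": {"test_process_creation"},
	"exec":    {"test_process_creation"},
}

// suggestTargets returns matching spec names in first-seen order, without duplicates.
func suggestTargets(issue string) []string {
	// Split on anything that is not a letter, so "crashed," still matches.
	words := strings.FieldsFunc(strings.ToLower(issue), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	seen := map[string]bool{}
	var out []string
	for _, w := range words {
		for _, t := range keywordTargets[w] {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}
```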
### 2. BCC-Style Parsing
Parse BCC trace.py style specifications:
```go
parser := NewTraceSpecParser()
spec, err := parser.ParseFromBCCStyle("sys_write (arg1 == 1) \"stdout: %d bytes\", arg3")
```
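A trace.py-style string such as the one above breaks down into four parts: an optional probe-type prefix, a target, an optional parenthesized filter, and an optional quoted format followed by arguments. The sketch below is a hypothetical parser, not `ParseFromBCCStyle` itself, and it has several limits. It supports only `X::target` prefixes (no library or tracepoint category) and does not handle escaped quotes. It also splits arguments on commas, so arguments that themselves contain commas are not supported.

```go
package main

import (
	"errors"
	"strings"
)

// parsedSpec is a hypothetical result type for a trace.py-style string.
type parsedSpec struct {
	ProbeType string
	Target    string
	Filter    string
	Format    string
	Arguments []string
}

// closingParen returns the index of the ')' matching the '(' at s[0], or -1.
func closingParen(s string) int {
	depth := 0
	for i, r := range s {
		switch r {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// parseBCCStyle parses `[X::]target [(filter)] ["format"[, arg...]]`.
func parseBCCStyle(s string) (parsedSpec, error) {
	spec := parsedSpec{ProbeType: "p"} // trace.py defaults to an entry probe
	s = strings.TrimSpace(s)

	// Optional single-letter probe type prefix such as "r::".
	if len(s) > 3 && s[1:3] == "::" && strings.ContainsAny(s[:1], "prtu") {
		spec.ProbeType = s[:1]
		s = s[3:]
	}

	// The target runs until the first space, '(' or '"'.
	end := strings.IndexAny(s, ` ("`)
	if end == -1 {
		end = len(s)
	}
	spec.Target = s[:end]
	if spec.Target == "" {
		return spec, errors.New("missing probe target")
	}
	s = strings.TrimSpace(s[end:])

	// Optional parenthesized filter.
	if strings.HasPrefix(s, "(") {
		c := closingParen(s)
		if c == -1 {
			return spec, errors.New("unbalanced parentheses in filter")
		}
		spec.Filter = strings.TrimSpace(s[1:c])
		s = strings.TrimSpace(s[c+1:])
	}

	// Optional quoted format string followed by comma-separated arguments.
	if strings.HasPrefix(s, `"`) {
		q := strings.Index(s[1:], `"`)
		if q == -1 {
			return spec, errors.New("unterminated format string")
		}
		spec.Format = s[1 : q+1]
		for _, a := range strings.Split(s[q+2:], ",") {
			if a = strings.TrimSpace(a); a != "" {
				spec.Arguments = append(spec.Arguments, a)
			}
		}
	} else if s != "" {
		return spec, errors.New("unexpected text after target: " + s)
	}
	return spec, nil
}
```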
### 3. Event Filtering and Aggregation
Post-processing capabilities for trace events:
```go
filter := &TraceEventFilter{
	ProcessNames: []string{"nginx", "apache"},
	MinTimestamp: startTime,
}
filteredEvents := filter.ApplyFilter(events)
aggregator := NewTraceEventAggregator(events)
topProcesses := aggregator.GetTopProcesses(5)
eventRate := aggregator.GetEventRate()
```
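The filter semantics can be sketched with a hypothetical `EventFilter` modeled on `TraceEventFilter`. The sketch assumes that an empty `ProcessNames` list matches every process and that timestamps are the integer values shown in the event JSON.

```go
package main

// Event holds the fields the filter looks at (hypothetical type).
type Event struct {
	Timestamp   int64
	ProcessName string
}

// EventFilter is a hypothetical stand-in modeled on TraceEventFilter.
type EventFilter struct {
	ProcessNames []string // empty means "any process"
	MinTimestamp int64    // events with an earlier timestamp are dropped
}

// Apply returns the events that pass both the process and the time filter,
// preserving their original order.
func (f *EventFilter) Apply(events []Event) []Event {
	allowed := make(map[string]bool, len(f.ProcessNames))
	for _, n := range f.ProcessNames {
		allowed[n] = true
	}
	var out []Event
	for _, e := range events {
		if len(allowed) > 0 && !allowed[e.ProcessName] {
			continue
		}
		if e.Timestamp < f.MinTimestamp {
			continue
		}
		out = append(out, e)
	}
	return out
}
```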
## Performance Considerations
- **Short durations**: Test specs use 5-second durations for quick testing
- **Efficient parsing**: Event scanner processes bpftrace output in real-time
- **Memory management**: Events are processed and aggregated efficiently
- **Timeout handling**: Automatic cleanup of hanging trace sessions
## Security Considerations
- **Root privileges required**: eBPF tracing requires root access
- **Resource limits**: Maximum trace duration of 10 minutes
- **Process isolation**: Each trace runs in its own context
- **Automatic cleanup**: Traces are automatically stopped and cleaned up
## Future Enhancements
1. **USDT probe support**: Add support for USDT (User Statically-Defined Tracing) probes in user-space applications
2. **BTF integration**: Use BPF Type Format for better type information
3. **Flame graph generation**: Generate performance flame graphs
4. **Custom eBPF programs**: Allow uploading custom eBPF bytecode
5. **Distributed tracing**: Correlation across multiple hosts
This implementation provides a solid foundation for advanced system introspection and debugging, bringing the power of BCC-style tracing to the diagnostic agent.