nannyagent/docs/EBPF_README.md
Harshavardhan Musanalli b15ae9b4a9 Remaining things
2025-10-22 10:12:28 +02:00

# eBPF Integration for Linux Diagnostic Agent
The Linux Diagnostic Agent now includes comprehensive eBPF (Extended Berkeley Packet Filter) capabilities for advanced system monitoring and investigation during diagnostic sessions.
## eBPF Capabilities
### Available Monitoring Types
1. **System Call Tracing** (`syscall_trace`)
   - Monitors all system calls made by processes
   - Useful for debugging process behavior and API usage
   - Can filter by process ID or name
2. **Network Activity Tracing** (`network_trace`)
   - Tracks TCP/UDP send/receive operations
   - Monitors network connections and data flow
   - Identifies network-related bottlenecks
3. **Process Monitoring** (`process_trace`)
   - Tracks process creation, execution, and termination
   - Monitors process lifecycle events
   - Useful for debugging startup issues
4. **File System Monitoring** (`file_trace`)
   - Monitors file open, create, and delete operations
   - Tracks file access patterns
   - Can filter by specific paths
5. **Performance Monitoring** (`performance`)
   - Collects CPU, memory, and I/O metrics
   - Provides detailed performance profiling
   - Uses perf integration when available
6. **Security Event Monitoring** (`security_event`)
   - Detects privilege escalation attempts
   - Monitors security-relevant system calls
   - Tracks suspicious activities
## How eBPF Integration Works
### AI-Driven eBPF Selection
The AI agent can automatically request eBPF monitoring by including specific fields in its diagnostic response:
```json
{
  "response_type": "diagnostic",
  "reasoning": "Need to trace network activity to diagnose connection timeout issues",
  "commands": [
    {"id": "basic_net", "command": "ss -tulpn", "description": "Current network connections"},
    {"id": "net_config", "command": "ip route show", "description": "Network configuration"}
  ],
  "ebpf_capabilities": ["network_trace", "syscall_trace"],
  "ebpf_duration_seconds": 15,
  "ebpf_filters": {
    "comm": "nginx",
    "path": "/etc"
  }
}
```
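On the agent side, a response like the one above could be decoded into a typed struct before any traces are started. The following is only a sketch: the JSON keys come from the example, but the `DiagnosticResponse` and `Command` types and the `parseDiagnosticResponse` function are hypothetical names, not the agent's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Command mirrors one entry of the "commands" array (hypothetical type).
type Command struct {
	ID          string `json:"id"`
	Command     string `json:"command"`
	Description string `json:"description"`
}

// DiagnosticResponse is a hypothetical struct; only the JSON keys are
// taken from the README example.
type DiagnosticResponse struct {
	ResponseType        string            `json:"response_type"`
	Reasoning           string            `json:"reasoning"`
	Commands            []Command         `json:"commands"`
	EBPFCapabilities    []string          `json:"ebpf_capabilities"`
	EBPFDurationSeconds int               `json:"ebpf_duration_seconds"`
	EBPFFilters         map[string]string `json:"ebpf_filters"`
}

// parseDiagnosticResponse decodes the raw JSON sent by the AI.
func parseDiagnosticResponse(raw []byte) (DiagnosticResponse, error) {
	var resp DiagnosticResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return resp, fmt.Errorf("decode diagnostic response: %w", err)
	}
	return resp, nil
}

func main() {
	raw := []byte(`{
	  "response_type": "diagnostic",
	  "commands": [{"id": "basic_net", "command": "ss -tulpn", "description": "Current network connections"}],
	  "ebpf_capabilities": ["network_trace", "syscall_trace"],
	  "ebpf_duration_seconds": 15,
	  "ebpf_filters": {"comm": "nginx", "path": "/etc"}
	}`)
	resp, err := parseDiagnosticResponse(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("capabilities:", resp.EBPFCapabilities)
	fmt.Println("duration (s):", resp.EBPFDurationSeconds)
	fmt.Println("filters:", resp.EBPFFilters)
}
```

Because the eBPF fields are optional, a response without them decodes to a nil slice and a zero duration, which the caller can treat as "no tracing requested".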
### eBPF Trace Execution
1. eBPF traces run in parallel with regular diagnostic commands
2. Multiple eBPF capabilities can be activated simultaneously
3. Traces collect structured JSON events in real time
4. Results are automatically parsed and included in the diagnostic data
### Event Data Structure
eBPF events follow a consistent structure:
```json
{
  "timestamp": 1634567890000000000,
  "event_type": "syscall_enter",
  "process_id": 1234,
  "process_name": "nginx",
  "user_id": 1000,
  "data": {
    "syscall": "openat",
    "filename": "/etc/nginx/nginx.conf"
  }
}
```
## Installation and Setup
### Prerequisites
The agent automatically detects available eBPF tools and capabilities. For full functionality, install:
**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install bpftrace linux-tools-generic "linux-tools-$(uname -r)"
sudo apt install bpfcc-tools python3-bpfcc # Optional, for additional tools (BCC is packaged as bpfcc on Debian/Ubuntu)
```
**RHEL/CentOS/Fedora:**
```bash
sudo dnf install bpftrace perf bcc-tools python3-bcc
```
**openSUSE:**
```bash
sudo zypper install bpftrace perf
```
### Automated Setup
Use the included helper script:
```bash
# Check current eBPF capabilities
./ebpf_helper.sh check
# Install eBPF tools (requires root)
sudo ./ebpf_helper.sh install
# Create monitoring scripts
./ebpf_helper.sh setup
# Test eBPF functionality
sudo ./ebpf_helper.sh test
```
## Usage Examples
### Network Issue Diagnosis
When a user describes a network problem, the AI may automatically request network tracing:
```
User: "Web server is experiencing intermittent connection timeouts"
AI Response: Includes network_trace and syscall_trace capabilities
eBPF Output: Real-time network send/receive events, connection attempts, and related system calls
```
### Performance Issue Investigation
For performance problems, the AI can request comprehensive monitoring:
```
User: "System is running slowly, high CPU usage"
AI Response: Includes process_trace, performance, and syscall_trace
eBPF Output: Process execution patterns, performance metrics, and system call analysis
```
### Security Incident Analysis
For security concerns, specialized monitoring is available:
```
User: "Suspicious activity detected, possible privilege escalation"
AI Response: Includes security_event, process_trace, and file_trace
eBPF Output: Security-relevant events, process behavior, and file access patterns
```
## Filtering Options
eBPF traces can be filtered for focused monitoring:
- **Process ID**: `{"pid": "1234"}` - Monitor specific process
- **Process Name**: `{"comm": "nginx"}` - Monitor processes by name
- **File Path**: `{"path": "/etc"}` - Monitor specific path (file tracing)
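
One way to apply the `pid` and `comm` filters is to translate them into a bpftrace predicate such as `/pid == 1234/`. The sketch below assumes that approach; `buildPredicate` is a hypothetical helper, and the `path` filter is left out because matching a path depends on the arguments of the specific probe.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildPredicate turns the filter map into a bpftrace predicate string.
// An empty string means "no predicate". Only "pid" and "comm" are handled.
func buildPredicate(filters map[string]string) (string, error) {
	var conds []string
	if pid, ok := filters["pid"]; ok {
		n, err := strconv.Atoi(pid)
		if err != nil || n <= 0 {
			return "", fmt.Errorf("invalid pid filter %q", pid)
		}
		conds = append(conds, fmt.Sprintf("pid == %d", n))
	}
	if comm, ok := filters["comm"]; ok {
		// Reject characters that could break out of the quoted string.
		if comm == "" || strings.ContainsAny(comm, "\"\\\n") {
			return "", fmt.Errorf("invalid comm filter %q", comm)
		}
		conds = append(conds, fmt.Sprintf("comm == \"%s\"", comm))
	}
	if len(conds) == 0 {
		return "", nil
	}
	return "/" + strings.Join(conds, " && ") + "/", nil
}

func main() {
	pred, err := buildPredicate(map[string]string{"pid": "1234", "comm": "nginx"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The predicate sits between the probe name and the action block.
	fmt.Printf("tracepoint:syscalls:sys_enter_openat %s { printf(\"%%s\\n\", comm); }\n", pred)
}
```

Validating filter values before they reach a script that runs as root matters here, since the filters originate from AI output.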
## Integration with Existing Workflow
eBPF monitoring fits into the existing diagnostic workflow as follows:
1. **Automatic Detection**: Agent detects available eBPF capabilities at startup
2. **AI Decision Making**: AI decides when eBPF monitoring would be helpful
3. **Parallel Execution**: eBPF traces run alongside regular diagnostic commands
4. **Structured Results**: eBPF data is included in command results for AI analysis
5. **Contextual Analysis**: AI correlates eBPF events with other diagnostic data
## Troubleshooting
### Common Issues
**Permission Errors:**
- Most eBPF operations require root privileges
- Run the agent with `sudo` for full eBPF functionality
**Tool Not Available:**
- Use `./ebpf_helper.sh check` to verify available tools
- Install missing tools with `./ebpf_helper.sh install`
**Kernel Compatibility:**
- eBPF requires Linux kernel 4.4+ (5.0+ recommended)
- Some features may require newer kernel versions
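
A kernel version check against the 4.4 minimum stated above could look like the following sketch. `meetsMinimum` and `leadingInt` are hypothetical helpers; reading `/proc/sys/kernel/osrelease` is one common way to get the release string on Linux.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// leadingInt parses the digits at the start of s, so "1-rc3" yields 1.
func leadingInt(s string) (int, error) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	return strconv.Atoi(s[:end])
}

// meetsMinimum reports whether a release string such as "5.15.0-91-generic"
// is at least minMajor.minMinor.
func meetsMinimum(release string, minMajor, minMinor int) (bool, error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized kernel release %q", release)
	}
	major, err := leadingInt(parts[0])
	if err != nil {
		return false, fmt.Errorf("unrecognized kernel release %q", release)
	}
	minor, err := leadingInt(parts[1])
	if err != nil {
		return false, fmt.Errorf("unrecognized kernel release %q", release)
	}
	return major > minMajor || (major == minMajor && minor >= minMinor), nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	ok, err := meetsMinimum(string(raw), 4, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("kernel %s meets 4.4 minimum: %v\n", strings.TrimSpace(string(raw)), ok)
}
```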
**Debugging eBPF Issues:**
```bash
# Check kernel eBPF support
sudo ./ebpf_helper.sh check
# Test basic eBPF functionality
sudo bpftrace -e 'BEGIN { printf("eBPF works!\n"); exit(); }'
# Mount debugfs if it is not already mounted (required for ftrace)
mountpoint -q /sys/kernel/debug || sudo mount -t debugfs none /sys/kernel/debug
```
## Security Considerations
- eBPF monitoring provides deep system visibility
- Traces may contain sensitive information (file paths, process arguments)
- Traces are stored temporarily in `/tmp/nannyagent/ebpf/`
- Old traces are automatically cleaned up after 1 hour
- Consider the security implications of detailed system monitoring
## Performance Impact
- Overhead is typically low for short, filtered traces
- Traces are time-limited (typically 10-30 seconds)
- Event collection is optimized for efficiency
- Tracing high-frequency events (for example, every system call on a busy host) can noticeably slow resource-constrained systems
## Contributing
To add new eBPF capabilities:
1. Extend the `EBPFCapability` enum in `ebpf_manager.go`
2. Add detection logic in `detectCapabilities()`
3. Implement trace command generation in `buildXXXTraceCommand()`
4. Update capability descriptions in `FormatSystemInfoWithEBPFForPrompt()`
The eBPF integration is designed to be extensible and can accommodate additional monitoring capabilities as needed.
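
For orientation, step 1 might resemble the sketch below. Go has no native enum, so a typed string with constants is assumed here; the real `EBPFCapability` definition in `ebpf_manager.go` may well differ. The constant names and `parseCapability` are hypothetical; only the string values come from this README.

```go
package main

import "fmt"

// EBPFCapability is assumed to be a string-backed type for this sketch.
type EBPFCapability string

// Constant names are hypothetical; the values match the capability
// identifiers listed in this README.
const (
	CapSyscallTrace  EBPFCapability = "syscall_trace"
	CapNetworkTrace  EBPFCapability = "network_trace"
	CapProcessTrace  EBPFCapability = "process_trace"
	CapFileTrace     EBPFCapability = "file_trace"
	CapPerformance   EBPFCapability = "performance"
	CapSecurityEvent EBPFCapability = "security_event"
)

// knownCapabilities is where a newly added capability would be registered.
var knownCapabilities = map[EBPFCapability]bool{
	CapSyscallTrace:  true,
	CapNetworkTrace:  true,
	CapProcessTrace:  true,
	CapFileTrace:     true,
	CapPerformance:   true,
	CapSecurityEvent: true,
}

// parseCapability validates a capability name taken from an AI response.
func parseCapability(name string) (EBPFCapability, error) {
	c := EBPFCapability(name)
	if !knownCapabilities[c] {
		return "", fmt.Errorf("unknown eBPF capability %q", name)
	}
	return c, nil
}

func main() {
	for _, name := range []string{"network_trace", "bogus"} {
		c, err := parseCapability(name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", c)
	}
}
```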