Files
nannyagent/docs/EBPF_README.md
harsha f69e1dbc66 add-bpf-capability (#1)
1) add-bpf-capability
2) Not so clean but for now it's okay to start with

Co-authored-by: Harshavardhan Musanalli <harshavmb@gmail.com>
Reviewed-on: #1
2025-10-22 08:16:40 +00:00

6.8 KiB

eBPF Integration for Linux Diagnostic Agent

The Linux Diagnostic Agent now includes comprehensive eBPF (Extended Berkeley Packet Filter) capabilities for advanced system monitoring and investigation during diagnostic sessions.

eBPF Capabilities

Available Monitoring Types

  1. System Call Tracing (syscall_trace)

    • Monitors all system calls made by processes
    • Useful for debugging process behavior and API usage
    • Can filter by process ID or name
  2. Network Activity Tracing (network_trace)

    • Tracks TCP/UDP send/receive operations
    • Monitors network connections and data flow
    • Identifies network-related bottlenecks
  3. Process Monitoring (process_trace)

    • Tracks process creation, execution, and termination
    • Monitors process lifecycle events
    • Useful for debugging startup issues
  4. File System Monitoring (file_trace)

    • Monitors file open, create, delete operations
    • Tracks file access patterns
    • Can filter by specific paths
  5. Performance Monitoring (performance)

    • Collects CPU, memory, and I/O metrics
    • Provides detailed performance profiling
    • Uses perf integration when available
  6. Security Event Monitoring (security_event)

    • Detects privilege escalation attempts
    • Monitors security-relevant system calls
    • Tracks suspicious activities

How eBPF Integration Works

AI-Driven eBPF Selection

The AI agent can automatically request eBPF monitoring by including specific fields in its diagnostic response:

{
  "response_type": "diagnostic",
  "reasoning": "Need to trace network activity to diagnose connection timeout issues",
  "commands": [
    {"id": "basic_net", "command": "ss -tulpn", "description": "Current network connections"},
    {"id": "net_config", "command": "ip route show", "description": "Network configuration"}
  ],
  "ebpf_capabilities": ["network_trace", "syscall_trace"],
  "ebpf_duration_seconds": 15,
  "ebpf_filters": {
    "comm": "nginx",
    "path": "/etc"
  }
}

eBPF Trace Execution

  1. eBPF traces run in parallel with regular diagnostic commands
  2. Multiple eBPF capabilities can be activated simultaneously
  3. Traces collect structured JSON events in real-time
  4. Results are automatically parsed and included in the diagnostic data

Event Data Structure

eBPF events follow a consistent structure:

{
  "timestamp": 1634567890000000000,
  "event_type": "syscall_enter",
  "process_id": 1234,
  "process_name": "nginx",
  "user_id": 1000,
  "data": {
    "syscall": "openat",
    "filename": "/etc/nginx/nginx.conf"
  }
}

Installation and Setup

Prerequisites

The agent automatically detects available eBPF tools and capabilities. For full functionality, install:

Ubuntu/Debian:

sudo apt update
sudo apt install bpftrace linux-tools-generic linux-tools-$(uname -r)
sudo apt install bcc-tools python3-bcc  # Optional, for additional tools

RHEL/CentOS/Fedora:

sudo dnf install bpftrace perf bcc-tools python3-bcc

openSUSE:

sudo zypper install bpftrace perf

Automated Setup

Use the included helper script:

# Check current eBPF capabilities
./ebpf_helper.sh check

# Install eBPF tools (requires root)
sudo ./ebpf_helper.sh install

# Create monitoring scripts
./ebpf_helper.sh setup

# Test eBPF functionality
sudo ./ebpf_helper.sh test

Usage Examples

Network Issue Diagnosis

When describing network problems, the AI may automatically request network tracing:

User: "Web server is experiencing intermittent connection timeouts"

AI Response: Includes network_trace and syscall_trace capabilities
eBPF Output: Real-time network send/receive events, connection attempts, and related system calls

Performance Issue Investigation

For performance problems, the AI can request comprehensive monitoring:

User: "System is running slowly, high CPU usage"

AI Response: Includes process_trace, performance, and syscall_trace
eBPF Output: Process execution patterns, performance metrics, and system call analysis

Security Incident Analysis

For security concerns, specialized monitoring is available:

User: "Suspicious activity detected, possible privilege escalation"

AI Response: Includes security_event, process_trace, and file_trace
eBPF Output: Security-relevant events, process behavior, and file access patterns

Filtering Options

eBPF traces can be filtered for focused monitoring:

  • Process ID: {"pid": "1234"} - Monitor specific process
  • Process Name: {"comm": "nginx"} - Monitor processes by name
  • File Path: {"path": "/etc"} - Monitor specific path (file tracing)

Integration with Existing Workflow

eBPF monitoring integrates seamlessly with the existing diagnostic workflow:

  1. Automatic Detection: Agent detects available eBPF capabilities at startup
  2. AI Decision Making: AI decides when eBPF monitoring would be helpful
  3. Parallel Execution: eBPF traces run alongside regular diagnostic commands
  4. Structured Results: eBPF data is included in command results for AI analysis
  5. Contextual Analysis: AI correlates eBPF events with other diagnostic data

Troubleshooting

Common Issues

Permission Errors:

  • Most eBPF operations require root privileges
  • Run the agent with sudo for full eBPF functionality

Tool Not Available:

  • Use ./ebpf_helper.sh check to verify available tools
  • Install missing tools with ./ebpf_helper.sh install

Kernel Compatibility:

  • eBPF requires Linux kernel 4.4+ (5.0+ recommended)
  • Some features may require newer kernel versions

Debugging eBPF Issues:

# Check kernel eBPF support
sudo ./ebpf_helper.sh check

# Test basic eBPF functionality  
sudo bpftrace -e 'BEGIN { print("eBPF works!"); exit(); }'

# Verify debugfs mount (required for ftrace)
sudo mount -t debugfs none /sys/kernel/debug

Security Considerations

  • eBPF monitoring provides deep system visibility
  • Traces may contain sensitive information (file paths, process arguments)
  • Traces are stored temporarily in /tmp/nannyagent/ebpf/
  • Old traces are automatically cleaned up after 1 hour
  • Consider the security implications of detailed system monitoring

Performance Impact

  • eBPF monitoring has minimal performance overhead
  • Traces are time-limited (typically 10-30 seconds)
  • Event collection is optimized for efficiency
  • Heavy tracing may impact system performance on resource-constrained systems

Contributing

To add new eBPF capabilities:

  1. Extend the EBPFCapability enum in ebpf_manager.go
  2. Add detection logic in detectCapabilities()
  3. Implement trace command generation in buildXXXTraceCommand()
  4. Update capability descriptions in FormatSystemInfoWithEBPFForPrompt()

The eBPF integration is designed to be extensible and can accommodate additional monitoring capabilities as needed.