192 lines
6.2 KiB
Markdown
192 lines
6.2 KiB
Markdown
# eBPF Integration Summary for TensorZero
|
|
|
|
## 🎯 Overview
|
|
Your Linux diagnostic agent now has advanced eBPF monitoring capabilities integrated with the Cilium eBPF Go library. This enables real-time kernel-level monitoring alongside traditional system commands for unprecedented diagnostic precision.
|
|
|
|
## 🔄 Key Changes from Previous System Prompt
|
|
|
|
### Before (Traditional Commands Only):
|
|
```json
|
|
{
|
|
"response_type": "diagnostic",
|
|
"reasoning": "Need to check network connections",
|
|
"commands": [
|
|
{"id": "net_check", "command": "netstat -tulpn", "description": "Check connections"}
|
|
]
|
|
}
|
|
```
|
|
|
|
### After (eBPF-Enhanced):
|
|
```json
|
|
{
|
|
"response_type": "diagnostic",
|
|
"reasoning": "Network timeout issues require monitoring TCP connections and system calls to identify bottlenecks",
|
|
"commands": [
|
|
{"id": "net_status", "command": "ss -tulpn", "description": "Current network connections"}
|
|
],
|
|
"ebpf_programs": [
|
|
{
|
|
"name": "tcp_connect_monitor",
|
|
"type": "kprobe",
|
|
"target": "tcp_connect",
|
|
"duration": 15,
|
|
"description": "Monitor TCP connection attempts in real-time"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
## 🔧 TensorZero Configuration Steps
|
|
|
|
### 1. Update System Prompt
|
|
Replace your current system prompt with the content from `TENSORZERO_SYSTEM_PROMPT.md`. Key additions:
|
|
|
|
- **eBPF program request format** in diagnostic responses
|
|
- **Comprehensive eBPF guidelines** for different issue types
|
|
- **Enhanced resolution format** with `ebpf_evidence` field
|
|
- **Specific tracepoint/kprobe recommendations** per issue category
|
|
|
|
### 2. Response Format Changes
|
|
|
|
#### Diagnostic Phase (Enhanced):
|
|
```json
|
|
{
|
|
"response_type": "diagnostic",
|
|
"reasoning": "Analysis explanation...",
|
|
"commands": [...],
|
|
"ebpf_programs": [
|
|
{
|
|
"name": "program_name",
|
|
"type": "tracepoint|kprobe|kretprobe",
|
|
"target": "kernel_function_or_tracepoint",
|
|
"duration": 10-30,
|
|
"filters": {"comm": "process_name", "pid": 1234},
|
|
"description": "Why this monitoring is needed"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
#### Resolution Phase (Enhanced):
|
|
```json
|
|
{
|
|
"response_type": "resolution",
|
|
"root_cause": "Definitive root cause statement",
|
|
"resolution_plan": "Step-by-step fix plan",
|
|
"confidence": "High|Medium|Low",
|
|
"ebpf_evidence": "Summary of eBPF findings that led to diagnosis"
|
|
}
|
|
```
|
|
|
|
### 3. eBPF Program Categories (AI Guidelines)
|
|
|
|
The system prompt now includes specific eBPF program recommendations:
|
|
|
|
| Issue Type | Recommended eBPF Programs |
|
|
|------------|---------------------------|
|
|
| **Network** | `syscalls/sys_enter_connect`, `kprobe:tcp_connect`, `kprobe:tcp_sendmsg` |
|
|
| **Process** | `syscalls/sys_enter_execve`, `sched/sched_process_exit`, `kprobe:do_fork` |
|
|
| **File I/O** | `syscalls/sys_enter_openat`, `kprobe:vfs_read`, `kprobe:vfs_write` |
|
|
| **Performance** | `syscalls/sys_enter_*`, `kprobe:schedule`, `irq/irq_handler_entry` |
|
|
| **Memory** | `kprobe:__alloc_pages_nodemask`, `kmem/kmalloc` |
|
|
|
|
## 🔍 Data Flow
|
|
|
|
### 1. AI Request → Agent
|
|
```json
|
|
{
|
|
"ebpf_programs": [
|
|
{"name": "tcp_monitor", "type": "kprobe", "target": "tcp_connect", "duration": 15}
|
|
]
|
|
}
|
|
```
|
|
|
|
### 2. Agent → eBPF Manager
|
|
```go
|
|
programID, err := ebpfManager.StartEBPFProgram(ebpfRequest)
|
|
```
|
|
|
|
### 3. eBPF Results → AI
|
|
```json
|
|
{
|
|
"ebpf_results": {
|
|
"tcp_monitor_1695902400": {
|
|
"program_name": "tcp_monitor",
|
|
"event_count": 42,
|
|
"events": [
|
|
{
|
|
"timestamp": 1695902400000000000,
|
|
"process_id": 1234,
|
|
"process_name": "curl",
|
|
"event_type": "tcp_connect",
|
|
"data": {"destination": "192.168.1.1:443"}
|
|
}
|
|
],
|
|
"summary": "Captured 42 TCP connection attempts over 15 seconds"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## ✅ Validation Checklist
|
|
|
|
Before deploying to TensorZero:
|
|
|
|
- [ ] **System Prompt Updated**: Copy complete content from `TENSORZERO_SYSTEM_PROMPT.md`
|
|
- [ ] **JSON Format Validated**: Ensure AI model can generate structured eBPF requests
|
|
- [ ] **Agent Endpoint**: Verify `NANNYAPI_MODEL` environment variable points to your TensorZero function
|
|
- [ ] **Test Scenarios**: Prepare test cases for network, process, file, and performance issues
|
|
- [ ] **Root Privileges**: Ensure production agent runs with sufficient privileges for eBPF
|
|
|
|
## 🚀 Example Diagnostic Flow
|
|
|
|
### User Issue: "Application randomly fails to connect to database"
|
|
|
|
### AI Response:
|
|
```json
|
|
{
|
|
"response_type": "diagnostic",
|
|
"reasoning": "Database connection failures could be due to network issues, DNS resolution, or connection pool exhaustion. Need to monitor both system-level network activity and application behavior.",
|
|
"commands": [
|
|
{"id": "db_connections", "command": "ss -tlnp | grep :5432", "description": "Check PostgreSQL connections"},
|
|
{"id": "dns_check", "command": "nslookup db.example.com", "description": "Verify DNS resolution"}
|
|
],
|
|
"ebpf_programs": [
|
|
{
|
|
"name": "tcp_connect_db",
|
|
"type": "kprobe",
|
|
"target": "tcp_connect",
|
|
"duration": 20,
|
|
"filters": {"comm": "myapp"},
|
|
"description": "Monitor TCP connection attempts from application"
|
|
},
|
|
{
|
|
"name": "dns_queries",
|
|
"type": "tracepoint",
|
|
"target": "syscalls/sys_enter_connect",
|
|
"duration": 20,
|
|
"description": "Monitor network system calls for DNS resolution"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### Agent Execution:
|
|
1. Runs `ss` and `nslookup` commands
|
|
2. Starts eBPF programs to monitor TCP connections and DNS queries
|
|
3. Collects real-time kernel events for 20 seconds
|
|
4. Returns combined traditional + eBPF results to AI
|
|
|
|
### AI Analysis:
|
|
```json
|
|
{
|
|
"response_type": "resolution",
|
|
"root_cause": "Application is experiencing DNS resolution timeouts. eBPF traces show successful TCP connections to IP addresses but failed connections when using hostname.",
|
|
"resolution_plan": "1. Configure application to use IP address directly\n2. Fix DNS timeout: echo 'nameserver 8.8.8.8' >> /etc/resolv.conf\n3. Test connectivity: dig db.example.com",
|
|
"confidence": "High",
|
|
"ebpf_evidence": "eBPF tcp_connect traces show 15 successful connections to 10.0.1.50:5432 but 8 failed connection attempts during DNS lookups. DNS query monitoring revealed 3-5 second delays in resolution."
|
|
}
|
|
```
|
|
|
|
This integration provides your diagnostic agent with professional-grade system monitoring capabilities that were previously only available in dedicated observability tools!
|