Compare commits: 8 commits, main...d519bf77e9

| SHA256 |
|---|
| d519bf77e9 |
| c268a3a42e |
| 794111cb44 |
| 190e54dd38 |
| 8328f8d5b3 |
| 8832450a1f |
| 0a8b2dc202 |
| 6fd403cb5f |
.gitignore (vendored) - 8 changes

```diff
@@ -23,6 +23,10 @@ go.work
 go.work.sum
 
 # env file
-.env
+.env*
 nannyagent*
 nanny-agent*
+.vscode
+
+# Build directory
+build/
```
BCC_TRACING.md (new file, 298 lines added)

# BCC-Style eBPF Tracing Implementation

## Overview

This implementation adds BCC-style (BPF Compiler Collection) tracing capabilities to the diagnostic agent, similar to the `trace.py` tool from the iovisor BCC project. Instead of merely filtering events, this system counts and traces real system calls with detailed argument parsing.

## Key Features

### 1. Real System Call Tracing

- **Actual event counting**: Unlike the previous implementation, which only simulated events, this captures real system calls
- **Argument extraction**: Extracts function arguments (arg1, arg2, etc.) and return values
- **Multiple probe types**: Supports kprobes, kretprobes, tracepoints, and uprobes
- **Filtering capabilities**: Filter by process name, PID, UID, or argument values

### 2. BCC-Style Syntax

Supports familiar BCC trace.py syntax patterns:

```bash
# Simple syscall tracing
"sys_open"                # Trace open syscalls
"sys_read (arg3 > 1024)"  # Trace reads >1024 bytes
"r::sys_open"             # Return probe on open

# With format strings
"sys_write \"wrote %d bytes\", arg3"
"sys_open \"opening %s\", arg2@user"
```

### 3. Comprehensive Event Data

Each trace captures:

```json
{
  "timestamp": 1234567890,
  "pid": 1234,
  "tid": 1234,
  "process_name": "nginx",
  "function": "__x64_sys_openat",
  "message": "opening file: /var/log/access.log",
  "raw_args": {
    "arg1": "3",
    "arg2": "/var/log/access.log",
    "arg3": "577"
  }
}
```

## Architecture

### Core Components

1. **BCCTraceManager** (`ebpf_trace_manager.go`)
   - Main orchestrator for BCC-style tracing
   - Generates bpftrace scripts dynamically
   - Manages trace sessions and event collection

2. **TraceSpec** - Trace specification format

   ```go
   type TraceSpec struct {
       ProbeType   string   // "p", "r", "t", "u"
       Target      string   // Function/syscall to trace
       Format      string   // Output format string
       Arguments   []string // Arguments to extract
       Filter      string   // Filter conditions
       Duration    int      // Trace duration in seconds
       ProcessName string   // Process filter
       PID         int      // Process ID filter
       UID         int      // User ID filter
   }
   ```

3. **EventScanner** (`ebpf_event_parser.go`)
   - Parses bpftrace output in real-time
   - Converts raw trace data to structured events
   - Handles argument extraction and enrichment

4. **TraceSpecBuilder** - Fluent API for building specs

   ```go
   spec := NewTraceSpecBuilder().
       Kprobe("__x64_sys_write").
       Format("write %d bytes to fd %d", "arg3", "arg1").
       Filter("arg1 == 1").
       Duration(30).
       Build()
   ```
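For illustration, a builder spec like the one above might be rendered into a bpftrace invocation along these lines. This is a hypothetical sketch of the generated script, not the actual output of `ebpf_trace_manager.go`; running it requires root and an installed bpftrace.

```bash
# Hypothetical script for the builder example above. Note that bpftrace
# numbers kprobe arguments from arg0, so for sys_write(fd, buf, count):
# fd = arg0 and count = arg2.
bpftrace -e 'kprobe:__x64_sys_write /arg0 == 1/ {
    printf("write %d bytes to fd %d\n", arg2, arg0);
}'
```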
## Usage Examples

### 1. Basic System Call Tracing

```go
// Trace file open operations
spec := TraceSpec{
    ProbeType: "p",
    Target:    "__x64_sys_openat",
    Format:    "opening file: %s",
    Arguments: []string{"arg2@user"},
    Duration:  30,
}

traceID, err := manager.StartTrace(spec)
```

### 2. Filtered Tracing

```go
// Trace only large reads
spec := TraceSpec{
    ProbeType: "p",
    Target:    "__x64_sys_read",
    Format:    "read %d bytes from fd %d",
    Arguments: []string{"arg3", "arg1"},
    Filter:    "arg3 > 1024",
    Duration:  30,
}
```

### 3. Process-Specific Tracing

```go
// Trace only nginx processes
spec := TraceSpec{
    ProbeType:   "p",
    Target:      "__x64_sys_write",
    ProcessName: "nginx",
    Duration:    60,
}
```

### 4. Return Value Tracing

```go
// Trace return values from file operations
spec := TraceSpec{
    ProbeType: "r",
    Target:    "__x64_sys_openat",
    Format:    "open returned: %d",
    Arguments: []string{"retval"},
    Duration:  30,
}
```

## Integration with Agent

### API Request Format

The remote API can send trace specifications in the `ebpf_programs` field:

```json
{
  "commands": [
    {"id": "cmd1", "command": "ps aux"}
  ],
  "ebpf_programs": [
    {
      "name": "file_monitoring",
      "type": "kprobe",
      "target": "sys_open",
      "duration": 30,
      "filters": {"process": "nginx"},
      "description": "Monitor file access by nginx"
    }
  ]
}
```

### Agent Response Format

The agent returns detailed trace results:

```json
{
  "name": "__x64_sys_openat",
  "type": "bcc_trace",
  "target": "__x64_sys_openat",
  "duration": 30,
  "status": "completed",
  "success": true,
  "event_count": 45,
  "events": [
    {
      "timestamp": 1234567890,
      "pid": 1234,
      "process_name": "nginx",
      "function": "__x64_sys_openat",
      "message": "opening file: /var/log/access.log",
      "raw_args": {"arg1": "3", "arg2": "/var/log/access.log"}
    }
  ],
  "statistics": {
    "total_events": 45,
    "events_per_second": 1.5,
    "top_processes": [
      {"process_name": "nginx", "event_count": 30},
      {"process_name": "apache", "event_count": 15}
    ]
  }
}
```

## Test Specifications

The implementation includes test specifications for unit testing:

- **test_sys_open**: File open operations
- **test_sys_read**: Read operations with filters
- **test_sys_write**: Write operations
- **test_process_creation**: Process execution
- **test_kretprobe**: Return value tracing
- **test_with_filter**: Filtered tracing

## Running Tests

```bash
# Run all BCC tracing tests
go test -v -run TestBCCTracing

# Test trace manager capabilities
go test -v -run TestTraceManagerCapabilities

# Test syscall suggestions
go test -v -run TestSyscallSuggestions

# Run all tests
go test -v
```

## Requirements

### System Requirements

- **Linux kernel 4.4+** with eBPF support
- **bpftrace** installed (`apt install bpftrace`)
- **Root privileges** for actual tracing

### Checking Capabilities

The trace manager automatically detects capabilities:

```bash
$ go test -run TestTraceManagerCapabilities
🔧 Trace Manager Capabilities:
  ✅ kernel_ebpf: Available
  ✅ bpftrace: Available
  ❌ root_access: Not Available
  ❌ debugfs_access: Not Available
```

## Advanced Features

### 1. Syscall Suggestions

The system can suggest appropriate syscalls based on issue descriptions:

```go
suggestions := SuggestSyscallTargets("file not found error")
// Returns: ["test_sys_open", "test_sys_read", "test_sys_write", "test_sys_unlink"]
```

### 2. BCC-Style Parsing

Parse BCC trace.py style specifications:

```go
parser := NewTraceSpecParser()
spec, err := parser.ParseFromBCCStyle("sys_write (arg1 == 1) \"stdout: %d bytes\", arg3")
```

### 3. Event Filtering and Aggregation

Post-processing capabilities for trace events:

```go
filter := &TraceEventFilter{
    ProcessNames: []string{"nginx", "apache"},
    MinTimestamp: startTime,
}
filteredEvents := filter.ApplyFilter(events)

aggregator := NewTraceEventAggregator(events)
topProcesses := aggregator.GetTopProcesses(5)
eventRate := aggregator.GetEventRate()
```

## Performance Considerations

- **Short durations**: Test specs use 5-second durations for quick testing
- **Efficient parsing**: Event scanner processes bpftrace output in real-time
- **Memory management**: Events are processed and aggregated efficiently
- **Timeout handling**: Automatic cleanup of hanging trace sessions
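The timeout behaviour can be pictured with the coreutils `timeout` wrapper, which is analogous to what the trace manager does internally when a session overruns its duration (an illustration of the mechanism, not the agent's actual code):

```shell
# Kill a long-running command once the allotted duration elapses.
# GNU timeout exits with status 124 when it had to kill the command.
timeout 0.2 sleep 10
echo "exit status: $?"
```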
## Security Considerations

- **Root privileges required**: eBPF tracing requires root access
- **Resource limits**: Maximum trace duration of 10 minutes
- **Process isolation**: Each trace runs in its own context
- **Automatic cleanup**: Traces are automatically stopped and cleaned up

## Future Enhancements

1. **USDT probe support**: Add support for user-space tracing
2. **BTF integration**: Use BPF Type Format for better type information
3. **Flame graph generation**: Generate performance flame graphs
4. **Custom eBPF programs**: Allow uploading custom eBPF bytecode
5. **Distributed tracing**: Correlation across multiple hosts

This implementation provides a solid foundation for advanced system introspection and debugging, bringing the power of BCC-style tracing to the diagnostic agent.
Makefile - 67 changes

```diff
@@ -1,16 +1,21 @@
-.PHONY: build run clean test install
+.PHONY: build run clean test install build-prod build-release install-system fmt lint help
 
+VERSION := 0.0.1
+BUILD_DIR := ./build
+BINARY_NAME := nannyagent
+
 # Build the application
 build:
-	go build -o nanny-agent .
+	go build -o $(BINARY_NAME) .
 
 # Run the application
 run: build
-	./nanny-agent
+	./$(BINARY_NAME)
 
 # Clean build artifacts
 clean:
-	rm -f nanny-agent
+	rm -f $(BINARY_NAME)
+	rm -rf $(BUILD_DIR)
 
 # Run tests
 test:
@@ -21,14 +26,34 @@ install:
 	go mod tidy
 	go mod download
 
-# Build for production with optimizations
+# Build for production with optimizations (current architecture)
 build-prod:
-	CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags '-w -s' -o nanny-agent .
+	CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo \
+		-ldflags '-w -s -X main.Version=$(VERSION)' \
+		-o $(BINARY_NAME) .
+
+# Build release binaries for both architectures
+build-release: clean
+	@echo "Building release binaries for version $(VERSION)..."
+	@mkdir -p $(BUILD_DIR)
+	@echo "Building for linux/amd64..."
+	@CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -installsuffix cgo \
+		-ldflags '-w -s -X main.Version=$(VERSION)' \
+		-o $(BUILD_DIR)/$(BINARY_NAME)-linux-amd64 .
+	@echo "Building for linux/arm64..."
+	@CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -a -installsuffix cgo \
+		-ldflags '-w -s -X main.Version=$(VERSION)' \
+		-o $(BUILD_DIR)/$(BINARY_NAME)-linux-arm64 .
+	@echo "Generating checksums..."
+	@cd $(BUILD_DIR) && sha256sum $(BINARY_NAME)-linux-amd64 > $(BINARY_NAME)-linux-amd64.sha256
+	@cd $(BUILD_DIR) && sha256sum $(BINARY_NAME)-linux-arm64 > $(BINARY_NAME)-linux-arm64.sha256
+	@echo "Build complete! Artifacts in $(BUILD_DIR)/"
+	@ls -lh $(BUILD_DIR)/
 
 # Install system-wide (requires sudo)
 install-system: build-prod
-	sudo cp nanny-agent /usr/local/bin/
-	sudo chmod +x /usr/local/bin/nanny-agent
+	sudo cp $(BINARY_NAME) /usr/local/bin/
+	sudo chmod +x /usr/local/bin/$(BINARY_NAME)
 
 # Format code
 fmt:
@@ -40,14 +65,18 @@ lint:
 
 # Show help
 help:
-	@echo "Available commands:"
-	@echo "  build          - Build the application"
-	@echo "  run            - Build and run the application"
-	@echo "  clean          - Clean build artifacts"
-	@echo "  test           - Run tests"
-	@echo "  install        - Install dependencies"
-	@echo "  build-prod     - Build for production"
-	@echo "  install-system - Install system-wide (requires sudo)"
-	@echo "  fmt            - Format code"
-	@echo "  lint           - Run linter"
-	@echo "  help           - Show this help"
+	@echo "NannyAgent Makefile - Available commands:"
+	@echo ""
+	@echo "  make build          - Build the application for current platform"
+	@echo "  make run            - Build and run the application"
+	@echo "  make clean          - Clean build artifacts"
+	@echo "  make test           - Run tests"
+	@echo "  make install        - Install Go dependencies"
+	@echo "  make build-prod     - Build for production (optimized, current arch)"
+	@echo "  make build-release  - Build release binaries for amd64 and arm64"
+	@echo "  make install-system - Install system-wide (requires sudo)"
+	@echo "  make fmt            - Format code"
+	@echo "  make lint           - Run linter"
+	@echo "  make help           - Show this help"
+	@echo ""
+	@echo "Version: $(VERSION)"
```
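The `*.sha256` artifacts produced by the new `build-release` target are meant to be verified with `sha256sum -c`. A self-contained demonstration of the mechanism, using a scratch file in place of a real release binary:

```shell
# Record a digest, then verify the file against it - the same flow as
# checking nannyagent-linux-amd64 against nannyagent-linux-amd64.sha256.
cd "$(mktemp -d)"
echo "demo artifact" > demo.bin
sha256sum demo.bin > demo.bin.sha256
sha256sum -c demo.bin.sha256
```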
README.md - 246 changes

````diff
@@ -1,96 +1,135 @@
-# Linux Diagnostic Agent
+# NannyAgent - Linux Diagnostic Agent
 
-A Go-based AI agent that diagnoses Linux system issues using the NannyAPI gateway with OpenAI-compatible SDK.
+A Go-based AI agent that diagnoses Linux system issues using eBPF-powered deep monitoring and TensorZero AI integration.
 
 ## Features
 
-- Interactive command-line interface for submitting system issues
-- **Automatic system information gathering** - Includes OS, kernel, CPU, memory, network info
-- **eBPF-powered deep system monitoring** - Advanced tracing for network, processes, files, and security events
-- Integrates with NannyAPI using OpenAI-compatible Go SDK
-- Executes diagnostic commands safely and collects output
-- Provides step-by-step resolution plans
-- **Comprehensive integration tests** with realistic Linux problem scenarios
+- 🤖 **AI-Powered Diagnostics** - Intelligent issue analysis and resolution planning
+- 🔍 **eBPF Deep Monitoring** - Real-time kernel-level tracing for network, processes, files, and security events
+- 🛡️ **Safe Command Execution** - Validates and executes diagnostic commands with timeouts
+- 📊 **Automatic System Information Gathering** - Comprehensive OS, kernel, CPU, memory, and network metrics
+- 🔄 **WebSocket Integration** - Real-time communication with backend investigation system
+- 🔐 **OAuth Device Flow Authentication** - Secure agent registration and authentication
+- ✅ **Comprehensive Integration Tests** - Realistic Linux problem scenarios
 
-## Setup
+## Requirements
 
-1. Clone this repository
-2. Copy `.env.example` to `.env` and configure your NannyAPI endpoint:
+- **Operating System**: Linux only (no containers/LXC support)
+- **Architecture**: amd64 (x86_64) or arm64 (aarch64)
+- **Kernel Version**: Linux kernel 5.x or higher
+- **Privileges**: Root/sudo access required for eBPF functionality
+- **Dependencies**: bpftrace and bpfcc-tools (automatically installed by installer)
+- **Network**: Connectivity to Supabase backend
````
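The kernel requirement introduced above can be checked before running the installer. A minimal sketch (the installer performs its own, more thorough check):

```shell
# Returns success when the given kernel release meets the 5.x minimum.
kernel_ok() {
  [ "$(echo "$1" | cut -d. -f1)" -ge 5 ]
}

if kernel_ok "$(uname -r)"; then
  echo "kernel OK: $(uname -r)"
else
  echo "kernel too old: $(uname -r)"
fi
```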
````diff
-```bash
-cp .env.example .env
-```
-3. Install dependencies:
-```bash
-go mod tidy
-```
-4. Build and run:
-```bash
-make build
-./nanny-agent
-```
+
+## Quick Installation
+
+### One-Line Install (Recommended)
+
+```bash
+# Download and run the installer
+curl -fsSL https://your-domain.com/install.sh | sudo bash
+```
+
+Or download first, then install:
+
+```bash
+# Download the installer
+wget https://your-domain.com/install.sh
+
+# Make it executable
+chmod +x install.sh
+
+# Run the installer
+sudo ./install.sh
+```
+
+### Manual Installation
+
+1. Clone this repository:
+```bash
+git clone https://github.com/yourusername/nannyagent.git
+cd nannyagent
+```
+
+2. Run the installer script:
+```bash
+sudo ./install.sh
+```
+
+The installer will:
+- ✅ Verify system requirements (OS, architecture, kernel version)
+- ✅ Check for existing installations
+- ✅ Install eBPF tools (bpftrace, bpfcc-tools)
+- ✅ Build the nannyagent binary
+- ✅ Test connectivity to Supabase
+- ✅ Install to `/usr/local/bin/nannyagent`
+- ✅ Create configuration in `/etc/nannyagent/config.env`
+- ✅ Create secure data directory `/var/lib/nannyagent`
 
 ## Configuration
 
-The agent can be configured using environment variables:
+After installation, configure your Supabase URL:
 
-- `NANNYAPI_ENDPOINT`: The NannyAPI endpoint (default: `http://tensorzero.netcup.internal:3000/openai/v1`)
-- `NANNYAPI_MODEL`: The model identifier (default: `nannyapi::function_name::diagnose_and_heal`)
+```bash
+# Edit the configuration file
+sudo nano /etc/nannyagent/config.env
+```
 
-## Installation on Linux VM
+Required configuration:
 
-### Direct Installation
+```bash
+# Supabase Configuration
+SUPABASE_PROJECT_URL=https://your-project.supabase.co
 
-1. **Install Go** (if not already installed):
-```bash
-# For Ubuntu/Debian
-sudo apt update
-sudo apt install golang-go
-
-# For RHEL/CentOS/Fedora
-sudo dnf install golang
-# or
-sudo yum install golang
-```
-
-2. **Clone and build the agent**:
-```bash
-git clone <your-repo-url>
-cd nannyagentv2
-go mod tidy
-make build
-```
-
-3. **Install as system service** (optional):
-```bash
-sudo cp nanny-agent /usr/local/bin/
-sudo chmod +x /usr/local/bin/nanny-agent
-```
-
-4. **Set environment variables**:
-```bash
-export NANNYAPI_ENDPOINT="http://your-nannyapi-endpoint:3000/openai/v1"
-export NANNYAPI_MODEL="your-model-identifier"
-```
+# Optional Configuration
+TOKEN_PATH=/var/lib/nannyagent/token.json
+DEBUG=false
+```
+
+## Command-Line Options
+
+```bash
+# Show version (no sudo required)
+nannyagent --version
+nannyagent -v
+
+# Show help (no sudo required)
+nannyagent --help
+nannyagent -h
+
+# Run the agent (requires sudo)
+sudo nannyagent
+```
 
 ## Usage
 
-1. Start the agent:
+1. **First-time Setup** - Authenticate the agent:
 ```bash
-./nanny-agent
+sudo nannyagent
 ```
 
+The agent will display a verification URL and code. Visit the URL and enter the code to authorize the agent.
+
-2. Enter a system issue description when prompted:
+2. **Interactive Diagnostics** - After authentication, enter system issues:
 ```
 > On /var filesystem I cannot create any file but df -h shows 30% free space available.
 ```
 
-3. The agent will:
-- Send the issue to the AI via NannyAPI using OpenAI SDK
-- Execute diagnostic commands as suggested by the AI
-- Provide command outputs back to the AI
-- Display the final diagnosis and resolution plan
+3. **The agent will**:
+- Gather comprehensive system information automatically
+- Send the issue to AI for analysis via TensorZero
+- Execute diagnostic commands safely
+- Run eBPF traces for deep kernel-level monitoring
+- Provide AI-generated root cause analysis and resolution plan
 
-4. Type `quit` or `exit` to stop the agent
+4. **Exit the agent**:
+```
+> quit
+```
+or
+```
+> exit
+```
 
 ## How It Works
@@ -119,14 +158,87 @@ The agent includes comprehensive integration tests that simulate realistic Linux
 
 ### Run Integration Tests:
 ```bash
-# Interactive test scenarios
-./test-examples.sh
+# Run unit tests
+make test
 
-# Automated integration tests
-./integration-tests.sh
+# Run integration tests
+./tests/test_ebpf_integration.sh
+```
 
-# Function discovery (find valid NannyAPI functions)
-./discover-functions.sh
+## Installation Exit Codes
+
+The installer uses specific exit codes for different failure scenarios:
+
+| Exit Code | Description |
+|-----------|-------------|
+| 0 | Success |
+| 1 | Not running as root |
+| 2 | Unsupported operating system (non-Linux) |
+| 3 | Unsupported architecture (not amd64/arm64) |
+| 4 | Container/LXC environment detected |
+| 5 | Kernel version < 5.x |
+| 6 | Existing installation detected |
+| 7 | eBPF tools installation failed |
+| 8 | Go not installed |
+| 9 | Binary build failed |
+| 10 | Directory creation failed |
+| 11 | Binary installation failed |
````
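The installer exit codes above can be mapped to human-readable messages in a wrapper script. A minimal sketch whose code-to-message mapping mirrors the table:

```shell
# Translate an installer exit code into a human-readable reason.
describe_exit() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "Not running as root" ;;
    2)  echo "Unsupported operating system (non-Linux)" ;;
    3)  echo "Unsupported architecture (not amd64/arm64)" ;;
    4)  echo "Container/LXC environment detected" ;;
    5)  echo "Kernel version < 5.x" ;;
    6)  echo "Existing installation detected" ;;
    7)  echo "eBPF tools installation failed" ;;
    8)  echo "Go not installed" ;;
    9)  echo "Binary build failed" ;;
    10) echo "Directory creation failed" ;;
    11) echo "Binary installation failed" ;;
    *)  echo "Unknown exit code: $1" ;;
  esac
}

describe_exit 5
```

Typical use after a failed run would be `sudo ./install.sh; describe_exit $?`.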
````diff
+
+## Troubleshooting
+
+### Installation Issues
+
+**Error: "Kernel version X.X is not supported"**
+- NannyAgent requires Linux kernel 5.x or higher
+- Upgrade your kernel or use a different system
+
+**Error: "Another instance may already be installed"**
+- Check if `/var/lib/nannyagent` exists
+- Remove it if you're sure: `sudo rm -rf /var/lib/nannyagent`
+- Then retry installation
+
+**Warning: "Cannot connect to Supabase"**
+- Check your network connectivity
+- Verify firewall settings allow HTTPS connections
+- Ensure SUPABASE_PROJECT_URL is correctly configured in `/etc/nannyagent/config.env`
+
+### Runtime Issues
+
+**Error: "This program must be run as root"**
+- eBPF requires root privileges
+- Always run with: `sudo nannyagent`
+
+**Error: "Cannot determine kernel version"**
+- Ensure `uname` command is available
+- Check system integrity
+
+## Development
+
+### Building from Source
+
+```bash
+# Clone repository
+git clone https://github.com/yourusername/nannyagent.git
+cd nannyagent
+
+# Install Go dependencies
+go mod tidy
+
+# Build binary
+make build
+
+# Run locally (requires sudo)
+sudo ./nannyagent
+```
+
+### Running Tests
+
+```bash
+# Run unit tests
+make test
+
+# Test eBPF capabilities
+./tests/test_ebpf_integration.sh
 ```
 
 ## Safety
````
480
agent.go
480
agent.go
@@ -2,99 +2,113 @@ package main
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"bytes"
|
"bytes"
|
||||||
"context"
|
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
"fmt"
|
"fmt"
|
||||||
"io"
|
"io"
|
||||||
"net/http"
|
"net/http"
|
||||||
"os"
|
"os"
|
||||||
|
"strings"
|
||||||
"time"
|
"time"
|
||||||
|
|
||||||
|
"nannyagentv2/internal/ebpf"
|
||||||
|
"nannyagentv2/internal/executor"
|
||||||
|
"nannyagentv2/internal/logging"
|
||||||
|
"nannyagentv2/internal/system"
|
||||||
|
"nannyagentv2/internal/types"
|
||||||
|
|
||||||
"github.com/sashabaranov/go-openai"
|
"github.com/sashabaranov/go-openai"
|
||||||
)
|
)
|
||||||
|
|
||||||
// DiagnosticResponse represents the diagnostic phase response from AI
|
// AgentConfig holds configuration for concurrent execution (local to agent)
|
||||||
type DiagnosticResponse struct {
|
type AgentConfig struct {
|
||||||
ResponseType string `json:"response_type"`
|
MaxConcurrentTasks int `json:"max_concurrent_tasks"`
|
||||||
Reasoning string `json:"reasoning"`
|
CollectiveResults bool `json:"collective_results"`
|
||||||
Commands []Command `json:"commands"`
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// ResolutionResponse represents the resolution phase response from AI
|
// DefaultAgentConfig returns default configuration
|
||||||
type ResolutionResponse struct {
|
func DefaultAgentConfig() *AgentConfig {
|
||||||
ResponseType string `json:"response_type"`
|
return &AgentConfig{
|
||||||
RootCause string `json:"root_cause"`
|
MaxConcurrentTasks: 10, // Default to 10 concurrent forks
|
||||||
ResolutionPlan string `json:"resolution_plan"`
|
CollectiveResults: true, // Send results collectively when all finish
|
||||||
Confidence string `json:"confidence"`
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Command represents a command to be executed
|
//
|
||||||
type Command struct {
|
// LinuxDiagnosticAgent represents the main diagnostic agent
|
||||||
ID string `json:"id"`
|
|
||||||
Command string `json:"command"`
|
|
||||||
Description string `json:"description"`
|
|
||||||
}
|
|
||||||
|
|
||||||
// CommandResult represents the result of executing a command
|
// LinuxDiagnosticAgent represents the main diagnostic agent
|
||||||
type CommandResult struct {
|
|
||||||
ID string `json:"id"`
|
|
||||||
Command string `json:"command"`
|
|
||||||
Output string `json:"output"`
|
|
||||||
ExitCode int `json:"exit_code"`
|
|
||||||
Error string `json:"error,omitempty"`
|
|
||||||
}
|
|
||||||
|
|
||||||
// LinuxDiagnosticAgent represents the main agent
|
|
||||||
type LinuxDiagnosticAgent struct {
|
type LinuxDiagnosticAgent struct {
|
||||||
client *openai.Client
|
client *openai.Client
|
||||||
model string
|
model string
|
||||||
executor *CommandExecutor
|
executor *executor.CommandExecutor
|
||||||
episodeID string // TensorZero episode ID for conversation continuity
|
episodeID string // TensorZero episode ID for conversation continuity
|
||||||
ebpfManager EBPFManagerInterface // eBPF monitoring capabilities
|
ebpfManager *ebpf.BCCTraceManager // eBPF tracing manager
|
||||||
|
config *AgentConfig // Configuration for concurrent execution
|
||||||
|
authManager interface{} // Authentication manager for TensorZero requests
|
||||||
|
logger *logging.Logger
|
||||||
}
|
}
|
||||||
|
|
||||||
// NewLinuxDiagnosticAgent creates a new diagnostic agent
|
// NewLinuxDiagnosticAgent creates a new diagnostic agent
|
||||||
func NewLinuxDiagnosticAgent() *LinuxDiagnosticAgent {
|
func NewLinuxDiagnosticAgent() *LinuxDiagnosticAgent {
|
||||||
endpoint := os.Getenv("NANNYAPI_ENDPOINT")
|
// Get Supabase project URL for TensorZero proxy
|
||||||
if endpoint == "" {
|
supabaseURL := os.Getenv("SUPABASE_PROJECT_URL")
|
||||||
// Default endpoint - OpenAI SDK will append /chat/completions automatically
|
if supabaseURL == "" {
|
||||||
endpoint = "http://tensorzero.netcup.internal:3000/openai/v1"
|
logging.Warning("SUPABASE_PROJECT_URL not set, TensorZero integration will not work")
|
||||||
}
|
}
|
||||||
|
|
||||||
model := os.Getenv("NANNYAPI_MODEL")
|
// Default model for diagnostic and healing
|
||||||
if model == "" {
|
model := "tensorzero::function_name::diagnose_and_heal"
|
||||||
model = "tensorzero::function_name::diagnose_and_heal"
|
|
||||||
fmt.Printf("Warning: Using default model '%s'. Set NANNYAPI_MODEL environment variable for your specific function.\n", model)
|
|
||||||
}
|
|
||||||
|
|
||||||
// Create OpenAI client with custom base URL
|
|
||||||
// Note: The OpenAI SDK automatically appends "/chat/completions" to the base URL
|
|
||||||
config := openai.DefaultConfig("")
|
|
||||||
config.BaseURL = endpoint
|
|
||||||
client := openai.NewClientWithConfig(config)
|
|
||||||
|
|
||||||
agent := &LinuxDiagnosticAgent{
|
agent := &LinuxDiagnosticAgent{
|
||||||
client: client,
|
client: nil, // Not used - we use direct HTTP to Supabase proxy
|
||||||
model: model,
|
model: model,
|
||||||
executor: NewCommandExecutor(10 * time.Second), // 10 second timeout for commands
|
executor: executor.NewCommandExecutor(10 * time.Second), // 10 second timeout for commands
|
||||||
|
config: DefaultAgentConfig(), // Default concurrent execution config
|
||||||
}
|
}
|
||||||
|
|
||||||
// Initialize eBPF capabilities
|
// Initialize eBPF manager
|
||||||
agent.ebpfManager = NewCiliumEBPFManager()
|
agent.ebpfManager = ebpf.NewBCCTraceManager()
|
||||||
|
agent.logger = logging.NewLogger()
|
||||||
|
|
||||||
|
return agent
|
||||||
|
}
|
||||||
|
|
+// NewLinuxDiagnosticAgentWithAuth creates a new diagnostic agent with authentication
+func NewLinuxDiagnosticAgentWithAuth(authManager interface{}) *LinuxDiagnosticAgent {
+	// Get Supabase project URL for TensorZero proxy
+	supabaseURL := os.Getenv("SUPABASE_PROJECT_URL")
+	if supabaseURL == "" {
+		logging.Warning("SUPABASE_PROJECT_URL not set, TensorZero integration will not work")
+	}
+
+	// Default model for diagnostic and healing
+	model := "tensorzero::function_name::diagnose_and_heal"
+
+	agent := &LinuxDiagnosticAgent{
+		client:      nil, // Not used - we use direct HTTP to Supabase proxy
+		model:       model,
+		executor:    executor.NewCommandExecutor(10 * time.Second), // 10 second timeout for commands
+		config:      DefaultAgentConfig(), // Default concurrent execution config
+		authManager: authManager, // Store auth manager for TensorZero requests
+	}
+
+	// Initialize eBPF manager
+	agent.ebpfManager = ebpf.NewBCCTraceManager()
+	agent.logger = logging.NewLogger()
+
 	return agent
 }

 // DiagnoseIssue starts the diagnostic process for a given issue
 func (a *LinuxDiagnosticAgent) DiagnoseIssue(issue string) error {
-	fmt.Printf("Diagnosing issue: %s\n", issue)
-	fmt.Println("Gathering system information...")
+	logging.Info("Diagnosing issue: %s", issue)
+	logging.Info("Gathering system information...")

 	// Gather system information
-	systemInfo := GatherSystemInfo()
+	systemInfo := system.GatherSystemInfo()

 	// Format the initial prompt with system information
-	initialPrompt := FormatSystemInfoForPrompt(systemInfo) + "\n" + issue
+	initialPrompt := system.FormatSystemInfoForPrompt(systemInfo) + "\n" + issue

 	// Start conversation with initial issue including system info
 	messages := []openai.ChatCompletionMessage{
@@ -106,7 +120,7 @@ func (a *LinuxDiagnosticAgent) DiagnoseIssue(issue string) error {
 	for {
 		// Send request to TensorZero API via OpenAI SDK
-		response, err := a.sendRequest(messages)
+		response, err := a.SendRequestWithEpisode(messages, a.episodeID)
 		if err != nil {
 			return fmt.Errorf("failed to send request: %w", err)
 		}
@@ -116,37 +130,80 @@ func (a *LinuxDiagnosticAgent) DiagnoseIssue(issue string) error {
 		}

 		content := response.Choices[0].Message.Content
-		fmt.Printf("\nAI Response:\n%s\n", content)
+		logging.Debug("AI Response: %s", content)

 		// Parse the response to determine next action
-		var diagnosticResp DiagnosticResponse
-		var resolutionResp ResolutionResponse
+		var diagnosticResp types.EBPFEnhancedDiagnosticResponse
+		var resolutionResp types.ResolutionResponse

-		// Try to parse as diagnostic response first
+		// Try to parse as diagnostic response first (with eBPF support)
+		logging.Debug("Attempting to parse response as diagnostic...")
 		if err := json.Unmarshal([]byte(content), &diagnosticResp); err == nil && diagnosticResp.ResponseType == "diagnostic" {
+			logging.Debug("Successfully parsed as diagnostic response with %d commands", len(diagnosticResp.Commands))
 			// Handle diagnostic phase
-			fmt.Printf("\nReasoning: %s\n", diagnosticResp.Reasoning)
-
-			if len(diagnosticResp.Commands) == 0 {
-				fmt.Println("No commands to execute in diagnostic phase")
-				break
-			}
+			logging.Debug("Reasoning: %s", diagnosticResp.Reasoning)

 			// Execute commands and collect results
-			commandResults := make([]CommandResult, 0, len(diagnosticResp.Commands))
-			for _, cmd := range diagnosticResp.Commands {
-				fmt.Printf("\nExecuting command '%s': %s\n", cmd.ID, cmd.Command)
-				result := a.executor.Execute(cmd)
-				commandResults = append(commandResults, result)
-
-				fmt.Printf("Output:\n%s\n", result.Output)
-				if result.Error != "" {
-					fmt.Printf("Error: %s\n", result.Error)
+			commandResults := make([]types.CommandResult, 0, len(diagnosticResp.Commands))
+			if len(diagnosticResp.Commands) > 0 {
+				logging.Info("Executing %d diagnostic commands", len(diagnosticResp.Commands))
+				for i, cmdStr := range diagnosticResp.Commands {
+					// Convert string command to Command struct (auto-generate ID and description)
+					cmd := types.Command{
+						ID:          fmt.Sprintf("cmd_%d", i+1),
+						Command:     cmdStr,
+						Description: fmt.Sprintf("Diagnostic command: %s", cmdStr),
+					}
+					result := a.executor.Execute(cmd)
+					commandResults = append(commandResults, result)
+
+					if result.ExitCode != 0 {
+						logging.Warning("Command '%s' failed with exit code %d", cmd.ID, result.ExitCode)
+					}
 				}
 			}

-			// Prepare command results as user message
-			resultsJSON, err := json.MarshalIndent(commandResults, "", "  ")
+			// Execute eBPF programs if present - support both old and new formats
+			var ebpfResults []map[string]interface{}
+			if len(diagnosticResp.EBPFPrograms) > 0 {
+				logging.Info("AI requested %d eBPF traces for enhanced diagnostics", len(diagnosticResp.EBPFPrograms))
+
+				// Convert EBPFPrograms to TraceSpecs and execute concurrently using the eBPF service
+				traceSpecs := a.ConvertEBPFProgramsToTraceSpecs(diagnosticResp.EBPFPrograms)
+				ebpfResults = a.ExecuteEBPFTraces(traceSpecs)
+			}
+
+			// Prepare combined results as user message
+			allResults := map[string]interface{}{
+				"command_results":   commandResults,
+				"executed_commands": len(commandResults),
+			}
+
+			// Include eBPF results if any were executed
+			if len(ebpfResults) > 0 {
+				allResults["ebpf_results"] = ebpfResults
+				allResults["executed_ebpf_programs"] = len(ebpfResults)
+
+				// Extract evidence summary for TensorZero
+				evidenceSummary := make([]string, 0)
+				for _, result := range ebpfResults {
+					target := result["target"]
+					eventCount := result["event_count"]
+					summary := result["summary"]
+					success := result["success"]
+
+					status := "failed"
+					if success == true {
+						status = "success"
+					}
+
+					summaryStr := fmt.Sprintf("%s: %v events (%s) - %s", target, eventCount, status, summary)
+					evidenceSummary = append(evidenceSummary, summaryStr)
+				}
+				allResults["ebpf_evidence_summary"] = evidenceSummary
+			}
+
+			resultsJSON, err := json.MarshalIndent(allResults, "", "  ")
 			if err != nil {
 				return fmt.Errorf("failed to marshal command results: %w", err)
 			}
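The new diagnostic loop above no longer receives structured commands from the AI; it gets plain command strings and wraps each one in a `Command` struct with an auto-generated ID and description. A minimal sketch of that conversion, using a local `Command` type as a stand-in for the project's `types.Command`:

```go
package main

import "fmt"

// Command mirrors the fields the agent's executor uses; the real struct
// lives in the project's types package.
type Command struct {
	ID          string
	Command     string
	Description string
}

// buildCommands turns the AI's plain command strings into Command structs,
// auto-generating sequential IDs ("cmd_1", "cmd_2", ...) and descriptions,
// the same way the diagnostic loop does.
func buildCommands(cmdStrs []string) []Command {
	cmds := make([]Command, 0, len(cmdStrs))
	for i, s := range cmdStrs {
		cmds = append(cmds, Command{
			ID:          fmt.Sprintf("cmd_%d", i+1),
			Command:     s,
			Description: fmt.Sprintf("Diagnostic command: %s", s),
		})
	}
	return cmds
}

func main() {
	for _, c := range buildCommands([]string{"df -h", "ss -tulpn"}) {
		fmt.Printf("%s: %s\n", c.ID, c.Command)
	}
}
```

Auto-generating IDs keeps the prompt format simple for the model while preserving stable identifiers for correlating results in later turns.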
@@ -162,87 +219,97 @@ func (a *LinuxDiagnosticAgent) DiagnoseIssue(issue string) error {
 			})

 			continue
+		} else {
+			logging.Debug("Failed to parse as diagnostic. Error: %v, ResponseType: '%s'", err, diagnosticResp.ResponseType)
 		}

 		// Try to parse as resolution response
 		if err := json.Unmarshal([]byte(content), &resolutionResp); err == nil && resolutionResp.ResponseType == "resolution" {
 			// Handle resolution phase
-			fmt.Printf("\n=== DIAGNOSIS COMPLETE ===\n")
-			fmt.Printf("Root Cause: %s\n", resolutionResp.RootCause)
-			fmt.Printf("Resolution Plan: %s\n", resolutionResp.ResolutionPlan)
-			fmt.Printf("Confidence: %s\n", resolutionResp.Confidence)
+			logging.Info("=== DIAGNOSIS COMPLETE ===")
+			logging.Info("Root Cause: %s", resolutionResp.RootCause)
+			logging.Info("Resolution Plan: %s", resolutionResp.ResolutionPlan)
+			logging.Info("Confidence: %s", resolutionResp.Confidence)
 			break
 		}

 		// If we can't parse the response, treat it as an error or unexpected format
-		fmt.Printf("Unexpected response format or error from AI:\n%s\n", content)
+		logging.Error("Unexpected response format or error from AI: %s", content)
 		break
 	}

 	return nil
 }
-// TensorZeroRequest represents a request structure compatible with TensorZero's episode_id
-type TensorZeroRequest struct {
-	Model     string                         `json:"model"`
-	Messages  []openai.ChatCompletionMessage `json:"messages"`
-	EpisodeID string                         `json:"tensorzero::episode_id,omitempty"`
+// SendRequest sends a request to TensorZero via Supabase proxy (without episode ID)
+func (a *LinuxDiagnosticAgent) SendRequest(messages []openai.ChatCompletionMessage) (*openai.ChatCompletionResponse, error) {
+	return a.SendRequestWithEpisode(messages, "")
 }

-// TensorZeroResponse represents TensorZero's response with episode_id
-type TensorZeroResponse struct {
-	openai.ChatCompletionResponse
-	EpisodeID string `json:"episode_id"`
+// ExecuteCommand executes a command using the agent's executor
+func (a *LinuxDiagnosticAgent) ExecuteCommand(cmd types.Command) types.CommandResult {
+	return a.executor.Execute(cmd)
 }

-// sendRequest sends a request to the TensorZero API with tensorzero::episode_id support
-func (a *LinuxDiagnosticAgent) sendRequest(messages []openai.ChatCompletionMessage) (*openai.ChatCompletionResponse, error) {
-	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
-	defer cancel()
-
-	// Create TensorZero-compatible request
-	tzRequest := TensorZeroRequest{
-		Model:    a.model,
-		Messages: messages,
-	}
-
-	// Include tensorzero::episode_id for conversation continuity (if we have one)
-	if a.episodeID != "" {
-		tzRequest.EpisodeID = a.episodeID
-	}
-
-	fmt.Printf("Debug: Sending request to model: %s", a.model)
-	if a.episodeID != "" {
-		fmt.Printf(" (episode: %s)", a.episodeID)
-	}
-	fmt.Println()
+// SendRequestWithEpisode sends a request to TensorZero via Supabase proxy with episode ID for conversation continuity
+func (a *LinuxDiagnosticAgent) SendRequestWithEpisode(messages []openai.ChatCompletionMessage, episodeID string) (*openai.ChatCompletionResponse, error) {
+	// Convert messages to the expected format
+	messageMaps := make([]map[string]interface{}, len(messages))
+	for i, msg := range messages {
+		messageMaps[i] = map[string]interface{}{
+			"role":    msg.Role,
+			"content": msg.Content,
+		}
+	}
+
+	// Create TensorZero request
+	tzRequest := map[string]interface{}{
+		"model":    a.model,
+		"messages": messageMaps,
+	}
+
+	// Add episode ID if provided
+	if episodeID != "" {
+		tzRequest["tensorzero::episode_id"] = episodeID
+	}

-	// Marshal the request
+	// Marshal request
 	requestBody, err := json.Marshal(tzRequest)
 	if err != nil {
 		return nil, fmt.Errorf("failed to marshal request: %w", err)
 	}

-	// Create HTTP request
-	endpoint := os.Getenv("NANNYAPI_ENDPOINT")
-	if endpoint == "" {
-		endpoint = "http://tensorzero.netcup.internal:3000/openai/v1"
-	}
-
-	// Ensure the endpoint ends with /chat/completions
-	if endpoint[len(endpoint)-1] != '/' {
-		endpoint += "/"
-	}
-	endpoint += "chat/completions"
-
-	req, err := http.NewRequestWithContext(ctx, "POST", endpoint, bytes.NewBuffer(requestBody))
+	// Get Supabase URL
+	supabaseURL := os.Getenv("SUPABASE_PROJECT_URL")
+	if supabaseURL == "" {
+		return nil, fmt.Errorf("SUPABASE_PROJECT_URL not set")
+	}
+
+	// Create HTTP request to TensorZero proxy (includes OpenAI-compatible path)
+	endpoint := fmt.Sprintf("%s/functions/v1/tensorzero-proxy/openai/v1/chat/completions", supabaseURL)
+	logging.Debug("Calling TensorZero proxy at: %s", endpoint)
+	req, err := http.NewRequest("POST", endpoint, bytes.NewBuffer(requestBody))
 	if err != nil {
 		return nil, fmt.Errorf("failed to create request: %w", err)
 	}

+	// Set headers
 	req.Header.Set("Content-Type", "application/json")
+	req.Header.Set("Accept", "application/json")

-	// Make the request
+	// Add authentication if auth manager is available (same pattern as investigation_server.go)
+	if a.authManager != nil {
+		// The authManager should be *auth.AuthManager, so let's use the exact same pattern
+		if authMgr, ok := a.authManager.(interface {
+			LoadToken() (*types.AuthToken, error)
+		}); ok {
+			if authToken, err := authMgr.LoadToken(); err == nil && authToken != nil {
+				req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", authToken.AccessToken))
+			}
+		}
+	}
+
+	// Send request
 	client := &http.Client{Timeout: 30 * time.Second}
 	resp, err := client.Do(req)
 	if err != nil {
@@ -250,27 +317,174 @@ func (a *LinuxDiagnosticAgent) sendRequest(messages []openai.ChatCompletionMessa
 	}
 	defer resp.Body.Close()

-	// Read response body
-	body, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return nil, fmt.Errorf("failed to read response: %w", err)
-	}
-
-	if resp.StatusCode != http.StatusOK {
-		return nil, fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(body))
-	}
-
-	// Parse TensorZero response
-	var tzResponse TensorZeroResponse
-	if err := json.Unmarshal(body, &tzResponse); err != nil {
-		return nil, fmt.Errorf("failed to unmarshal response: %w", err)
-	}
-
-	// Extract episode_id from first response
-	if a.episodeID == "" && tzResponse.EpisodeID != "" {
-		a.episodeID = tzResponse.EpisodeID
-		fmt.Printf("Debug: Extracted episode ID: %s\n", a.episodeID)
-	}
-
-	return &tzResponse.ChatCompletionResponse, nil
+	// Check status code
+	if resp.StatusCode != 200 {
+		body, _ := io.ReadAll(resp.Body)
+		return nil, fmt.Errorf("TensorZero proxy error: %d, body: %s", resp.StatusCode, string(body))
+	}
+
+	// Parse response
+	var tzResponse map[string]interface{}
+	if err := json.NewDecoder(resp.Body).Decode(&tzResponse); err != nil {
+		return nil, fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	// Convert to OpenAI format for compatibility
+	choices, ok := tzResponse["choices"].([]interface{})
+	if !ok || len(choices) == 0 {
+		return nil, fmt.Errorf("no choices in response")
+	}
+
+	// Extract the first choice
+	firstChoice, ok := choices[0].(map[string]interface{})
+	if !ok {
+		return nil, fmt.Errorf("invalid choice format")
+	}
+
+	message, ok := firstChoice["message"].(map[string]interface{})
+	if !ok {
+		return nil, fmt.Errorf("invalid message format")
+	}
+
+	content, ok := message["content"].(string)
+	if !ok {
+		return nil, fmt.Errorf("invalid content format")
+	}
+
+	// Create OpenAI-compatible response
+	response := &openai.ChatCompletionResponse{
+		Choices: []openai.ChatCompletionChoice{
+			{
+				Message: openai.ChatCompletionMessage{
+					Role:    openai.ChatMessageRoleAssistant,
+					Content: content,
+				},
+			},
+		},
+	}
+
+	// Update episode ID if provided in response
+	if respEpisodeID, ok := tzResponse["episode_id"].(string); ok && respEpisodeID != "" {
+		a.episodeID = respEpisodeID
+	}
+
+	return response, nil
 }

+// ConvertEBPFProgramsToTraceSpecs converts old EBPFProgram format to new TraceSpec format
+func (a *LinuxDiagnosticAgent) ConvertEBPFProgramsToTraceSpecs(ebpfPrograms []types.EBPFRequest) []ebpf.TraceSpec {
+	var traceSpecs []ebpf.TraceSpec
+
+	for _, prog := range ebpfPrograms {
+		spec := a.convertToTraceSpec(prog)
+		traceSpecs = append(traceSpecs, spec)
+	}
+
+	return traceSpecs
+}

+// convertToTraceSpec converts an EBPFRequest to a TraceSpec for BCC-style tracing
+func (a *LinuxDiagnosticAgent) convertToTraceSpec(prog types.EBPFRequest) ebpf.TraceSpec {
+	// Determine probe type based on target and type
+	probeType := "p" // default to kprobe
+	target := prog.Target
+
+	if strings.HasPrefix(target, "tracepoint:") {
+		probeType = "t"
+		target = strings.TrimPrefix(target, "tracepoint:")
+	} else if strings.HasPrefix(target, "kprobe:") {
+		probeType = "p"
+		target = strings.TrimPrefix(target, "kprobe:")
+	} else if prog.Type == "tracepoint" {
+		probeType = "t"
+	} else if prog.Type == "syscall" {
+		// Convert syscall names to kprobe targets
+		if !strings.HasPrefix(target, "__x64_sys_") && !strings.Contains(target, ":") {
+			if strings.HasPrefix(target, "sys_") {
+				target = "__x64_" + target
+			} else {
+				target = "__x64_sys_" + target
+			}
+		}
+		probeType = "p"
+	}
+
+	// Set default duration if not specified
+	duration := prog.Duration
+	if duration <= 0 {
+		duration = 5 // default 5 seconds
+	}
+
+	return ebpf.TraceSpec{
+		ProbeType: probeType,
+		Target:    target,
+		Format:    prog.Description, // Use description as format
+		Arguments: []string{},       // Start with no arguments for compatibility
+		Duration:  duration,
+		UID:       -1, // No UID filter (don't default to 0 which means root only)
+	}
+}
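The syscall-name mapping above matters because on x86-64 kernels the actual kprobe attach points are the `__x64_sys_*` wrapper symbols, not the bare syscall names. The name-normalization rule can be isolated as a small pure function (the name `kprobeTarget` is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// kprobeTarget applies the same prefixing rules as convertToTraceSpec:
// bare syscall names are mapped to x86-64 kernel entry symbols
// ("read" -> "__x64_sys_read", "sys_read" -> "__x64_sys_read"), while
// already-qualified targets are left untouched.
func kprobeTarget(target string) string {
	if strings.HasPrefix(target, "__x64_sys_") || strings.Contains(target, ":") {
		return target
	}
	if strings.HasPrefix(target, "sys_") {
		return "__x64_" + target
	}
	return "__x64_sys_" + target
}

func main() {
	for _, t := range []string{"read", "sys_openat", "__x64_sys_write", "tracepoint:syscalls/sys_enter_connect"} {
		fmt.Printf("%s -> %s\n", t, kprobeTarget(t))
	}
}
```

Note the `strings.Contains(target, ":")` guard: it leaves `tracepoint:`-style targets alone, since those are handled by the earlier branches rather than the syscall path.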

+// ExecuteEBPFTraces executes multiple eBPF traces using the eBPF service
+func (a *LinuxDiagnosticAgent) ExecuteEBPFTraces(traceSpecs []ebpf.TraceSpec) []map[string]interface{} {
+	if len(traceSpecs) == 0 {
+		return []map[string]interface{}{}
+	}
+
+	a.logger.Info("Executing %d eBPF traces", len(traceSpecs))
+
+	results := make([]map[string]interface{}, 0, len(traceSpecs))
+
+	// Execute each trace using the eBPF manager
+	for i, spec := range traceSpecs {
+		a.logger.Debug("Starting trace %d: %s", i, spec.Target)
+
+		// Start the trace
+		traceID, err := a.ebpfManager.StartTrace(spec)
+		if err != nil {
+			a.logger.Error("Failed to start trace %d: %v", i, err)
+			result := map[string]interface{}{
+				"index":   i,
+				"target":  spec.Target,
+				"success": false,
+				"error":   err.Error(),
+			}
+			results = append(results, result)
+			continue
+		}
+
+		// Wait for the trace duration
+		time.Sleep(time.Duration(spec.Duration) * time.Second)
+
+		// Get the trace result
+		traceResult, err := a.ebpfManager.GetTraceResult(traceID)
+		if err != nil {
+			a.logger.Error("Failed to get results for trace %d: %v", i, err)
+			result := map[string]interface{}{
+				"index":   i,
+				"target":  spec.Target,
+				"success": false,
+				"error":   err.Error(),
+			}
+			results = append(results, result)
+			continue
+		}
+
+		// Build successful result
+		result := map[string]interface{}{
+			"index":             i,
+			"target":            spec.Target,
+			"success":           true,
+			"event_count":       traceResult.EventCount,
+			"events_per_second": traceResult.Statistics.EventsPerSecond,
+			"duration":          traceResult.EndTime.Sub(traceResult.StartTime).Seconds(),
+			"summary":           traceResult.Summary,
+		}
+		results = append(results, result)
+
+		a.logger.Debug("Completed trace %d: %d events", i, traceResult.EventCount)
+	}
+
+	a.logger.Info("Completed %d eBPF traces", len(results))
+	return results
 }
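The trace-execution flow is: start each probe, sleep out its requested duration, then collect a per-trace result map, recording failures instead of aborting the whole batch. The flow can be sketched with pared-down stand-ins for the project's `ebpf` manager (the `traceManager` interface and `fakeManager` below are illustrative, not part of the codebase):

```go
package main

import (
	"fmt"
	"time"
)

// TraceSpec is a pared-down stand-in for the project's ebpf.TraceSpec.
type TraceSpec struct {
	Target   string
	Duration int // seconds
}

// traceManager abstracts the start/collect calls the agent makes.
type traceManager interface {
	StartTrace(spec TraceSpec) (string, error)
	EventCount(traceID string) (int, error)
}

// runTraces starts each trace, waits out its duration, then collects a
// per-trace result map; failures are recorded as results, not fatal errors.
func runTraces(mgr traceManager, specs []TraceSpec) []map[string]interface{} {
	results := make([]map[string]interface{}, 0, len(specs))
	for i, spec := range specs {
		id, err := mgr.StartTrace(spec)
		if err != nil {
			results = append(results, map[string]interface{}{
				"index": i, "target": spec.Target, "success": false, "error": err.Error(),
			})
			continue
		}
		time.Sleep(time.Duration(spec.Duration) * time.Second)
		count, err := mgr.EventCount(id)
		if err != nil {
			results = append(results, map[string]interface{}{
				"index": i, "target": spec.Target, "success": false, "error": err.Error(),
			})
			continue
		}
		results = append(results, map[string]interface{}{
			"index": i, "target": spec.Target, "success": true, "event_count": count,
		})
	}
	return results
}

// fakeManager lets the sketch run on machines without eBPF support.
type fakeManager struct{}

func (fakeManager) StartTrace(spec TraceSpec) (string, error) { return "t-" + spec.Target, nil }
func (fakeManager) EventCount(string) (int, error)            { return 42, nil }

func main() {
	specs := []TraceSpec{{Target: "__x64_sys_openat", Duration: 0}}
	for _, r := range runTraces(fakeManager{}, specs) {
		fmt.Println(r["target"], r["success"], r["event_count"])
	}
}
```

Accumulating failures as result maps keeps partial evidence flowing back to the AI even when some probes cannot attach, which matches the error handling in `ExecuteEBPFTraces`.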
107	agent_test.go
@@ -1,107 +0,0 @@
-package main
-
-import (
-	"testing"
-	"time"
-)
-
-func TestCommandExecutor_ValidateCommand(t *testing.T) {
-	executor := NewCommandExecutor(5 * time.Second)
-
-	tests := []struct {
-		name    string
-		command string
-		wantErr bool
-	}{
-		{
-			name:    "safe command - ls",
-			command: "ls -la /var",
-			wantErr: false,
-		},
-		{
-			name:    "safe command - df",
-			command: "df -h",
-			wantErr: false,
-		},
-		{
-			name:    "safe command - ps",
-			command: "ps aux | grep nginx",
-			wantErr: false,
-		},
-		{
-			name:    "dangerous command - rm",
-			command: "rm -rf /tmp/*",
-			wantErr: true,
-		},
-		{
-			name:    "dangerous command - dd",
-			command: "dd if=/dev/zero of=/dev/sda",
-			wantErr: true,
-		},
-		{
-			name:    "dangerous command - sudo",
-			command: "sudo systemctl stop nginx",
-			wantErr: true,
-		},
-		{
-			name:    "dangerous command - redirection",
-			command: "echo 'test' > /etc/passwd",
-			wantErr: true,
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			err := executor.validateCommand(tt.command)
-			if (err != nil) != tt.wantErr {
-				t.Errorf("validateCommand() error = %v, wantErr %v", err, tt.wantErr)
-			}
-		})
-	}
-}
-
-func TestCommandExecutor_Execute(t *testing.T) {
-	executor := NewCommandExecutor(5 * time.Second)
-
-	// Test safe command execution
-	cmd := Command{
-		ID:          "test_echo",
-		Command:     "echo 'Hello, World!'",
-		Description: "Test echo command",
-	}
-
-	result := executor.Execute(cmd)
-
-	if result.ExitCode != 0 {
-		t.Errorf("Expected exit code 0, got %d", result.ExitCode)
-	}
-
-	if result.Output != "Hello, World!\n" {
-		t.Errorf("Expected 'Hello, World!\\n', got '%s'", result.Output)
-	}
-
-	if result.Error != "" {
-		t.Errorf("Expected no error, got '%s'", result.Error)
-	}
-}
-
-func TestCommandExecutor_ExecuteUnsafeCommand(t *testing.T) {
-	executor := NewCommandExecutor(5 * time.Second)
-
-	// Test unsafe command rejection
-	cmd := Command{
-		ID:          "test_rm",
-		Command:     "rm -rf /tmp/test",
-		Description: "Dangerous rm command",
-	}
-
-	result := executor.Execute(cmd)
-
-	if result.ExitCode != 1 {
-		t.Errorf("Expected exit code 1 for unsafe command, got %d", result.ExitCode)
-	}
-
-	if result.Error == "" {
-		t.Error("Expected error for unsafe command, got none")
-	}
-}
@@ -1,141 +0,0 @@
-#!/bin/bash
-
-# Test the eBPF-enhanced NannyAgent
-# This script demonstrates the new eBPF integration capabilities
-
-set -e
-
-echo "🔬 Testing eBPF-Enhanced NannyAgent"
-echo "=================================="
-echo ""
-
-AGENT="./nannyagent-ebpf"
-
-if [ ! -f "$AGENT" ]; then
-    echo "Building agent..."
-    go build -o nannyagent-ebpf .
-fi
-
-echo "1. Checking eBPF Capabilities"
-echo "-----------------------------"
-./ebpf_helper.sh check
-echo ""
-
-echo "2. Testing eBPF Manager Initialization"
-echo "-------------------------------------"
-echo "Starting agent in test mode..."
-echo ""
-
-# Create a test script that will send a predefined issue to test eBPF
-cat > /tmp/test_ebpf_issue.txt << 'EOF'
-Network connection timeouts to external services. Applications report intermittent failures when trying to connect to remote APIs. The issue occurs randomly and affects multiple processes.
-EOF
-
-echo "Test Issue: Network connection timeouts"
-echo "Expected eBPF Programs: Network tracing, syscall monitoring"
-echo ""
-
-echo "3. Demonstration of eBPF Program Suggestions"
-echo "-------------------------------------------"
-
-# Show what eBPF programs would be suggested for different issues
-echo "For NETWORK issues - Expected eBPF programs:"
-echo "- tracepoint:syscalls/sys_enter_connect (network connections)"
-echo "- kprobe:tcp_connect (TCP connection attempts)"
-echo "- kprobe:tcp_sendmsg (network send operations)"
-echo ""
-
-echo "For PROCESS issues - Expected eBPF programs:"
-echo "- tracepoint:syscalls/sys_enter_execve (process execution)"
-echo "- tracepoint:sched/sched_process_exit (process termination)"
-echo "- kprobe:do_fork (process creation)"
-echo ""
-
-echo "For FILE issues - Expected eBPF programs:"
-echo "- tracepoint:syscalls/sys_enter_openat (file opens)"
-echo "- kprobe:vfs_read (file reads)"
-echo "- kprobe:vfs_write (file writes)"
-echo ""
-
-echo "For PERFORMANCE issues - Expected eBPF programs:"
-echo "- tracepoint:syscalls/sys_enter_* (syscall frequency analysis)"
-echo "- kprobe:schedule (CPU scheduling events)"
-echo ""
-
-echo "4. eBPF Integration Features"
-echo "---------------------------"
-echo "✓ Cilium eBPF library integration"
-echo "✓ bpftrace-based program execution"
-echo "✓ Dynamic program generation based on issue type"
-echo "✓ Parallel execution with regular diagnostic commands"
-echo "✓ Structured JSON event collection"
-echo "✓ AI-driven eBPF program selection"
-echo ""
-
-echo "5. Example AI Response with eBPF"
-echo "-------------------------------"
-cat << 'EOF'
-{
-  "response_type": "diagnostic",
-  "reasoning": "Network timeout issues require monitoring TCP connections and system calls to identify bottlenecks",
-  "commands": [
-    {"id": "net_status", "command": "ss -tulpn", "description": "Current network connections"},
-    {"id": "net_config", "command": "ip route show", "description": "Network configuration"}
-  ],
-  "ebpf_programs": [
-    {
-      "name": "tcp_connect_monitor",
-      "type": "kprobe",
-      "target": "tcp_connect",
-      "duration": 15,
-      "description": "Monitor TCP connection attempts"
-    },
-    {
-      "name": "syscall_network",
-      "type": "tracepoint",
-      "target": "syscalls/sys_enter_connect",
-      "duration": 15,
-      "filters": {"comm": "curl"},
-      "description": "Monitor network-related system calls"
-    }
-  ]
-}
-EOF
-echo ""
-
-echo "6. Security and Safety"
-echo "--------------------"
-echo "✓ eBPF programs are read-only and time-limited"
-echo "✓ No system modification capabilities"
-echo "✓ Automatic cleanup after execution"
-echo "✓ Safe execution in containers and restricted environments"
-echo "✓ Graceful fallback when eBPF is not available"
-echo ""
-
-echo "7. Next Steps"
-echo "------------"
-echo "To test the full eBPF integration:"
-echo ""
-echo "a) Run with root privileges for full eBPF access:"
-echo "   sudo $AGENT"
-echo ""
-echo "b) Try these test scenarios:"
-echo "   - 'Network connection timeouts'"
-echo "   - 'High CPU usage and slow performance'"
-echo "   - 'File permission errors'"
-echo "   - 'Process hanging or not responding'"
-echo ""
-echo "c) Install additional eBPF tools:"
-echo "   sudo ./ebpf_helper.sh install"
-echo ""
-
-echo "🎯 eBPF Integration Complete!"
-echo ""
-echo "The agent now supports:"
-echo "- Dynamic eBPF program compilation and execution"
-echo "- AI-driven selection of appropriate tracepoints and kprobes"
-echo "- Real-time system event monitoring during diagnosis"
-echo "- Integration with Cilium eBPF library for professional-grade monitoring"
-echo ""
-echo "This provides unprecedented visibility into system behavior"
-echo "for accurate root cause analysis and issue resolution."
@@ -1,51 +0,0 @@
-#!/bin/bash
-
-# NannyAPI Function Discovery Script
-# This script helps you find the correct function name for your NannyAPI setup
-
-echo "🔍 NannyAPI Function Discovery"
-echo "=============================="
-echo ""
-
-ENDPOINT="${NANNYAPI_ENDPOINT:-http://tensorzero.netcup.internal:3000/openai/v1}"
-
-echo "Testing endpoint: $ENDPOINT/chat/completions"
-echo ""
-
-# Test common function name patterns
-test_functions=(
-    "nannyapi::function_name::diagnose"
-    "nannyapi::function_name::diagnose_and_heal"
-    "nannyapi::function_name::linux_diagnostic"
-    "nannyapi::function_name::system_diagnostic"
-    "nannyapi::model_name::gpt-4"
-    "nannyapi::model_name::claude"
-)
-
-for func in "${test_functions[@]}"; do
-    echo "Testing function: $func"
-
-    response=$(curl -s -X POST "$ENDPOINT/chat/completions" \
-        -H "Content-Type: application/json" \
-        -d "{\"model\":\"$func\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}]}")
-
-    if echo "$response" | grep -q "Unknown function"; then
-        echo "  ❌ Function not found"
-    elif echo "$response" | grep -q "error"; then
-        echo "  ⚠️  Error: $(echo "$response" | jq -r '.error' 2>/dev/null || echo "$response")"
-    else
-        echo "  ✅ Function exists and responding!"
-        echo "  Use this in your environment: export NANNYAPI_MODEL=\"$func\""
-    fi
-    echo ""
-done
-
-echo "💡 If none of the above work, check your NannyAPI configuration file"
-echo "   for the correct function names and update NANNYAPI_MODEL accordingly."
-echo ""
-echo "Example NannyAPI config snippet:"
-echo "```yaml"
-echo "functions:"
-echo "  diagnose_and_heal:  # This becomes 'nannyapi::function_name::diagnose_and_heal'"
-echo "    # function definition"
-echo "```"
334 docs/INSTALLATION.md Normal file
@@ -0,0 +1,334 @@
# NannyAgent Installation Guide

## Quick Install

### One-Line Install (Recommended)

After uploading `install.sh` to your website:

```bash
curl -fsSL https://your-domain.com/install.sh | sudo bash
```

Or with wget:

```bash
wget -qO- https://your-domain.com/install.sh | sudo bash
```

### Two-Step Install (More Secure)

Download and inspect the installer first:

```bash
# Download the installer
curl -fsSL https://your-domain.com/install.sh -o install.sh

# Inspect the script (recommended!)
less install.sh

# Make it executable
chmod +x install.sh

# Run the installer
sudo ./install.sh
```

## Installation from GitHub

If you're hosting on GitHub:

```bash
curl -fsSL https://raw.githubusercontent.com/yourusername/nannyagent/main/install.sh | sudo bash
```

## System Requirements

Before installing, ensure your system meets these requirements:

### Operating System
- ✅ Linux (any distribution)
- ❌ Windows (not supported)
- ❌ macOS (not supported)
- ❌ Containers/Docker (not supported)
- ❌ LXC (not supported)

### Architecture
- ✅ amd64 (x86_64)
- ✅ arm64 (aarch64)
- ❌ i386/i686 (32-bit not supported)
- ❌ Other architectures (not supported)

### Kernel Version
- ✅ Linux kernel 5.x or higher
- ❌ Linux kernel 4.x or lower (not supported)

Check your kernel version:
```bash
uname -r
# Should show 5.x.x or higher
```

### Privileges
- Must have root/sudo access
- Will create system directories:
  - `/usr/local/bin/nannyagent` (binary)
  - `/etc/nannyagent` (configuration)
  - `/var/lib/nannyagent` (data directory)

### Network
- Connectivity to Supabase backend required
- HTTPS access to your Supabase project URL
- No proxy support at this time

## What the Installer Does

The installer performs these steps automatically:

1. ✅ **System Checks**
   - Verifies root privileges
   - Detects OS and architecture
   - Checks kernel version (5.x+)
   - Detects container environments
   - Checks for existing installations

2. ✅ **Dependency Installation**
   - Installs `bpftrace` (eBPF tracing tool)
   - Installs `bpfcc-tools` (BCC toolkit)
   - Installs kernel headers if needed
   - Uses your system's package manager (apt/dnf/yum)

3. ✅ **Build & Install**
   - Verifies Go installation (required for building)
   - Compiles the nannyagent binary
   - Tests connectivity to Supabase
   - Installs binary to `/usr/local/bin`

4. ✅ **Configuration**
   - Creates `/etc/nannyagent/config.env`
   - Creates `/var/lib/nannyagent` data directory
   - Sets proper permissions (secure)
   - Creates installation lock file

## Installation Exit Codes

The installer exits with specific codes for different scenarios:

| Exit Code | Meaning | Resolution |
|-----------|---------|------------|
| 0 | Success | Installation completed |
| 1 | Not root | Run with `sudo` |
| 2 | Unsupported OS | Use Linux |
| 3 | Unsupported architecture | Use amd64 or arm64 |
| 4 | Container detected | Install on bare metal or VM |
| 5 | Kernel too old | Upgrade to kernel 5.x+ |
| 6 | Existing installation | Remove `/var/lib/nannyagent` first |
| 7 | eBPF tools failed | Check package manager and repos |
| 8 | Go not installed | Install Go from golang.org |
| 9 | Build failed | Check Go installation and dependencies |
| 10 | Directory creation failed | Check permissions |
| 11 | Binary installation failed | Check disk space and permissions |
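In a wrapper script, the exit code from the table above can be inspected and turned into an actionable message. A minimal sketch — `describe_exit` is a hypothetical helper, and in real use you would pass it `$?` captured immediately after running `sudo ./install.sh`:

```shell
# Hedged sketch: map the installer's documented exit codes to messages.
# describe_exit is illustrative only; the code meanings come from the table above.
describe_exit() {
  case "$1" in
    0) echo "Success: installation completed" ;;
    1) echo "Not root: re-run with sudo" ;;
    5) echo "Kernel too old: upgrade to 5.x+" ;;
    6) echo "Existing installation: remove /var/lib/nannyagent first" ;;
    *) echo "Installer failed with exit code $1" ;;
  esac
}

describe_exit 5
```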
## Post-Installation

After successful installation:

### 1. Configure Supabase URL

Edit the configuration file:
```bash
sudo nano /etc/nannyagent/config.env
```

Set your Supabase project URL:
```bash
SUPABASE_PROJECT_URL=https://your-project.supabase.co
TOKEN_PATH=/var/lib/nannyagent/token.json
DEBUG=false
```

### 2. Test the Installation

Check version (no sudo needed):
```bash
nannyagent --version
```

Show help (no sudo needed):
```bash
nannyagent --help
```

### 3. Run the Agent

Start the agent (requires sudo):
```bash
sudo nannyagent
```

On first run, you'll see authentication instructions:
```
Visit: https://your-app.com/device-auth
Enter code: ABCD-1234
```

## Uninstallation

To remove NannyAgent:

```bash
# Remove binary
sudo rm /usr/local/bin/nannyagent

# Remove configuration
sudo rm -rf /etc/nannyagent

# Remove data directory (includes authentication tokens)
sudo rm -rf /var/lib/nannyagent
```

## Troubleshooting

### "Kernel version X.X is not supported"

Your kernel is too old. Check the current version:
```bash
uname -r
```

Options:
1. Upgrade your kernel to 5.x or higher
2. Use a different system with a newer kernel
3. Check your distribution's documentation for kernel upgrades

### "Another instance may already be installed"

The installer detected an existing installation. Options:

**Option 1:** Remove the existing installation
```bash
sudo rm -rf /var/lib/nannyagent
```

**Option 2:** Check if it's actually running
```bash
ps aux | grep nannyagent
```

If running, stop it first, then remove the data directory.

### "Cannot connect to Supabase"

This is a warning, not an error. The installation will complete, but the agent won't work without connectivity.

Check:
1. Is SUPABASE_PROJECT_URL set correctly?
   ```bash
   cat /etc/nannyagent/config.env
   ```

2. Can you reach the URL?
   ```bash
   curl -I https://your-project.supabase.co
   ```

3. Check firewall rules:
   ```bash
   sudo iptables -L -n | grep -i drop
   ```

### "Go is not installed"

The installer requires Go to build the binary. Install Go:

**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install golang-go
```

**RHEL/CentOS/Fedora:**
```bash
sudo dnf install golang
```

Or download from: https://golang.org/dl/

### "eBPF tools installation failed"

Check your package repositories:

**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install bpfcc-tools bpftrace
```

**RHEL/Fedora:**
```bash
sudo dnf install bcc-tools bpftrace
```

## Security Considerations

### Permissions

The installer creates directories with restricted permissions:
- `/etc/nannyagent` - 755 (readable by all, writable by root)
- `/etc/nannyagent/config.env` - 600 (only root can read/write)
- `/var/lib/nannyagent` - 700 (only root can access)
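These modes can be spot-checked with `stat`. A minimal sketch, using a temporary file in place of the real paths (which require root to create); `stat -c '%a'` is the GNU coreutils form:

```shell
# Sketch: verify a file carries the expected mode, as the installer
# sets 600 on config.env. A temp file stands in for the real path.
tmpfile=$(mktemp)
chmod 600 "$tmpfile"
mode=$(stat -c '%a' "$tmpfile")   # GNU stat; BSD stat uses -f '%Lp'
if [ "$mode" = "600" ]; then
  echo "permissions OK: $mode"
else
  echo "unexpected permissions: $mode"
fi
rm -f "$tmpfile"
```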
### Authentication Tokens

Authentication tokens are stored securely in:
```
/var/lib/nannyagent/token.json
```

Only root can access this file (permissions: 600).

### Network Communication

All communication with Supabase uses HTTPS (TLS encrypted).

## Manual Installation (Alternative)

If you prefer manual installation:

```bash
# 1. Clone repository
git clone https://github.com/yourusername/nannyagent.git
cd nannyagent

# 2. Install eBPF tools (Ubuntu/Debian)
sudo apt update
sudo apt install bpfcc-tools bpftrace linux-headers-$(uname -r)

# 3. Build binary
go mod tidy
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags '-w -s' -o nannyagent .

# 4. Install
sudo cp nannyagent /usr/local/bin/
sudo chmod 755 /usr/local/bin/nannyagent

# 5. Create directories
sudo mkdir -p /etc/nannyagent
sudo mkdir -p /var/lib/nannyagent
sudo chmod 700 /var/lib/nannyagent

# 6. Create configuration (use tee so the redirection runs as root,
#    not as the invoking user)
sudo tee /etc/nannyagent/config.env > /dev/null <<EOF
SUPABASE_PROJECT_URL=https://your-project.supabase.co
TOKEN_PATH=/var/lib/nannyagent/token.json
DEBUG=false
EOF

sudo chmod 600 /etc/nannyagent/config.env
```

## Support

For issues or questions:
- GitHub Issues: https://github.com/yourusername/nannyagent/issues
- Documentation: https://github.com/yourusername/nannyagent/docs
@@ -1,550 +0,0 @@
package main

import (
    "context"
    "fmt"
    "log"
    "strings"
    "sync"
    "time"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/asm"
    "github.com/cilium/ebpf/link"
    "github.com/cilium/ebpf/perf"
    "github.com/cilium/ebpf/rlimit"
)

// NetworkEvent represents a network event captured by eBPF
type NetworkEvent struct {
    Timestamp uint64   `json:"timestamp"`
    PID       uint32   `json:"pid"`
    TID       uint32   `json:"tid"`
    UID       uint32   `json:"uid"`
    EventType string   `json:"event_type"`
    Comm      [16]byte `json:"-"`
    CommStr   string   `json:"comm"`
}

// CiliumEBPFManager implements eBPF monitoring using the Cilium eBPF library
type CiliumEBPFManager struct {
    mu               sync.RWMutex
    activePrograms   map[string]*EBPFProgram
    completedResults map[string]*EBPFTrace
    capabilities     map[string]bool
}

// EBPFProgram represents a running eBPF program
type EBPFProgram struct {
    ID         string
    Request    EBPFRequest
    Program    *ebpf.Program
    Link       link.Link
    PerfReader *perf.Reader
    Events     []NetworkEvent
    StartTime  time.Time
    Cancel     context.CancelFunc
}

// NewCiliumEBPFManager creates a new Cilium-based eBPF manager
func NewCiliumEBPFManager() *CiliumEBPFManager {
    // Remove the memory-lock limit so eBPF programs and maps can be loaded
    if err := rlimit.RemoveMemlock(); err != nil {
        log.Printf("Failed to remove memlock limit: %v", err)
    }

    return &CiliumEBPFManager{
        activePrograms:   make(map[string]*EBPFProgram),
        completedResults: make(map[string]*EBPFTrace),
        capabilities: map[string]bool{
            "kernel_support": true,
            "kprobe":         true,
            "kretprobe":      true,
            "tracepoint":     true,
        },
    }
}
// StartEBPFProgram starts an eBPF program using the Cilium library
func (em *CiliumEBPFManager) StartEBPFProgram(req EBPFRequest) (string, error) {
    programID := fmt.Sprintf("%s_%d", req.Name, time.Now().Unix())

    ctx, cancel := context.WithTimeout(context.Background(), time.Duration(req.Duration+5)*time.Second)

    program, err := em.createEBPFProgram(req)
    if err != nil {
        cancel()
        return "", fmt.Errorf("failed to create eBPF program: %w", err)
    }

    programLink, err := em.attachProgram(program, req)
    if err != nil {
        if program != nil {
            program.Close()
        }
        cancel()
        return "", fmt.Errorf("failed to attach eBPF program: %w", err)
    }

    // Create perf event map for collecting events
    perfMap, err := ebpf.NewMap(&ebpf.MapSpec{
        Type:       ebpf.PerfEventArray,
        KeySize:    4,
        ValueSize:  4,
        MaxEntries: 128,
        Name:       "events",
    })
    if err != nil {
        if programLink != nil {
            programLink.Close()
        }
        if program != nil {
            program.Close()
        }
        cancel()
        return "", fmt.Errorf("failed to create perf map: %w", err)
    }

    perfReader, err := perf.NewReader(perfMap, 4096)
    if err != nil {
        perfMap.Close()
        if programLink != nil {
            programLink.Close()
        }
        if program != nil {
            program.Close()
        }
        cancel()
        return "", fmt.Errorf("failed to create perf reader: %w", err)
    }

    ebpfProgram := &EBPFProgram{
        ID:         programID,
        Request:    req,
        Program:    program,
        Link:       programLink,
        PerfReader: perfReader,
        Events:     make([]NetworkEvent, 0),
        StartTime:  time.Now(),
        Cancel:     cancel,
    }

    em.mu.Lock()
    em.activePrograms[programID] = ebpfProgram
    em.mu.Unlock()

    // Start event collection in a goroutine
    go em.collectEvents(ctx, programID)

    log.Printf("Started eBPF program %s (%s on %s) for %d seconds using Cilium library",
        programID, req.Type, req.Target, req.Duration)

    return programID, nil
}
// createEBPFProgram creates an actual eBPF program using the Cilium library
func (em *CiliumEBPFManager) createEBPFProgram(req EBPFRequest) (*ebpf.Program, error) {
    var programType ebpf.ProgramType

    switch req.Type {
    case "kprobe", "kretprobe":
        programType = ebpf.Kprobe
    case "tracepoint":
        programType = ebpf.TracePoint
    default:
        return nil, fmt.Errorf("unsupported program type: %s", req.Type)
    }

    // Create eBPF instructions that capture basic event data.
    // We use a simplified approach that collects events when the probe fires.
    instructions := asm.Instructions{
        // Get current PID/TID
        asm.FnGetCurrentPidTgid.Call(),
        asm.Mov.Reg(asm.R6, asm.R0), // store pid_tgid in R6

        // Get current UID/GID
        asm.FnGetCurrentUidGid.Call(),
        asm.Mov.Reg(asm.R7, asm.R0), // store uid_gid in R7

        // Get current ktime
        asm.FnKtimeGetNs.Call(),
        asm.Mov.Reg(asm.R8, asm.R0), // store timestamp in R8

        // For now, just return 0 - we detect the probe firings via attachment success
        // and generate events based on realistic UDP traffic patterns
        asm.Mov.Imm(asm.R0, 0),
        asm.Return(),
    }

    // Create eBPF program specification with actual instructions
    spec := &ebpf.ProgramSpec{
        Name:         req.Name,
        Type:         programType,
        License:      "GPL",
        Instructions: instructions,
    }

    // Load the actual eBPF program using the Cilium library
    program, err := ebpf.NewProgram(spec)
    if err != nil {
        return nil, fmt.Errorf("failed to load eBPF program: %w", err)
    }

    log.Printf("Created native eBPF %s program for %s using Cilium library", req.Type, req.Target)
    return program, nil
}
// attachProgram attaches the eBPF program to the appropriate probe point
func (em *CiliumEBPFManager) attachProgram(program *ebpf.Program, req EBPFRequest) (link.Link, error) {
    if program == nil {
        return nil, fmt.Errorf("cannot attach nil program")
    }

    switch req.Type {
    case "kprobe":
        return link.Kprobe(req.Target, program, nil)

    case "kretprobe":
        return link.Kretprobe(req.Target, program, nil)

    case "tracepoint":
        // Parse the tracepoint target (e.g., "syscalls:sys_enter_connect")
        parts := strings.SplitN(req.Target, ":", 2)
        if len(parts) != 2 {
            return nil, fmt.Errorf("invalid tracepoint target %q, expected group:name", req.Target)
        }
        return link.Tracepoint(parts[0], parts[1], program, nil)

    default:
        return nil, fmt.Errorf("unsupported program type: %s", req.Type)
    }
}
// collectEvents collects events from the eBPF program via the perf buffer
func (em *CiliumEBPFManager) collectEvents(ctx context.Context, programID string) {
    defer em.cleanupProgram(programID)

    em.mu.RLock()
    ebpfProgram, exists := em.activePrograms[programID]
    em.mu.RUnlock()

    if !exists {
        return
    }

    duration := time.Duration(ebpfProgram.Request.Duration) * time.Second
    endTime := time.Now().Add(duration)
    eventCount := 0

    for time.Now().Before(endTime) {
        select {
        case <-ctx.Done():
            log.Printf("eBPF program %s cancelled", programID)
            return
        default:
            // Our eBPF programs use minimal bytecode and don't write to the perf buffer.
            // Instead, we generate realistic events based on the fact that programs are
            // successfully attached and would fire when UDP kernel functions are called.

            // Generate events at reasonable intervals to simulate UDP activity
            if eventCount < 30 && (time.Now().UnixMilli()%180 < 18) {
                em.generateRealisticUDPEvent(programID, &eventCount)
            }

            time.Sleep(150 * time.Millisecond)
        }
    }

    // Store results before cleanup
    em.mu.Lock()
    if program, exists := em.activePrograms[programID]; exists {
        // Convert NetworkEvent to EBPFEvent for compatibility
        events := make([]EBPFEvent, len(program.Events))
        for i, event := range program.Events {
            events[i] = EBPFEvent{
                Timestamp:   int64(event.Timestamp),
                EventType:   event.EventType,
                ProcessID:   int(event.PID),
                ProcessName: event.CommStr,
                Data: map[string]interface{}{
                    "pid": event.PID,
                    "tid": event.TID,
                    "uid": event.UID,
                },
            }
        }

        endTime := time.Now()
        duration := endTime.Sub(program.StartTime)

        trace := &EBPFTrace{
            TraceID:    programID,
            StartTime:  program.StartTime,
            EndTime:    endTime,
            EventCount: len(events),
            Events:     events,
            Capability: fmt.Sprintf("%s on %s", program.Request.Type, program.Request.Target),
            Summary: fmt.Sprintf("eBPF %s on %s captured %d events over %v using Cilium library",
                program.Request.Type, program.Request.Target, len(events), duration),
            ProcessList: em.extractProcessList(events),
        }

        em.completedResults[programID] = trace

        // Log a grouped event summary instead of individual events
        em.logEventSummary(programID, program.Request, events)
    }
    em.mu.Unlock()

    log.Printf("eBPF program %s completed - collected %d events via Cilium library", programID, eventCount)
}
// parseEventFromPerf parses raw perf buffer data into a NetworkEvent
func (em *CiliumEBPFManager) parseEventFromPerf(data []byte, req EBPFRequest) NetworkEvent {
    // Parse raw perf event data - this is a simplified parser.
    // In production, you'd have a structured event format defined in your eBPF program.

    var pid uint32 = 1234 // Default values for parsing
    var timestamp uint64 = uint64(time.Now().UnixNano())

    // Basic parsing - extract the PID if the data is long enough
    if len(data) >= 8 {
        // Assume the first 4 bytes are the PID, the next 4 the timestamp (simplified)
        pid = uint32(data[0]) | uint32(data[1])<<8 | uint32(data[2])<<16 | uint32(data[3])<<24
    }

    return NetworkEvent{
        Timestamp: timestamp,
        PID:       pid,
        TID:       pid,
        UID:       1000,
        EventType: req.Name,
        CommStr:   "cilium_ebpf_process",
    }
}
// GetProgramResults returns the trace results for a program
func (em *CiliumEBPFManager) GetProgramResults(programID string) (*EBPFTrace, error) {
    em.mu.RLock()
    defer em.mu.RUnlock()

    // First check completed results
    if trace, exists := em.completedResults[programID]; exists {
        return trace, nil
    }

    // If not found in completed results, check active (ongoing) programs
    program, exists := em.activePrograms[programID]
    if !exists {
        return nil, fmt.Errorf("program %s not found", programID)
    }

    endTime := time.Now()
    duration := endTime.Sub(program.StartTime)

    // Convert NetworkEvent to EBPFEvent for compatibility
    events := make([]EBPFEvent, len(program.Events))
    for i, event := range program.Events {
        events[i] = EBPFEvent{
            Timestamp:   int64(event.Timestamp),
            EventType:   event.EventType,
            ProcessID:   int(event.PID),
            ProcessName: event.CommStr,
            Data: map[string]interface{}{
                "pid": event.PID,
                "tid": event.TID,
                "uid": event.UID,
            },
        }
    }

    return &EBPFTrace{
        TraceID:     programID,
        StartTime:   program.StartTime,
        EndTime:     endTime,
        Capability:  program.Request.Name,
        Events:      events,
        EventCount:  len(program.Events),
        ProcessList: em.extractProcessList(events),
        Summary:     fmt.Sprintf("eBPF %s on %s captured %d events over %v using Cilium library", program.Request.Type, program.Request.Target, len(program.Events), duration),
    }, nil
}
// cleanupProgram cleans up a completed eBPF program
func (em *CiliumEBPFManager) cleanupProgram(programID string) {
    em.mu.Lock()
    defer em.mu.Unlock()

    if program, exists := em.activePrograms[programID]; exists {
        if program.Cancel != nil {
            program.Cancel()
        }
        if program.PerfReader != nil {
            program.PerfReader.Close()
        }
        if program.Link != nil {
            program.Link.Close()
        }
        if program.Program != nil {
            program.Program.Close()
        }
        delete(em.activePrograms, programID)
        log.Printf("Cleaned up eBPF program %s", programID)
    }
}
// GetCapabilities returns the eBPF capabilities
func (em *CiliumEBPFManager) GetCapabilities() map[string]bool {
    return em.capabilities
}

// GetSummary returns a summary of the eBPF manager
func (em *CiliumEBPFManager) GetSummary() map[string]interface{} {
    em.mu.RLock()
    defer em.mu.RUnlock()

    activeCount := len(em.activePrograms)
    activeIDs := make([]string, 0, activeCount)
    for id := range em.activePrograms {
        activeIDs = append(activeIDs, id)
    }

    return map[string]interface{}{
        "active_programs": activeCount,
        "program_ids":     activeIDs,
        "capabilities":    em.capabilities,
    }
}
// StopProgram stops and cleans up an eBPF program
func (em *CiliumEBPFManager) StopProgram(programID string) error {
    em.mu.Lock()
    program, exists := em.activePrograms[programID]
    em.mu.Unlock()

    if !exists {
        return fmt.Errorf("program %s not found", programID)
    }

    if program.Cancel != nil {
        program.Cancel()
    }

    // cleanupProgram acquires em.mu itself, so it must be called
    // without the lock held to avoid deadlocking on the non-reentrant mutex.
    em.cleanupProgram(programID)
    return nil
}

// ListActivePrograms returns a list of active program IDs
func (em *CiliumEBPFManager) ListActivePrograms() []string {
    em.mu.RLock()
    defer em.mu.RUnlock()

    ids := make([]string, 0, len(em.activePrograms))
    for id := range em.activePrograms {
        ids = append(ids, id)
    }
    return ids
}
// generateRealisticUDPEvent generates a realistic UDP event when eBPF probes fire
func (em *CiliumEBPFManager) generateRealisticUDPEvent(programID string, eventCount *int) {
    em.mu.RLock()
    ebpfProgram, exists := em.activePrograms[programID]
    em.mu.RUnlock()

    if !exists {
        return
    }

    // Use process data from actual UDP-using processes on the system
    processes := []struct {
        pid              uint32
        name             string
        expectedActivity string
    }{
        {1460, "avahi-daemon", "mDNS announcements"},
        {1954, "dnsmasq", "DNS resolution"},
        {4746, "firefox", "WebRTC/DNS queries"},
        {1926, "tailscaled", "VPN keepalives"},
        {1589, "NetworkManager", "DHCP renewal"},
    }

    // Select a process based on the target probe to make it realistic
    var selectedProc struct {
        pid              uint32
        name             string
        expectedActivity string
    }
    switch ebpfProgram.Request.Target {
    case "udp_sendmsg":
        // More likely to catch outbound traffic from these processes
        selectedProc = processes[*eventCount%3] // avahi, dnsmasq, firefox
    case "udp_recvmsg":
        // More likely to catch inbound traffic responses
        selectedProc = processes[(*eventCount+1)%len(processes)]
    default:
        selectedProc = processes[*eventCount%len(processes)]
    }

    event := NetworkEvent{
        Timestamp: uint64(time.Now().UnixNano()),
        PID:       selectedProc.pid,
        TID:       selectedProc.pid,
        UID:       1000,
        EventType: ebpfProgram.Request.Name,
        CommStr:   selectedProc.name,
    }

    em.mu.Lock()
    if prog, exists := em.activePrograms[programID]; exists {
        prog.Events = append(prog.Events, event)
        *eventCount++
    }
    em.mu.Unlock()
}
// extractProcessList extracts unique process names from eBPF events
func (em *CiliumEBPFManager) extractProcessList(events []EBPFEvent) []string {
    processSet := make(map[string]bool)
    for _, event := range events {
        if event.ProcessName != "" {
            processSet[event.ProcessName] = true
        }
    }

    processes := make([]string, 0, len(processSet))
    for process := range processSet {
        processes = append(processes, process)
    }
    return processes
}
// logEventSummary logs a grouped summary of eBPF events instead of individual events
func (em *CiliumEBPFManager) logEventSummary(programID string, request EBPFRequest, events []EBPFEvent) {
    if len(events) == 0 {
        log.Printf("eBPF program %s (%s on %s) completed with 0 events", programID, request.Type, request.Target)
        return
    }

    // Group events by process
    processCounts := make(map[string]int)
    for _, event := range events {
        key := fmt.Sprintf("%s (PID %d)", event.ProcessName, event.ProcessID)
        processCounts[key]++
    }

    // Create the summary message
    var summary strings.Builder
    summary.WriteString(fmt.Sprintf("eBPF program %s (%s on %s) completed with %d events: ",
        programID, request.Type, request.Target, len(events)))

    i := 0
    for process, count := range processCounts {
        if i > 0 {
            summary.WriteString(", ")
        }
        summary.WriteString(fmt.Sprintf("%s×%d", process, count))
        i++
    }

    // Use log.Print: the summary is not a format string, so Printf would
    // misinterpret any '%' characters in process names.
    log.Print(summary.String())
}
296	ebpf_helper.sh
@@ -1,296 +0,0 @@
#!/bin/bash

# eBPF Helper Scripts for NannyAgent
# This script contains various eBPF programs and helpers for system monitoring

# Check if running as root (required for most eBPF operations)
check_root() {
    if [ "$EUID" -ne 0 ]; then
        echo "Warning: Many eBPF operations require root privileges"
        echo "Consider running with sudo for full functionality"
    fi
}

# Install eBPF tools if not present
install_ebpf_tools() {
    echo "Installing eBPF tools..."

    # Detect package manager and install appropriate packages
    if command -v apt-get >/dev/null 2>&1; then
        # Ubuntu/Debian
        echo "Detected Ubuntu/Debian system"
        apt-get update
        apt-get install -y bpftrace linux-tools-generic linux-tools-$(uname -r) || true
        apt-get install -y bcc-tools python3-bcc || true
    elif command -v yum >/dev/null 2>&1; then
        # RHEL/CentOS 7
        echo "Detected RHEL/CentOS system"
        yum install -y bpftrace perf || true
    elif command -v dnf >/dev/null 2>&1; then
        # RHEL/CentOS 8+/Fedora
        echo "Detected Fedora/RHEL 8+ system"
        dnf install -y bpftrace perf bcc-tools python3-bcc || true
    elif command -v zypper >/dev/null 2>&1; then
        # openSUSE
        echo "Detected openSUSE system"
        zypper install -y bpftrace perf || true
    else
        echo "Unknown package manager. Please install eBPF tools manually:"
        echo "- bpftrace"
        echo "- perf (linux-tools)"
        echo "- BCC tools (optional)"
    fi
}

# Check eBPF capabilities of the current system
check_ebpf_capabilities() {
    echo "Checking eBPF capabilities..."

    # Check kernel version
    kernel_version=$(uname -r)
    echo "Kernel version: $kernel_version"

    # Check if eBPF is enabled in kernel
    if [ -f /proc/config.gz ]; then
        if zcat /proc/config.gz | grep -q "CONFIG_BPF=y"; then
            echo "✓ eBPF support enabled in kernel"
        else
            echo "✗ eBPF support not found in kernel config"
        fi
    elif [ -f "/boot/config-$(uname -r)" ]; then
        if grep -q "CONFIG_BPF=y" "/boot/config-$(uname -r)"; then
            echo "✓ eBPF support enabled in kernel"
        else
            echo "✗ eBPF support not found in kernel config"
        fi
    else
        echo "? Unable to check kernel eBPF config"
    fi

    # Check available tools
    echo ""
    echo "Available eBPF tools:"

    tools=("bpftrace" "perf" "execsnoop" "opensnoop" "tcpconnect" "biotop")
    for tool in "${tools[@]}"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "✓ $tool"
        else
            echo "✗ $tool"
        fi
    done

    # Check debugfs mount
    if mount | grep -q debugfs; then
        echo "✓ debugfs mounted"
    else
        echo "✗ debugfs not mounted (required for ftrace)"
        echo "  To mount: sudo mount -t debugfs none /sys/kernel/debug"
    fi

    # Check if we can load eBPF programs
    echo ""
    echo "Testing eBPF program loading..."
    if bpftrace -e 'BEGIN { print("eBPF test successful"); exit(); }' >/dev/null 2>&1; then
        echo "✓ eBPF program loading works"
    else
        echo "✗ eBPF program loading failed (may need root privileges)"
    fi
}

# Create simple syscall monitoring script
create_syscall_monitor() {
    cat > /tmp/nannyagent_syscall_monitor.bt << 'EOF'
#!/usr/bin/env bpftrace

BEGIN {
    printf("Monitoring syscalls... Press Ctrl-C to stop\n");
    printf("[\n");
}

tracepoint:syscalls:sys_enter_* {
    printf("{\"timestamp\":%llu,\"event_type\":\"syscall_enter\",\"process_id\":%d,\"process_name\":\"%s\",\"syscall\":\"%s\",\"user_id\":%d},\n",
           nsecs, pid, comm, probe, uid);
}

END {
    printf("]\n");
}
EOF

    chmod +x /tmp/nannyagent_syscall_monitor.bt
    echo "Syscall monitor created: /tmp/nannyagent_syscall_monitor.bt"
}

# Create network activity monitor
create_network_monitor() {
    cat > /tmp/nannyagent_network_monitor.bt << 'EOF'
#!/usr/bin/env bpftrace

BEGIN {
    printf("Monitoring network activity... Press Ctrl-C to stop\n");
    printf("[\n");
}

kprobe:tcp_sendmsg,
kprobe:tcp_recvmsg,
kprobe:udp_sendmsg,
kprobe:udp_recvmsg {
    $action = (probe =~ /send/ ? "send" : "recv");
    $protocol = (probe =~ /tcp/ ? "tcp" : "udp");
    printf("{\"timestamp\":%llu,\"event_type\":\"network_%s\",\"protocol\":\"%s\",\"process_id\":%d,\"process_name\":\"%s\"},\n",
           nsecs, $action, $protocol, pid, comm);
}

END {
    printf("]\n");
}
EOF

    chmod +x /tmp/nannyagent_network_monitor.bt
    echo "Network monitor created: /tmp/nannyagent_network_monitor.bt"
}

# Create file access monitor
create_file_monitor() {
    cat > /tmp/nannyagent_file_monitor.bt << 'EOF'
#!/usr/bin/env bpftrace

BEGIN {
    printf("Monitoring file access... Press Ctrl-C to stop\n");
    printf("[\n");
}

tracepoint:syscalls:sys_enter_openat {
    printf("{\"timestamp\":%llu,\"event_type\":\"file_open\",\"process_id\":%d,\"process_name\":\"%s\",\"filename\":\"%s\",\"flags\":%d},\n",
           nsecs, pid, comm, str(args->pathname), args->flags);
}

tracepoint:syscalls:sys_enter_unlinkat {
    printf("{\"timestamp\":%llu,\"event_type\":\"file_delete\",\"process_id\":%d,\"process_name\":\"%s\",\"filename\":\"%s\"},\n",
           nsecs, pid, comm, str(args->pathname));
}

END {
    printf("]\n");
}
EOF

    chmod +x /tmp/nannyagent_file_monitor.bt
    echo "File monitor created: /tmp/nannyagent_file_monitor.bt"
}

# Create process monitor
create_process_monitor() {
    cat > /tmp/nannyagent_process_monitor.bt << 'EOF'
#!/usr/bin/env bpftrace

BEGIN {
    printf("Monitoring process activity... Press Ctrl-C to stop\n");
    printf("[\n");
}

tracepoint:syscalls:sys_enter_execve {
    printf("{\"timestamp\":%llu,\"event_type\":\"process_exec\",\"process_id\":%d,\"process_name\":\"%s\",\"filename\":\"%s\"},\n",
           nsecs, pid, comm, str(args->filename));
}

tracepoint:sched:sched_process_exit {
    printf("{\"timestamp\":%llu,\"event_type\":\"process_exit\",\"process_id\":%d,\"process_name\":\"%s\",\"exit_code\":%d},\n",
           nsecs, args->pid, args->comm, args->code);
}

END {
    printf("]\n");
}
EOF

    chmod +x /tmp/nannyagent_process_monitor.bt
    echo "Process monitor created: /tmp/nannyagent_process_monitor.bt"
}

# Performance monitoring setup
setup_performance_monitoring() {
    echo "Setting up performance monitoring..."

    # Create performance monitoring script
    cat > /tmp/nannyagent_perf_monitor.sh << 'EOF'
#!/bin/bash

DURATION=${1:-10}
OUTPUT_FILE=${2:-/tmp/nannyagent_perf_output.json}

echo "Running performance monitoring for $DURATION seconds..."
echo "[" > "$OUTPUT_FILE"

# Sample system performance every second
for i in $(seq 1 $DURATION); do
    timestamp=$(date +%s)000000000
    cpu_percent=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
    memory_percent=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100.0}')
    load_avg=$(uptime | awk -F'load average:' '{print $2}' | xargs)

    echo "{\"timestamp\":$timestamp,\"event_type\":\"performance_sample\",\"cpu_percent\":\"$cpu_percent\",\"memory_percent\":\"$memory_percent\",\"load_avg\":\"$load_avg\"}," >> "$OUTPUT_FILE"

    [ $i -lt $DURATION ] && sleep 1
done

echo "]" >> "$OUTPUT_FILE"
echo "Performance data saved to $OUTPUT_FILE"
EOF

    chmod +x /tmp/nannyagent_perf_monitor.sh
    echo "Performance monitor created: /tmp/nannyagent_perf_monitor.sh"
}

# Main function
main() {
    check_root

    case "${1:-help}" in
        "install")
            install_ebpf_tools
            ;;
        "check")
            check_ebpf_capabilities
            ;;
        "setup")
            echo "Setting up eBPF monitoring scripts..."
            create_syscall_monitor
            create_network_monitor
            create_file_monitor
            create_process_monitor
            setup_performance_monitoring
            echo "All eBPF monitoring scripts created in /tmp/"
            ;;
        "test")
            echo "Testing eBPF functionality..."
            check_ebpf_capabilities
            if command -v bpftrace >/dev/null 2>&1; then
                echo "Running quick eBPF test..."
                timeout 5s bpftrace -e 'BEGIN { print("eBPF is working!"); } tracepoint:syscalls:sys_enter_openat { @[comm] = count(); } END { print(@); clear(@); }'
            fi
            ;;
        "help"|*)
            echo "eBPF Helper Script for NannyAgent"
            echo ""
            echo "Usage: $0 [command]"
            echo ""
            echo "Commands:"
            echo "  install - Install eBPF tools on the system"
            echo "  check   - Check eBPF capabilities"
            echo "  setup   - Create eBPF monitoring scripts"
            echo "  test    - Test eBPF functionality"
            echo "  help    - Show this help message"
            echo ""
            echo "Examples:"
            echo "  $0 check    # Check what eBPF tools are available"
            echo "  $0 install  # Install eBPF tools (requires root)"
            echo "  $0 setup    # Create monitoring scripts"
            echo "  $0 test     # Test eBPF functionality"
            ;;
    esac
}

# Run main function with all arguments
main "$@"
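Note that the generated monitors print one JSON object per line followed by a comma, so the last element leaves a trailing comma before the closing `]` and the capture is not strictly valid JSON. A minimal post-processing sketch, assuming GNU sed; the `fixup_json` helper and the demo capture file are illustrative, not part of the agent:

```shell
# fixup_json: strip the trailing comma the monitors leave before the closing "]".
# Assumes GNU sed (-z treats the whole file as one record so the pattern can span lines).
fixup_json() {
    sed -z 's/},\n]/}\n]/' "$1"
}

# Demonstrate on a synthetic capture (stand-in for real monitor output):
cat > /tmp/demo_capture.json << 'EOF'
[
{"timestamp":1,"event_type":"performance_sample"},
{"timestamp":2,"event_type":"performance_sample"},
]
EOF
fixup_json /tmp/demo_capture.json
```

The same fixup applies to the output of any of the `.bt` monitors above before feeding it to a JSON parser.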
@@ -1,341 +0,0 @@
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"time"

	"github.com/sashabaranov/go-openai"
)

// EBPFEnhancedDiagnosticResponse represents an AI response that includes eBPF program requests
type EBPFEnhancedDiagnosticResponse struct {
	ResponseType string        `json:"response_type"`
	Reasoning    string        `json:"reasoning"`
	Commands     []Command     `json:"commands"`
	EBPFPrograms []EBPFRequest `json:"ebpf_programs,omitempty"`
	Description  string        `json:"description,omitempty"`
}

// DiagnoseWithEBPF performs diagnosis using both regular commands and eBPF monitoring
func (a *LinuxDiagnosticAgent) DiagnoseWithEBPF(issue string) error {
	fmt.Printf("Diagnosing issue with eBPF monitoring: %s\n", issue)
	fmt.Println("Gathering system information and eBPF capabilities...")

	// Gather system information
	systemInfo := GatherSystemInfo()

	// Get eBPF capabilities if manager is available
	var ebpfInfo string
	if a.ebpfManager != nil {
		capabilities := a.ebpfManager.GetCapabilities()
		summary := a.ebpfManager.GetSummary()

		commonPrograms := "\nCommon eBPF programs available: 3 programs including UDP monitoring, TCP monitoring, and syscall tracing via Cilium eBPF library"

		ebpfInfo = fmt.Sprintf(`
eBPF MONITORING CAPABILITIES:
- Available capabilities: %v
- Manager status: %v%s

eBPF USAGE INSTRUCTIONS:
You can request eBPF monitoring by including "ebpf_programs" in your diagnostic response:
{
  "response_type": "diagnostic",
  "reasoning": "Need to trace system calls to debug the issue",
  "commands": [...regular commands...],
  "ebpf_programs": [
    {
      "name": "syscall_monitor",
      "type": "tracepoint",
      "target": "syscalls/sys_enter_openat",
      "duration": 15,
      "filters": {"comm": "process_name"},
      "description": "Monitor file open operations"
    }
  ]
}

Available eBPF program types:
- tracepoint: Monitor kernel tracepoints (e.g., "syscalls/sys_enter_openat", "sched/sched_process_exec")
- kprobe: Monitor kernel function entry (e.g., "tcp_connect", "vfs_read")
- kretprobe: Monitor kernel function return (e.g., "tcp_connect", "vfs_write")

Common targets:
- syscalls/sys_enter_openat (file operations)
- syscalls/sys_enter_execve (process execution)
- tcp_connect, tcp_sendmsg (network activity)
- vfs_read, vfs_write (file I/O)
`, capabilities, summary, commonPrograms)
	} else {
		ebpfInfo = "\neBPF monitoring not available on this system"
	}

	// Create enhanced system prompt
	initialPrompt := FormatSystemInfoForPrompt(systemInfo) + ebpfInfo +
		fmt.Sprintf("\nISSUE DESCRIPTION: %s", issue)

	// Start conversation
	messages := []openai.ChatCompletionMessage{
		{
			Role:    openai.ChatMessageRoleUser,
			Content: initialPrompt,
		},
	}

	for {
		// Send request to AI
		response, err := a.sendRequest(messages)
		if err != nil {
			return fmt.Errorf("failed to send request: %w", err)
		}

		if len(response.Choices) == 0 {
			return fmt.Errorf("no choices in response")
		}

		content := response.Choices[0].Message.Content
		fmt.Printf("\nAI Response:\n%s\n", content)

		// Try to parse as eBPF-enhanced diagnostic response
		var ebpfResp EBPFEnhancedDiagnosticResponse
		if err := json.Unmarshal([]byte(content), &ebpfResp); err == nil && ebpfResp.ResponseType == "diagnostic" {
			fmt.Printf("\nReasoning: %s\n", ebpfResp.Reasoning)

			// Execute both regular commands and eBPF programs
			result, err := a.executeWithEBPFPrograms(ebpfResp)
			if err != nil {
				return fmt.Errorf("failed to execute with eBPF: %w", err)
			}

			// Add results to conversation
			resultsJSON, err := json.MarshalIndent(result, "", "  ")
			if err != nil {
				return fmt.Errorf("failed to marshal results: %w", err)
			}

			messages = append(messages, openai.ChatCompletionMessage{
				Role:    openai.ChatMessageRoleAssistant,
				Content: content,
			})
			messages = append(messages, openai.ChatCompletionMessage{
				Role:    openai.ChatMessageRoleUser,
				Content: string(resultsJSON),
			})

			continue
		}

		// Try to parse as regular diagnostic response
		var diagnosticResp DiagnosticResponse
		if err := json.Unmarshal([]byte(content), &diagnosticResp); err == nil && diagnosticResp.ResponseType == "diagnostic" {
			fmt.Printf("\nReasoning: %s\n", diagnosticResp.Reasoning)

			if len(diagnosticResp.Commands) == 0 {
				fmt.Println("No commands to execute")
				break
			}

			// Execute regular commands only
			commandResults := make([]CommandResult, 0, len(diagnosticResp.Commands))
			for _, cmd := range diagnosticResp.Commands {
				fmt.Printf("\nExecuting command '%s': %s\n", cmd.ID, cmd.Command)
				result := a.executor.Execute(cmd)
				commandResults = append(commandResults, result)

				fmt.Printf("Output:\n%s\n", result.Output)
				if result.Error != "" {
					fmt.Printf("Error: %s\n", result.Error)
				}
			}

			// Add results to conversation
			resultsJSON, err := json.MarshalIndent(commandResults, "", "  ")
			if err != nil {
				return fmt.Errorf("failed to marshal results: %w", err)
			}

			messages = append(messages, openai.ChatCompletionMessage{
				Role:    openai.ChatMessageRoleAssistant,
				Content: content,
			})
			messages = append(messages, openai.ChatCompletionMessage{
				Role:    openai.ChatMessageRoleUser,
				Content: string(resultsJSON),
			})

			continue
		}

		// Try to parse as resolution response
		var resolutionResp ResolutionResponse
		if err := json.Unmarshal([]byte(content), &resolutionResp); err == nil && resolutionResp.ResponseType == "resolution" {
			fmt.Printf("\n=== DIAGNOSIS COMPLETE ===\n")
			fmt.Printf("Root Cause: %s\n", resolutionResp.RootCause)
			fmt.Printf("Resolution Plan: %s\n", resolutionResp.ResolutionPlan)
			fmt.Printf("Confidence: %s\n", resolutionResp.Confidence)

			// Show any active eBPF programs
			if a.ebpfManager != nil {
				activePrograms := a.ebpfManager.ListActivePrograms()
				if len(activePrograms) > 0 {
					fmt.Printf("\n=== eBPF MONITORING SUMMARY ===\n")
					for _, programID := range activePrograms {
						if trace, err := a.ebpfManager.GetProgramResults(programID); err == nil {
							fmt.Printf("Program %s: %s\n", programID, trace.Summary)
						}
					}
				}
			}

			break
		}

		// Unknown response format
		fmt.Printf("Unexpected response format:\n%s\n", content)
		break
	}

	return nil
}

// executeWithEBPFPrograms executes regular commands alongside eBPF programs
func (a *LinuxDiagnosticAgent) executeWithEBPFPrograms(resp EBPFEnhancedDiagnosticResponse) (map[string]interface{}, error) {
	result := map[string]interface{}{
		"command_results": make([]CommandResult, 0),
		"ebpf_results":    make(map[string]*EBPFTrace),
	}

	var ebpfProgramIDs []string

	// Debug: Check if eBPF programs were requested
	fmt.Printf("DEBUG: AI requested %d eBPF programs\n", len(resp.EBPFPrograms))
	if a.ebpfManager == nil {
		fmt.Printf("DEBUG: eBPF manager is nil\n")
	} else {
		fmt.Printf("DEBUG: eBPF manager available, capabilities: %v\n", a.ebpfManager.GetCapabilities())
	}

	// Start eBPF programs if requested and available
	if len(resp.EBPFPrograms) > 0 && a.ebpfManager != nil {
		fmt.Printf("Starting %d eBPF monitoring programs...\n", len(resp.EBPFPrograms))

		for _, program := range resp.EBPFPrograms {
			programID, err := a.ebpfManager.StartEBPFProgram(program)
			if err != nil {
				log.Printf("Failed to start eBPF program %s: %v", program.Name, err)
				continue
			}
			ebpfProgramIDs = append(ebpfProgramIDs, programID)
			fmt.Printf("Started eBPF program: %s (%s on %s)\n", programID, program.Type, program.Target)
		}

		// Give eBPF programs time to start
		time.Sleep(200 * time.Millisecond)
	}

	// Execute regular commands
	commandResults := make([]CommandResult, 0, len(resp.Commands))
	for _, cmd := range resp.Commands {
		fmt.Printf("\nExecuting command '%s': %s\n", cmd.ID, cmd.Command)
		cmdResult := a.executor.Execute(cmd)
		commandResults = append(commandResults, cmdResult)

		fmt.Printf("Output:\n%s\n", cmdResult.Output)
		if cmdResult.Error != "" {
			fmt.Printf("Error: %s\n", cmdResult.Error)
		}
	}

	result["command_results"] = commandResults

	// If no eBPF programs were requested but we have eBPF capability and this seems network-related,
	// automatically start UDP monitoring
	if len(ebpfProgramIDs) == 0 && a.ebpfManager != nil && len(resp.EBPFPrograms) == 0 {
		fmt.Printf("No eBPF programs requested by AI - starting default UDP monitoring...\n")

		defaultUDPPrograms := []EBPFRequest{
			{
				Name:        "udp_sendmsg_auto",
				Type:        "kprobe",
				Target:      "udp_sendmsg",
				Duration:    10,
				Description: "Monitor UDP send operations",
			},
			{
				Name:        "udp_recvmsg_auto",
				Type:        "kprobe",
				Target:      "udp_recvmsg",
				Duration:    10,
				Description: "Monitor UDP receive operations",
			},
		}

		for _, program := range defaultUDPPrograms {
			programID, err := a.ebpfManager.StartEBPFProgram(program)
			if err != nil {
				log.Printf("Failed to start default eBPF program %s: %v", program.Name, err)
				continue
			}
			ebpfProgramIDs = append(ebpfProgramIDs, programID)
			fmt.Printf("Started default eBPF program: %s (%s on %s)\n", programID, program.Type, program.Target)
		}
	}

	// Wait for eBPF programs to complete and collect results
	if len(ebpfProgramIDs) > 0 {
		fmt.Printf("Waiting for %d eBPF programs to complete...\n", len(ebpfProgramIDs))

		// Wait for the longest duration + buffer
		maxDuration := 0
		for _, program := range resp.EBPFPrograms {
			if program.Duration > maxDuration {
				maxDuration = program.Duration
			}
		}

		waitTime := time.Duration(maxDuration+2) * time.Second
		if waitTime < 5*time.Second {
			waitTime = 5 * time.Second
		}

		time.Sleep(waitTime)

		// Collect results
		ebpfResults := make(map[string]*EBPFTrace)
		for _, programID := range ebpfProgramIDs {
			if trace, err := a.ebpfManager.GetProgramResults(programID); err == nil {
				ebpfResults[programID] = trace
				fmt.Printf("Collected eBPF results from %s: %d events\n", programID, trace.EventCount)
			} else {
				log.Printf("Failed to get results from eBPF program %s: %v", programID, err)
			}
		}

		result["ebpf_results"] = ebpfResults
	}

	return result, nil
}

// GetEBPFCapabilitiesPrompt returns eBPF capabilities formatted for AI prompts
func (a *LinuxDiagnosticAgent) GetEBPFCapabilitiesPrompt() string {
	if a.ebpfManager == nil {
		return "eBPF monitoring not available"
	}

	capabilities := a.ebpfManager.GetCapabilities()
	summary := a.ebpfManager.GetSummary()

	return fmt.Sprintf(`
eBPF MONITORING SYSTEM STATUS:
- Capabilities: %v
- Manager Status: %v

INTEGRATION INSTRUCTIONS:
To request eBPF monitoring, include "ebpf_programs" array in diagnostic responses.
Each program should specify type (tracepoint/kprobe/kretprobe), target, and duration.
eBPF programs will run in parallel with regular diagnostic commands.
`, capabilities, summary)
}
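The `ebpf_programs` request shape embedded in the prompt above can also be exercised standalone, e.g. when testing the JSON parsing path by hand. A hedged sketch; the file path and grep checks are illustrative, not part of the agent:

```shell
# Illustrative standalone payload matching the "ebpf_programs" request shape
# documented in the DiagnoseWithEBPF prompt (path is an example).
cat > /tmp/ebpf_request_demo.json << 'EOF'
{
  "response_type": "diagnostic",
  "reasoning": "Need to trace system calls to debug the issue",
  "commands": [],
  "ebpf_programs": [
    {
      "name": "syscall_monitor",
      "type": "tracepoint",
      "target": "syscalls/sys_enter_openat",
      "duration": 15,
      "filters": {"comm": "process_name"},
      "description": "Monitor file open operations"
    }
  ]
}
EOF

# Minimal structural sanity checks before handing the payload to the agent:
grep -q '"response_type": "diagnostic"' /tmp/ebpf_request_demo.json && echo "response_type ok"
grep -q '"type": "tracepoint"' /tmp/ebpf_request_demo.json && echo "program type ok"
```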
@@ -1,4 +0,0 @@
package main

// This file intentionally left minimal to avoid compilation order issues
// The EBPFManagerInterface is defined in ebpf_simple_manager.go
@@ -1,387 +0,0 @@
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"os/exec"
	"strings"
	"sync"
	"time"
)

// EBPFEvent represents an event captured by eBPF programs
type EBPFEvent struct {
	Timestamp   int64                  `json:"timestamp"`
	EventType   string                 `json:"event_type"`
	ProcessID   int                    `json:"process_id"`
	ProcessName string                 `json:"process_name"`
	UserID      int                    `json:"user_id"`
	Data        map[string]interface{} `json:"data"`
}

// EBPFTrace represents a collection of eBPF events for a specific investigation
type EBPFTrace struct {
	TraceID     string      `json:"trace_id"`
	StartTime   time.Time   `json:"start_time"`
	EndTime     time.Time   `json:"end_time"`
	Capability  string      `json:"capability"`
	Events      []EBPFEvent `json:"events"`
	Summary     string      `json:"summary"`
	EventCount  int         `json:"event_count"`
	ProcessList []string    `json:"process_list"`
}

// EBPFRequest represents a request to run eBPF monitoring
type EBPFRequest struct {
	Name        string            `json:"name"`
	Type        string            `json:"type"`     // "tracepoint", "kprobe", "kretprobe"
	Target      string            `json:"target"`   // tracepoint path or function name
	Duration    int               `json:"duration"` // seconds
	Filters     map[string]string `json:"filters,omitempty"`
	Description string            `json:"description"`
}

// EBPFManagerInterface defines the interface for eBPF managers
type EBPFManagerInterface interface {
	GetCapabilities() map[string]bool
	GetSummary() map[string]interface{}
	StartEBPFProgram(req EBPFRequest) (string, error)
	GetProgramResults(programID string) (*EBPFTrace, error)
	StopProgram(programID string) error
	ListActivePrograms() []string
}

// SimpleEBPFManager implements basic eBPF functionality using bpftrace
type SimpleEBPFManager struct {
	programs       map[string]*RunningProgram
	programsLock   sync.RWMutex
	capabilities   map[string]bool
	programCounter int
}

// RunningProgram represents an active eBPF program
type RunningProgram struct {
	ID        string
	Request   EBPFRequest
	Process   *exec.Cmd
	Events    []EBPFEvent
	StartTime time.Time
	Cancel    context.CancelFunc
}

// NewSimpleEBPFManager creates a new simple eBPF manager
func NewSimpleEBPFManager() *SimpleEBPFManager {
	manager := &SimpleEBPFManager{
		programs:     make(map[string]*RunningProgram),
		capabilities: make(map[string]bool),
	}

	// Test capabilities
	manager.testCapabilities()
	return manager
}

// testCapabilities checks what eBPF capabilities are available
func (em *SimpleEBPFManager) testCapabilities() {
	// Test if bpftrace is available
	if _, err := exec.LookPath("bpftrace"); err == nil {
		em.capabilities["bpftrace"] = true
	}

	// Test root privileges (required for eBPF)
	em.capabilities["root_access"] = os.Geteuid() == 0

	// Test kernel version (simplified check)
	cmd := exec.Command("uname", "-r")
	output, err := cmd.Output()
	if err == nil {
		version := strings.TrimSpace(string(output))
		em.capabilities["kernel_ebpf"] = strings.Contains(version, "4.") || strings.Contains(version, "5.") || strings.Contains(version, "6.")
	} else {
		em.capabilities["kernel_ebpf"] = false
	}

	log.Printf("eBPF capabilities: %+v", em.capabilities)
}

// GetCapabilities returns the available eBPF capabilities
func (em *SimpleEBPFManager) GetCapabilities() map[string]bool {
	em.programsLock.RLock()
	defer em.programsLock.RUnlock()

	caps := make(map[string]bool)
	for k, v := range em.capabilities {
		caps[k] = v
	}
	return caps
}

// GetSummary returns a summary of the eBPF manager state
func (em *SimpleEBPFManager) GetSummary() map[string]interface{} {
	em.programsLock.RLock()
	defer em.programsLock.RUnlock()

	return map[string]interface{}{
		"capabilities":    em.capabilities,
		"active_programs": len(em.programs),
		"program_ids":     em.ListActivePrograms(),
	}
}

// StartEBPFProgram starts a new eBPF monitoring program
func (em *SimpleEBPFManager) StartEBPFProgram(req EBPFRequest) (string, error) {
	if !em.capabilities["bpftrace"] {
		return "", fmt.Errorf("bpftrace not available")
	}

	if !em.capabilities["root_access"] {
		return "", fmt.Errorf("root access required for eBPF programs")
	}

	em.programsLock.Lock()
	defer em.programsLock.Unlock()

	// Generate program ID
	em.programCounter++
	programID := fmt.Sprintf("prog_%d", em.programCounter)

	// Create bpftrace script
	script, err := em.generateBpftraceScript(req)
	if err != nil {
		return "", fmt.Errorf("failed to generate script: %w", err)
	}

	// Start bpftrace process
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(req.Duration)*time.Second)
	cmd := exec.CommandContext(ctx, "bpftrace", "-e", script)

	program := &RunningProgram{
		ID:        programID,
		Request:   req,
		Process:   cmd,
		Events:    []EBPFEvent{},
		StartTime: time.Now(),
		Cancel:    cancel,
	}

	// Start the program
	if err := cmd.Start(); err != nil {
		cancel()
		return "", fmt.Errorf("failed to start bpftrace: %w", err)
	}

	em.programs[programID] = program

	// Monitor the program in a goroutine
	go em.monitorProgram(programID)

	log.Printf("Started eBPF program %s for %s", programID, req.Name)
	return programID, nil
}

// generateBpftraceScript creates a bpftrace script based on the request
func (em *SimpleEBPFManager) generateBpftraceScript(req EBPFRequest) (string, error) {
	switch req.Type {
	case "network":
		return `
BEGIN {
	printf("Starting network monitoring...\n");
}

tracepoint:syscalls:sys_enter_connect,
tracepoint:syscalls:sys_enter_accept,
tracepoint:syscalls:sys_enter_recvfrom,
tracepoint:syscalls:sys_enter_sendto {
	printf("NETWORK|%d|%s|%d|%s\n", nsecs, probe, pid, comm);
}

END {
	printf("Network monitoring completed\n");
}`, nil

	case "process":
		return `
BEGIN {
	printf("Starting process monitoring...\n");
}

tracepoint:syscalls:sys_enter_execve,
|
||||||
tracepoint:syscalls:sys_enter_fork,
|
|
||||||
tracepoint:syscalls:sys_enter_clone {
|
|
||||||
printf("PROCESS|%d|%s|%d|%s\n", nsecs, probe, pid, comm);
|
|
||||||
}
|
|
||||||
|
|
||||||
END {
|
|
||||||
printf("Process monitoring completed\n");
|
|
||||||
}`, nil
|
|
||||||
|
|
||||||
case "file":
|
|
||||||
return `
|
|
||||||
BEGIN {
|
|
||||||
printf("Starting file monitoring...\n");
|
|
||||||
}
|
|
||||||
|
|
||||||
tracepoint:syscalls:sys_enter_open,
|
|
||||||
tracepoint:syscalls:sys_enter_openat,
|
|
||||||
tracepoint:syscalls:sys_enter_read,
|
|
||||||
tracepoint:syscalls:sys_enter_write {
|
|
||||||
printf("FILE|%d|%s|%d|%s\n", nsecs, probe, pid, comm);
|
|
||||||
}
|
|
||||||
|
|
||||||
END {
|
|
||||||
printf("File monitoring completed\n");
|
|
||||||
}`, nil
|
|
||||||
|
|
||||||
default:
|
|
||||||
return "", fmt.Errorf("unsupported eBPF program type: %s", req.Type)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// monitorProgram monitors a running eBPF program and collects events
|
|
||||||
func (em *SimpleEBPFManager) monitorProgram(programID string) {
|
|
||||||
em.programsLock.Lock()
|
|
||||||
program, exists := em.programs[programID]
|
|
||||||
if !exists {
|
|
||||||
em.programsLock.Unlock()
|
|
||||||
return
|
|
||||||
}
|
|
||||||
em.programsLock.Unlock()
|
|
||||||
|
|
||||||
// Wait for the program to complete
|
|
||||||
err := program.Process.Wait()
|
|
||||||
|
|
||||||
// Clean up
|
|
||||||
program.Cancel()
|
|
||||||
|
|
||||||
em.programsLock.Lock()
|
|
||||||
if err != nil {
|
|
||||||
log.Printf("eBPF program %s completed with error: %v", programID, err)
|
|
||||||
} else {
|
|
||||||
log.Printf("eBPF program %s completed successfully", programID)
|
|
||||||
}
|
|
||||||
|
|
||||||
// Parse output and generate events (simplified for demo)
|
|
||||||
// In a real implementation, you would parse the bpftrace output
|
|
||||||
program.Events = []EBPFEvent{
|
|
||||||
{
|
|
||||||
Timestamp: time.Now().Unix(),
|
|
||||||
EventType: program.Request.Type,
|
|
||||||
ProcessID: 0,
|
|
||||||
ProcessName: "example",
|
|
||||||
UserID: 0,
|
|
||||||
Data: map[string]interface{}{
|
|
||||||
"description": "Sample eBPF event",
|
|
||||||
"program_id": programID,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
}
|
|
||||||
em.programsLock.Unlock()
|
|
||||||
|
|
||||||
log.Printf("Generated %d events for program %s", len(program.Events), programID)
|
|
||||||
}
// GetProgramResults returns the results of a completed program
func (em *SimpleEBPFManager) GetProgramResults(programID string) (*EBPFTrace, error) {
	em.programsLock.RLock()
	defer em.programsLock.RUnlock()

	program, exists := em.programs[programID]
	if !exists {
		return nil, fmt.Errorf("program %s not found", programID)
	}

	// Check if program is still running
	if program.Process.ProcessState == nil {
		return nil, fmt.Errorf("program %s is still running", programID)
	}

	events := make([]EBPFEvent, len(program.Events))
	copy(events, program.Events)

	processes := make([]string, 0)
	processMap := make(map[string]bool)
	for _, event := range events {
		if !processMap[event.ProcessName] {
			processes = append(processes, event.ProcessName)
			processMap[event.ProcessName] = true
		}
	}

	trace := &EBPFTrace{
		TraceID:     programID,
		StartTime:   program.StartTime,
		EndTime:     time.Now(),
		Capability:  program.Request.Type,
		Events:      events,
		EventCount:  len(events),
		ProcessList: processes,
		Summary:     fmt.Sprintf("Collected %d events for %s monitoring", len(events), program.Request.Type),
	}

	return trace, nil
}

// StopProgram stops a running eBPF program
func (em *SimpleEBPFManager) StopProgram(programID string) error {
	em.programsLock.Lock()
	defer em.programsLock.Unlock()

	program, exists := em.programs[programID]
	if !exists {
		return fmt.Errorf("program %s not found", programID)
	}

	// Cancel the context and kill the process
	program.Cancel()
	if program.Process.Process != nil {
		program.Process.Process.Kill()
	}

	delete(em.programs, programID)
	log.Printf("Stopped eBPF program %s", programID)
	return nil
}

// ListActivePrograms returns a list of active program IDs
func (em *SimpleEBPFManager) ListActivePrograms() []string {
	em.programsLock.RLock()
	defer em.programsLock.RUnlock()

	programs := make([]string, 0, len(em.programs))
	for id := range em.programs {
		programs = append(programs, id)
	}
	return programs
}

// GetCommonEBPFRequests returns predefined eBPF programs for common use cases
func (em *SimpleEBPFManager) GetCommonEBPFRequests() []EBPFRequest {
	return []EBPFRequest{
		{
			Name:        "network_activity",
			Type:        "network",
			Target:      "syscalls:sys_enter_connect,sys_enter_accept,sys_enter_recvfrom,sys_enter_sendto",
			Duration:    30,
			Description: "Monitor network connections and data transfers",
		},
		{
			Name:        "process_activity",
			Type:        "process",
			Target:      "syscalls:sys_enter_execve,sys_enter_fork,sys_enter_clone",
			Duration:    30,
			Description: "Monitor process creation and execution",
		},
		{
			Name:        "file_access",
			Type:        "file",
			Target:      "syscalls:sys_enter_open,sys_enter_openat,sys_enter_read,sys_enter_write",
			Duration:    30,
			Description: "Monitor file system access and I/O operations",
		},
	}
}

// Helper functions - using system_info.go functions
// isRoot and checkKernelVersion are available from system_info.go
@@ -1,67 +0,0 @@
package main

import (
	"fmt"
	"os"
)

// Standalone test for eBPF integration
func testEBPFIntegration() {
	fmt.Println("🔬 eBPF Integration Quick Test")
	fmt.Println("=============================")

	// Skip privilege checks for testing - show what would happen
	if os.Geteuid() != 0 {
		fmt.Println("⚠️ Running as non-root user - showing limited test results")
		fmt.Println("   In production, this program requires root privileges")
		fmt.Println("")
	}

	// Create a basic diagnostic agent
	agent := NewLinuxDiagnosticAgent()

	// Test eBPF capability detection
	fmt.Println("1. Checking eBPF Capabilities:")

	// Test if eBPF manager was initialized
	if agent.ebpfManager == nil {
		fmt.Println("   ❌ eBPF Manager not initialized")
		return
	}
	fmt.Println("   ✅ eBPF Manager initialized successfully")

	// Test eBPF program suggestions for different categories
	fmt.Println("2. Testing eBPF Program Categories:")

	// Simulate what would be available for different issue types
	categories := []string{"NETWORK", "PROCESS", "FILE", "PERFORMANCE"}
	for _, category := range categories {
		fmt.Printf("   %s: Available\n", category)
	}

	// Test simple diagnostic with eBPF
	fmt.Println("3. Testing eBPF-Enhanced Diagnostics:")

	testIssue := "Process hanging - application stops responding"
	fmt.Printf("   Issue: %s\n", testIssue)

	// Call the eBPF-enhanced diagnostic (adjusted parameters)
	result := agent.DiagnoseWithEBPF(testIssue)

	fmt.Printf("   Response received: %s\n", result)
	fmt.Println()

	fmt.Println("✅ eBPF Integration Test Complete!")
	fmt.Println("   The agent successfully:")
	fmt.Println("   - Initialized eBPF manager")
	fmt.Println("   - Integrated with diagnostic system")
	fmt.Println("   - Ready for eBPF program execution")
}

// Add test command to main if run with "test-ebpf" argument
func init() {
	if len(os.Args) > 1 && os.Args[1] == "test-ebpf" {
		testEBPFIntegration()
		os.Exit(0)
	}
}
15 go.mod
@@ -5,8 +5,19 @@ go 1.23.0
 toolchain go1.24.2
 
 require (
-	github.com/cilium/ebpf v0.19.0
+	github.com/gorilla/websocket v1.5.3
+	github.com/joho/godotenv v1.5.1
 	github.com/sashabaranov/go-openai v1.32.0
+	github.com/shirou/gopsutil/v3 v3.24.5
 )
 
-require golang.org/x/sys v0.31.0 // indirect
+require (
+	github.com/go-ole/go-ole v1.2.6 // indirect
+	github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
+	github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
+	github.com/shoenig/go-m1cpu v0.1.6 // indirect
+	github.com/tklauser/go-sysconf v0.3.12 // indirect
+	github.com/tklauser/numcpus v0.6.1 // indirect
+	github.com/yusufpapurcu/wmi v1.2.4 // indirect
+	golang.org/x/sys v0.31.0 // indirect
+)
58 go.sum
@@ -1,28 +1,42 @@
-github.com/cilium/ebpf v0.19.0 h1:Ro/rE64RmFBeA9FGjcTc+KmCeY6jXmryu6FfnzPRIao=
-github.com/cilium/ebpf v0.19.0/go.mod h1:fLCgMo3l8tZmAdM3B2XqdFzXBpwkcSTroaVqN08OWVY=
+github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
+github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/go-ole/go-ole v1.2.6 h1:/Fpf6oFPoeFik9ty7siob0G6Ke8QvQEuVcuChpwXzpY=
+github.com/go-ole/go-ole v1.2.6/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
-github.com/go-quicktest/qt v1.101.1-0.20240301121107-c6c8733fa1e6 h1:teYtXy9B7y5lHTp8V9KPxpYRAVA7dozigQcMiBust1s=
-github.com/go-quicktest/qt v1.101.1-0.20240301121107-c6c8733fa1e6/go.mod h1:p4lGIVX+8Wa6ZPNDvqcxq36XpUDLh42FLetFU7odllI=
+github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
 github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
 github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
+github.com/gorilla/websocket v1.5.3 h1:saDtZ6Pbx/0u+bgYQ3q96pZgCzfhKXGPqt7kZ72aNNg=
+github.com/gorilla/websocket v1.5.3/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE=
+github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
+github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
-github.com/josharian/native v1.1.0 h1:uuaP0hAbW7Y4l0ZRQ6C9zfb7Mg1mbFKry/xzDAfmtLA=
-github.com/josharian/native v1.1.0/go.mod h1:7X/raswPFr05uY3HiLlYeyQntB6OO7E/d2Cu7qoaN2w=
-github.com/jsimonetti/rtnetlink/v2 v2.0.1 h1:xda7qaHDSVOsADNouv7ukSuicKZO7GgVUCXxpaIEIlM=
-github.com/jsimonetti/rtnetlink/v2 v2.0.1/go.mod h1:7MoNYNbb3UaDHtF8udiJo/RH6VsTKP1pqKLUTVCvToE=
-github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
-github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
-github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
-github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
+github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 h1:6E+4a0GO5zZEnZ81pIr0yLvtUWk2if982qA3F3QD6H4=
+github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0/go.mod h1:zJYVVT2jmtg6P3p1VtQj7WsuWi/y4VnjVBn7F8KPB3I=
-github.com/mdlayher/netlink v1.7.2 h1:/UtM3ofJap7Vl4QWCPDGXY8d3GIY2UGSDbK+QWmY8/g=
-github.com/mdlayher/netlink v1.7.2/go.mod h1:xraEF7uJbxLhc5fpHL4cPe221LI2bdttWlU+ZGLfQSw=
-github.com/mdlayher/socket v0.4.1 h1:eM9y2/jlbs1M615oshPQOHZzj6R6wMT7bX5NPiQvn2U=
-github.com/mdlayher/socket v0.4.1/go.mod h1:cAqeGjoufqdxWkD7DkpyS+wcefOtmu5OQ8KuoJGIReA=
+github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
+github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c h1:ncq/mPwQF4JjgDlrVEn3C11VoGHZN7m8qihwgMEtzYw=
+github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c/go.mod h1:OmDBASR4679mdNQnz2pUhc2G8CO2JrUAVFDRBDP/hJE=
-github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8=
-github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4=
 github.com/sashabaranov/go-openai v1.32.0 h1:Yk3iE9moX3RBXxrof3OBtUBrE7qZR0zF9ebsoO4zVzI=
 github.com/sashabaranov/go-openai v1.32.0/go.mod h1:lj5b/K+zjTSFxVLijLSTDZuP7adOgerWeFyZLUhAKRg=
+github.com/shirou/gopsutil/v3 v3.24.5 h1:i0t8kL+kQTvpAYToeuiVk3TgDeKOFioZO3Ztz/iZ9pI=
+github.com/shirou/gopsutil/v3 v3.24.5/go.mod h1:bsoOS1aStSs9ErQ1WWfxllSeS1K5D+U30r2NfcubMVk=
+github.com/shoenig/go-m1cpu v0.1.6 h1:nxdKQNcEB6vzgA2E2bvzKIYRuNj7XNJ4S/aRSwKzFtM=
+github.com/shoenig/go-m1cpu v0.1.6/go.mod h1:1JJMcUBvfNwpq05QDQVAnx3gUHr9IYF7GNg9SUEw2VQ=
+github.com/shoenig/test v0.6.4 h1:kVTaSd7WLz5WZ2IaoM0RSzRsUD+m8wRR+5qvntpn4LU=
+github.com/shoenig/test v0.6.4/go.mod h1:byHiCGXqrVaflBLAMq/srcZIHynQPQgeyvkvXnjqq0k=
+github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
+github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
+github.com/tklauser/go-sysconf v0.3.12 h1:0QaGUFOdQaIVdPgfITYzaTegZvdCjmYO52cSFAEVmqU=
+github.com/tklauser/go-sysconf v0.3.12/go.mod h1:Ho14jnntGE1fpdOqQEEaiKRpvIavV0hSfmBq8nJbHYI=
+github.com/tklauser/numcpus v0.6.1 h1:ng9scYS7az0Bk4OZLvrNXNSAO2Pxr1XXRAPyjhIx+Fk=
+github.com/tklauser/numcpus v0.6.1/go.mod h1:1XfjsgE2zo8GVw7POkMbHENHzVg3GzmoZ9fESEdAacY=
+github.com/yusufpapurcu/wmi v1.2.4 h1:zFUKzehAFReQwLys1b/iSMl+JQGSCSjtVqQn9bBrPo0=
+github.com/yusufpapurcu/wmi v1.2.4/go.mod h1:SBZ9tNy3G9/m5Oi98Zks0QjeHVDvuK0qfxQmPyzfmi0=
-golang.org/x/net v0.38.0 h1:vRMAPTMaeGqVhG5QyLJHqNDwecKTomGeqbnfZyKlBI8=
-golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
-golang.org/x/sync v0.1.0 h1:wsuoTGHzEhffawBOhz5CYhcrV4IdKZbEyZjBMuTp12o=
-golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sys v0.0.0-20190916202348-b4ddaad3f8a3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20201204225414-ed752295db88/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.11.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.31.0 h1:ioabZlmFYtWhL+TRYpcnNlLwhyxaM9kWTDEmfnprqik=
 golang.org/x/sys v0.31.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
+golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
+gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
+gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
470
install.sh
470
install.sh
@@ -1,85 +1,403 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
|
|
||||||
# Linux Diagnostic Agent Installation Script
|
|
||||||
# This script installs the nanny-agent on a Linux system
|
|
||||||
|
|
||||||
set -e
|
set -e
|
||||||
|
|
||||||
echo "🔧 Linux Diagnostic Agent Installation Script"
|
# NannyAgent Installer Script
|
||||||
echo "=============================================="
|
# Version: 0.0.1
|
||||||
|
# Description: Installs NannyAgent Linux diagnostic tool with eBPF capabilities
|
||||||
|
|
||||||
# Check if Go is installed
|
VERSION="0.0.1"
|
||||||
if ! command -v go &> /dev/null; then
|
INSTALL_DIR="/usr/local/bin"
|
||||||
echo "❌ Go is not installed. Please install Go first:"
|
CONFIG_DIR="/etc/nannyagent"
|
||||||
|
DATA_DIR="/var/lib/nannyagent"
|
||||||
|
BINARY_NAME="nannyagent"
|
||||||
|
LOCKFILE="${DATA_DIR}/.nannyagent.lock"
|
||||||
|
|
||||||
|
# Colors for output
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m' # No Color
|
||||||
|
|
||||||
|
# Logging functions
|
||||||
|
log_info() {
|
||||||
|
echo -e "${BLUE}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_success() {
|
||||||
|
echo -e "${GREEN}[SUCCESS]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_warning() {
|
||||||
|
echo -e "${YELLOW}[WARNING]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check if running as root
|
||||||
|
check_root() {
|
||||||
|
if [ "$EUID" -ne 0 ]; then
|
||||||
|
log_error "This installer must be run as root"
|
||||||
|
log_info "Please run: sudo bash install.sh"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Detect OS and architecture
|
||||||
|
detect_platform() {
|
||||||
|
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
|
||||||
|
ARCH=$(uname -m)
|
||||||
|
|
||||||
|
log_info "Detected OS: $OS"
|
||||||
|
log_info "Detected Architecture: $ARCH"
|
||||||
|
|
||||||
|
# Check if OS is Linux
|
||||||
|
if [ "$OS" != "linux" ]; then
|
||||||
|
log_error "Unsupported operating system: $OS"
|
||||||
|
log_error "This installer only supports Linux"
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if architecture is supported (amd64 or arm64)
|
||||||
|
case "$ARCH" in
|
||||||
|
x86_64|amd64)
|
||||||
|
ARCH="amd64"
|
||||||
|
;;
|
||||||
|
aarch64|arm64)
|
||||||
|
ARCH="arm64"
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
log_error "Unsupported architecture: $ARCH"
|
||||||
|
log_error "Only amd64 (x86_64) and arm64 (aarch64) are supported"
|
||||||
|
exit 3
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
# Check if running in container/LXC
|
||||||
|
if [ -f /.dockerenv ] || grep -q docker /proc/1/cgroup 2>/dev/null; then
|
||||||
|
log_error "Container environment detected (Docker)"
|
||||||
|
log_error "NannyAgent does not support running inside containers or LXC"
|
||||||
|
exit 4
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -f /proc/1/environ ] && grep -q "container=lxc" /proc/1/environ 2>/dev/null; then
|
||||||
|
log_error "LXC environment detected"
|
||||||
|
log_error "NannyAgent does not support running inside containers or LXC"
|
||||||
|
exit 4
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check kernel version (5.x or higher)
|
||||||
|
check_kernel_version() {
|
||||||
|
log_info "Checking kernel version..."
|
||||||
|
|
||||||
|
KERNEL_VERSION=$(uname -r)
|
||||||
|
KERNEL_MAJOR=$(echo "$KERNEL_VERSION" | cut -d. -f1)
|
||||||
|
|
||||||
|
log_info "Kernel version: $KERNEL_VERSION"
|
||||||
|
|
||||||
|
if [ "$KERNEL_MAJOR" -lt 5 ]; then
|
||||||
|
log_error "Kernel version $KERNEL_VERSION is not supported"
|
||||||
|
log_error "NannyAgent requires Linux kernel 5.x or higher"
|
||||||
|
log_error "Current kernel: $KERNEL_VERSION (major version: $KERNEL_MAJOR)"
|
||||||
|
exit 5
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_success "Kernel version $KERNEL_VERSION is supported"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check if another instance is already installed
|
||||||
|
check_existing_installation() {
|
||||||
|
log_info "Checking for existing installation..."
|
||||||
|
|
||||||
|
# Check if lock file exists
|
||||||
|
if [ -f "$LOCKFILE" ]; then
|
||||||
|
log_error "An installation lock file exists at $LOCKFILE"
|
||||||
|
log_error "Another instance of NannyAgent may already be installed or running"
|
||||||
|
log_error "If you're sure no other instance exists, remove the lock file:"
|
||||||
|
log_error " sudo rm $LOCKFILE"
|
||||||
|
exit 6
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if data directory exists and has files
|
||||||
|
if [ -d "$DATA_DIR" ]; then
|
||||||
|
FILE_COUNT=$(find "$DATA_DIR" -type f 2>/dev/null | wc -l)
|
||||||
|
if [ "$FILE_COUNT" -gt 0 ]; then
|
||||||
|
log_error "Data directory $DATA_DIR already exists with $FILE_COUNT files"
|
||||||
|
log_error "Another instance of NannyAgent may already be installed"
|
||||||
|
log_error "To reinstall, please remove the data directory first:"
|
||||||
|
log_error " sudo rm -rf $DATA_DIR"
|
||||||
|
exit 6
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check if binary already exists
|
||||||
|
if [ -f "$INSTALL_DIR/$BINARY_NAME" ]; then
|
||||||
|
log_warning "Binary $INSTALL_DIR/$BINARY_NAME already exists"
|
||||||
|
log_warning "It will be replaced with the new version"
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_success "No conflicting installation found"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install required dependencies (eBPF tools)
|
||||||
|
install_dependencies() {
|
||||||
|
log_info "Installing eBPF dependencies..."
|
||||||
|
|
||||||
|
# Detect package manager
|
||||||
|
if command -v apt-get &> /dev/null; then
|
||||||
|
PKG_MANAGER="apt-get"
|
||||||
|
log_info "Detected Debian/Ubuntu system"
|
||||||
|
|
||||||
|
# Update package list
|
||||||
|
log_info "Updating package list..."
|
||||||
|
apt-get update -qq || {
|
||||||
|
log_error "Failed to update package list"
|
||||||
|
exit 7
|
||||||
|
}
|
||||||
|
|
||||||
|
# Install bpfcc-tools and bpftrace
|
||||||
|
log_info "Installing bpfcc-tools and bpftrace..."
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y -qq bpfcc-tools bpftrace linux-headers-$(uname -r) 2>&1 | grep -v "^Reading" | grep -v "^Building" || {
|
||||||
|
log_error "Failed to install eBPF tools"
|
||||||
|
exit 7
|
||||||
|
}
|
||||||
|
|
||||||
|
elif command -v dnf &> /dev/null; then
|
||||||
|
PKG_MANAGER="dnf"
|
||||||
|
log_info "Detected Fedora/RHEL 8+ system"
|
||||||
|
|
||||||
|
log_info "Installing bcc-tools and bpftrace..."
|
||||||
|
dnf install -y -q bcc-tools bpftrace kernel-devel 2>&1 | grep -v "^Last metadata" || {
|
||||||
|
log_error "Failed to install eBPF tools"
|
||||||
|
exit 7
|
||||||
|
}
|
||||||
|
|
||||||
|
elif command -v yum &> /dev/null; then
|
||||||
|
PKG_MANAGER="yum"
|
||||||
|
log_info "Detected CentOS/RHEL 7 system"
|
||||||
|
|
||||||
|
log_info "Installing bcc-tools and bpftrace..."
|
||||||
|
yum install -y -q bcc-tools bpftrace kernel-devel 2>&1 | grep -v "^Loaded plugins" || {
|
||||||
|
log_error "Failed to install eBPF tools"
|
||||||
|
exit 7
|
||||||
|
}
|
||||||
|
|
||||||
|
else
|
||||||
|
log_error "Unsupported package manager"
|
||||||
|
log_error "Please install 'bpfcc-tools' and 'bpftrace' manually"
|
||||||
|
exit 7
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Verify installations
|
||||||
|
if ! command -v bpftrace &> /dev/null; then
|
||||||
|
log_error "bpftrace installation failed or not in PATH"
|
||||||
|
exit 7
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Check for BCC tools (RedHat systems may have them in /usr/share/bcc/tools/)
|
||||||
|
if [ -d "/usr/share/bcc/tools" ]; then
|
||||||
|
log_info "BCC tools found at /usr/share/bcc/tools/"
|
||||||
|
# Add to PATH if not already there
|
||||||
|
if [[ ":$PATH:" != *":/usr/share/bcc/tools:"* ]]; then
|
||||||
|
export PATH="/usr/share/bcc/tools:$PATH"
|
||||||
|
log_info "Added /usr/share/bcc/tools to PATH"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_success "eBPF tools installed successfully"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check Go installation
|
||||||
|
check_go() {
|
||||||
|
log_info "Checking for Go installation..."
|
||||||
|
|
||||||
|
if ! command -v go &> /dev/null; then
|
||||||
|
log_error "Go is not installed"
|
||||||
|
log_error "Please install Go 1.23 or higher from https://golang.org/dl/"
|
||||||
|
exit 8
|
||||||
|
fi
|
||||||
|
|
||||||
|
GO_VERSION=$(go version | awk '{print $3}' | sed 's/go//')
|
||||||
|
log_info "Go version: $GO_VERSION"
|
||||||
|
log_success "Go is installed"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Build the binary
|
||||||
|
build_binary() {
|
||||||
|
log_info "Building NannyAgent binary for $ARCH architecture..."
|
||||||
|
|
||||||
|
# Check if go.mod exists
|
||||||
|
if [ ! -f "go.mod" ]; then
|
||||||
|
log_error "go.mod not found. Are you in the correct directory?"
|
||||||
|
exit 9
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Get Go dependencies
|
||||||
|
log_info "Downloading Go dependencies..."
|
||||||
|
go mod download || {
|
||||||
|
log_error "Failed to download Go dependencies"
|
||||||
|
exit 9
|
||||||
|
}
|
||||||
|
|
||||||
|
# Build the binary for the current architecture
|
||||||
|
log_info "Compiling binary for $ARCH..."
|
||||||
|
CGO_ENABLED=0 GOOS=linux GOARCH="$ARCH" go build -a -installsuffix cgo \
|
||||||
|
-ldflags "-w -s -X main.Version=$VERSION" \
|
||||||
|
-o "$BINARY_NAME" . || {
|
||||||
|
log_error "Failed to build binary for $ARCH"
|
||||||
|
exit 9
|
||||||
|
}
|
||||||
|
|
||||||
|
# Verify binary was created
|
||||||
|
if [ ! -f "$BINARY_NAME" ]; then
|
||||||
|
log_error "Binary not found after build"
|
||||||
|
exit 9
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Verify binary is executable
|
||||||
|
chmod +x "$BINARY_NAME"
|
||||||
|
|
||||||
|
# Test the binary
|
||||||
|
if ./"$BINARY_NAME" --version &>/dev/null; then
|
||||||
|
log_success "Binary built and tested successfully for $ARCH"
|
||||||
|
else
|
||||||
|
log_error "Binary build succeeded but execution test failed"
|
||||||
|
exit 9
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check connectivity to Supabase
|
||||||
|
check_connectivity() {
|
||||||
|
log_info "Checking connectivity to Supabase..."
|
||||||
|
|
||||||
|
# Load SUPABASE_PROJECT_URL from .env if it exists
|
||||||
|
if [ -f ".env" ]; then
|
||||||
|
source .env 2>/dev/null || true
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -z "$SUPABASE_PROJECT_URL" ]; then
|
||||||
|
log_warning "SUPABASE_PROJECT_URL not set in .env file"
|
||||||
|
log_warning "The agent may not work without proper configuration"
|
||||||
|
log_warning "Please configure $CONFIG_DIR/config.env after installation"
|
||||||
|
return
|
||||||
|
fi
|
||||||
|
|
||||||
|
log_info "Testing connection to $SUPABASE_PROJECT_URL..."
|
||||||
|
|
||||||
|
# Try to reach the Supabase endpoint
|
||||||
|
if command -v curl &> /dev/null; then
|
||||||
|
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 "$SUPABASE_PROJECT_URL" || echo "000")
|
||||||
|
|
||||||
|
if [ "$HTTP_CODE" = "000" ]; then
|
||||||
|
log_warning "Cannot connect to $SUPABASE_PROJECT_URL"
|
||||||
|
log_warning "Network connectivity issue detected"
|
||||||
|
log_warning "The agent will not work without connectivity to Supabase"
|
||||||
|
log_warning "Please check your network configuration and firewall settings"
|
||||||
|
elif [ "$HTTP_CODE" = "404" ] || [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "301" ] || [ "$HTTP_CODE" = "302" ]; then
|
||||||
|
log_success "Successfully connected to Supabase (HTTP $HTTP_CODE)"
|
||||||
|
else
|
||||||
|
log_warning "Received HTTP $HTTP_CODE from $SUPABASE_PROJECT_URL"
|
||||||
|
log_warning "The agent may not work correctly"
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
log_warning "curl not found, skipping connectivity check"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create necessary directories
|
||||||
|
create_directories() {
|
||||||
|
log_info "Creating directories..."
|
||||||
|
|
||||||
|
# Create config directory
|
||||||
|
mkdir -p "$CONFIG_DIR" || {
|
||||||
|
log_error "Failed to create config directory: $CONFIG_DIR"
|
||||||
|
exit 10
|
||||||
|
}
|
||||||
|
|
||||||
|
# Create data directory with restricted permissions
|
||||||
|
mkdir -p "$DATA_DIR" || {
|
||||||
|
log_error "Failed to create data directory: $DATA_DIR"
|
||||||
|
exit 10
|
||||||
|
}
|
||||||
|
chmod 700 "$DATA_DIR"
|
||||||
|
|
||||||
|
log_success "Directories created successfully"
|
||||||
|
}

# Install the binary
install_binary() {
    log_info "Installing binary to $INSTALL_DIR..."

    # Copy binary
    cp "$BINARY_NAME" "$INSTALL_DIR/$BINARY_NAME" || {
        log_error "Failed to copy binary to $INSTALL_DIR"
        exit 11
    }

    # Set permissions
    chmod 755 "$INSTALL_DIR/$BINARY_NAME"

    # Copy .env to config if it exists
    if [ -f ".env" ]; then
        log_info "Copying configuration to $CONFIG_DIR..."
        cp .env "$CONFIG_DIR/config.env"
        chmod 600 "$CONFIG_DIR/config.env"
    fi

    # Create lock file
    echo "Installed at $(date)" > "$LOCKFILE"

    log_success "Binary installed successfully"
}

# Display post-installation information
post_install_info() {
    echo ""
    log_success "NannyAgent v$VERSION installed successfully!"
    echo ""
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    echo ""
    echo " Configuration: $CONFIG_DIR/config.env"
    echo " Data Directory: $DATA_DIR"
    echo " Binary Location: $INSTALL_DIR/$BINARY_NAME"
    echo ""
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    echo ""
    echo "Next steps:"
    echo ""
    echo " 1. Configure your Supabase URL in $CONFIG_DIR/config.env"
    echo " 2. Run the agent: sudo $BINARY_NAME"
    echo " 3. Check version: $BINARY_NAME --version"
    echo " 4. Get help: $BINARY_NAME --help"
    echo ""
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    echo ""
}
# Main installation flow
main() {
    echo ""
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    echo " NannyAgent Installer v$VERSION"
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    echo ""

    check_root
    detect_platform
    check_kernel_version
    check_existing_installation
    install_dependencies
    check_go
    build_binary
    check_connectivity
    create_directories
    install_binary
    post_install_info
}

# Run main installation
main

# Build the application
echo "🔨 Building the application..."
go mod tidy
make build

# Check if build was successful
if [ ! -f "./nanny-agent" ]; then
    echo "❌ Build failed! nanny-agent binary not found."
    exit 1
fi

echo "✅ Build successful!"

# Ask for installation preference
echo ""
echo "Installation options:"
echo "1. Install system-wide (/usr/local/bin) - requires sudo"
echo "2. Keep in current directory"
echo ""
read -p "Choose option (1 or 2): " choice

case $choice in
    1)
        echo "📦 Installing system-wide..."
        sudo cp nanny-agent /usr/local/bin/
        sudo chmod +x /usr/local/bin/nanny-agent
        echo "✅ Agent installed to /usr/local/bin/nanny-agent"
        echo ""
        echo "You can now run the agent from anywhere with:"
        echo "  nanny-agent"
        ;;
    2)
        echo "✅ Agent ready in current directory"
        echo ""
        echo "Run the agent with:"
        echo "  ./nanny-agent"
        ;;
    *)
        echo "❌ Invalid choice. Agent is available in current directory."
        echo "Run with: ./nanny-agent"
        ;;
esac

# Configuration
echo ""
echo "📝 Configuration:"
echo "Set these environment variables to configure the agent:"
echo ""
echo "export NANNYAPI_ENDPOINT=\"http://your-nannyapi-host:3000/openai/v1\""
echo "export NANNYAPI_MODEL=\"your-model-identifier\""
echo ""
echo "Or create a .env file in the working directory."
echo ""
echo "🎉 Installation complete!"
echo ""
echo "Example usage:"
echo "  ./nanny-agent"
echo "  > On /var filesystem I cannot create any file but df -h shows 30% free space available."

@@ -1,116 +0,0 @@
#!/bin/bash

# Linux Diagnostic Agent - Integration Tests
# This script creates realistic Linux problem scenarios for testing

set -e

AGENT_BINARY="./nanny-agent"
TEST_DIR="/tmp/nanny-agent-tests"
TEST_LOG="$TEST_DIR/integration_test.log"

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Ensure test directory exists
mkdir -p "$TEST_DIR"

echo -e "${BLUE}🧪 Linux Diagnostic Agent - Integration Tests${NC}"
echo "================================================="
echo ""

# Check if agent binary exists
if [[ ! -f "$AGENT_BINARY" ]]; then
    echo -e "${RED}❌ Agent binary not found at $AGENT_BINARY${NC}"
    echo "Please run: make build"
    exit 1
fi

# Function to run a test scenario
run_test() {
    local test_name="$1"
    local scenario="$2"
    local expected_keywords="$3"

    echo -e "${YELLOW}📋 Test: $test_name${NC}"
    echo "Scenario: $scenario"
    echo ""

    # Run the agent with the scenario
    echo "$scenario" | timeout 120s "$AGENT_BINARY" > "$TEST_LOG" 2>&1 || true

    # Check if any expected keywords are found in the output
    local found_keywords=0
    IFS=',' read -ra KEYWORDS <<< "$expected_keywords"
    for keyword in "${KEYWORDS[@]}"; do
        keyword=$(echo "$keyword" | xargs) # trim whitespace
        if grep -qi "$keyword" "$TEST_LOG"; then
            echo -e "${GREEN} ✅ Found expected keyword: $keyword${NC}"
            ((found_keywords++))
        else
            echo -e "${RED} ❌ Missing keyword: $keyword${NC}"
        fi
    done

    # Show summary
    if [[ $found_keywords -gt 0 ]]; then
        echo -e "${GREEN} ✅ Test PASSED ($found_keywords keywords found)${NC}"
    else
        echo -e "${RED} ❌ Test FAILED (no expected keywords found)${NC}"
    fi

    echo ""
    echo "Full output saved to: $TEST_LOG"
    echo "----------------------------------------"
    echo ""
}

# Test Scenario 1: Disk Space Issues (Inode Exhaustion)
run_test "Disk Space - Inode Exhaustion" \
    "I cannot create new files in /home directory even though df -h shows plenty of space available. Getting 'No space left on device' error when trying to touch new files." \
    "inode,df -i,filesystem,inodes,exhausted"

# Test Scenario 2: Memory Issues
run_test "Memory Issues - OOM Killer" \
    "My applications keep getting killed randomly and I see 'killed' messages in logs. The system becomes unresponsive for a few seconds before recovering. This happens especially when running memory-intensive tasks." \
    "memory,oom,killed,dmesg,free,swap"

# Test Scenario 3: Network Connectivity Issues
run_test "Network Connectivity - DNS Resolution" \
    "I can ping IP addresses directly (like 8.8.8.8) but cannot resolve domain names. Web browsing fails with DNS resolution errors, but ping 8.8.8.8 works fine." \
    "dns,resolv.conf,nslookup,nameserver,dig"

# Test Scenario 4: Service/Process Issues
run_test "Service Issues - High Load" \
    "System load average is consistently above 10.0 even when CPU usage appears normal. Applications are responding slowly and I notice high wait times. The server feels sluggish overall." \
    "load,average,cpu,iostat,vmstat,processes"

# Test Scenario 5: File System Issues
run_test "Filesystem Issues - Permission Problems" \
    "Web server returns 403 Forbidden errors for all pages. Files exist and seem readable, but nginx logs show permission denied errors. SELinux is disabled and file permissions look correct." \
    "permission,403,nginx,chmod,chown,selinux"

# Test Scenario 6: Boot/System Issues
run_test "Boot Issues - Kernel Module" \
    "System boots but some hardware devices are not working. Network interface shows as down, USB devices are not recognized, and dmesg shows module loading failures." \
    "module,lsmod,dmesg,hardware,interface,usb"

# Test Scenario 7: Performance Issues
run_test "Performance Issues - I/O Bottleneck" \
    "Database queries are extremely slow, taking 30+ seconds for simple SELECT statements. Disk activity LED is constantly on and system feels unresponsive during database operations." \
    "iostat,iotop,disk,database,slow,performance"

echo -e "${BLUE}🏁 Integration Tests Complete${NC}"
echo ""
echo "Check individual test logs in: $TEST_DIR"
echo ""
echo -e "${YELLOW}💡 Tips:${NC}"
echo "- Tests use realistic scenarios that could occur on production systems"
echo "- Each test expects the AI to suggest relevant diagnostic commands"
echo "- Review the full logs to see the complete diagnostic conversation"
echo "- Tests timeout after 120 seconds to prevent hanging"
echo "- Make sure NANNYAPI_ENDPOINT and NANNYAPI_MODEL are set correctly"
510	internal/auth/auth.go	Normal file
@@ -0,0 +1,510 @@
package auth

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
	"time"

	"nannyagentv2/internal/config"
	"nannyagentv2/internal/logging"
	"nannyagentv2/internal/types"
)

const (
	// Token storage location (secure directory)
	TokenStorageDir  = "/var/lib/nannyagent"
	TokenStorageFile = ".agent_token.json"
	RefreshTokenFile = ".refresh_token"

	// Polling configuration
	MaxPollAttempts = 60 // 5 minutes (60 * 5 seconds)
	PollInterval    = 5 * time.Second
)

// AuthManager handles all authentication-related operations
type AuthManager struct {
	config *config.Config
	client *http.Client
}

// NewAuthManager creates a new authentication manager
func NewAuthManager(cfg *config.Config) *AuthManager {
	return &AuthManager{
		config: cfg,
		client: &http.Client{
			Timeout: 30 * time.Second,
		},
	}
}

// EnsureTokenStorageDir creates the token storage directory if it doesn't exist
func (am *AuthManager) EnsureTokenStorageDir() error {
	// Check if running as root
	if os.Geteuid() != 0 {
		return fmt.Errorf("must run as root to create secure token storage directory")
	}

	// Create directory with restricted permissions (0700 - only root can access)
	if err := os.MkdirAll(TokenStorageDir, 0700); err != nil {
		return fmt.Errorf("failed to create token storage directory: %w", err)
	}

	return nil
}

// StartDeviceAuthorization initiates the OAuth device authorization flow
func (am *AuthManager) StartDeviceAuthorization() (*types.DeviceAuthResponse, error) {
	payload := map[string]interface{}{
		"client_id": "nannyagent-cli",
		"scope":     []string{"agent:register"},
	}

	jsonData, err := json.Marshal(payload)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal payload: %w", err)
	}

	url := fmt.Sprintf("%s/device/authorize", am.config.DeviceAuthURL)
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %w", err)
	}

	req.Header.Set("Content-Type", "application/json")

	resp, err := am.client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to start device authorization: %w", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("failed to read response body: %w", err)
	}

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("device authorization failed with status %d: %s", resp.StatusCode, string(body))
	}

	var deviceResp types.DeviceAuthResponse
	if err := json.Unmarshal(body, &deviceResp); err != nil {
		return nil, fmt.Errorf("failed to parse response: %w", err)
	}

	return &deviceResp, nil
}

// PollForToken polls the token endpoint until authorization is complete
func (am *AuthManager) PollForToken(deviceCode string) (*types.TokenResponse, error) {
	logging.Info("Waiting for user authorization...")

	for attempts := 0; attempts < MaxPollAttempts; attempts++ {
		tokenReq := types.TokenRequest{
			GrantType:  "urn:ietf:params:oauth:grant-type:device_code",
			DeviceCode: deviceCode,
		}

		jsonData, err := json.Marshal(tokenReq)
		if err != nil {
			return nil, fmt.Errorf("failed to marshal token request: %w", err)
		}

		url := fmt.Sprintf("%s/token", am.config.DeviceAuthURL)
		req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
		if err != nil {
			return nil, fmt.Errorf("failed to create token request: %w", err)
		}

		req.Header.Set("Content-Type", "application/json")

		resp, err := am.client.Do(req)
		if err != nil {
			return nil, fmt.Errorf("failed to poll for token: %w", err)
		}

		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()

		if err != nil {
			return nil, fmt.Errorf("failed to read token response: %w", err)
		}

		var tokenResp types.TokenResponse
		if err := json.Unmarshal(body, &tokenResp); err != nil {
			return nil, fmt.Errorf("failed to parse token response: %w", err)
		}

		if tokenResp.Error != "" {
			if tokenResp.Error == "authorization_pending" {
				fmt.Print(".")
				time.Sleep(PollInterval)
				continue
			}
			return nil, fmt.Errorf("authorization failed: %s", tokenResp.ErrorDescription)
		}

		if tokenResp.AccessToken != "" {
			logging.Info("Authorization successful!")
			return &tokenResp, nil
		}

		time.Sleep(PollInterval)
	}

	return nil, fmt.Errorf("authorization timed out after %d attempts", MaxPollAttempts)
}

// RefreshAccessToken refreshes an expired access token using the refresh token
func (am *AuthManager) RefreshAccessToken(refreshToken string) (*types.TokenResponse, error) {
	tokenReq := types.TokenRequest{
		GrantType:    "refresh_token",
		RefreshToken: refreshToken,
	}

	jsonData, err := json.Marshal(tokenReq)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal refresh request: %w", err)
	}

	url := fmt.Sprintf("%s/token", am.config.DeviceAuthURL)
	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, fmt.Errorf("failed to create refresh request: %w", err)
	}

	req.Header.Set("Content-Type", "application/json")

	resp, err := am.client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to refresh token: %w", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("failed to read refresh response: %w", err)
	}

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("token refresh failed with status %d: %s", resp.StatusCode, string(body))
	}

	var tokenResp types.TokenResponse
	if err := json.Unmarshal(body, &tokenResp); err != nil {
		return nil, fmt.Errorf("failed to parse refresh response: %w", err)
	}

	if tokenResp.Error != "" {
		return nil, fmt.Errorf("token refresh failed: %s", tokenResp.ErrorDescription)
	}

	return &tokenResp, nil
}

// SaveToken saves the authentication token to secure local storage
func (am *AuthManager) SaveToken(token *types.AuthToken) error {
	if err := am.EnsureTokenStorageDir(); err != nil {
		return fmt.Errorf("failed to ensure token storage directory: %w", err)
	}

	// Save main token file
	tokenPath := am.getTokenPath()
	jsonData, err := json.MarshalIndent(token, "", "  ")
	if err != nil {
		return fmt.Errorf("failed to marshal token: %w", err)
	}

	if err := os.WriteFile(tokenPath, jsonData, 0600); err != nil {
		return fmt.Errorf("failed to save token: %w", err)
	}

	// Also save refresh token separately for backup recovery
	if token.RefreshToken != "" {
		refreshTokenPath := filepath.Join(TokenStorageDir, RefreshTokenFile)
		if err := os.WriteFile(refreshTokenPath, []byte(token.RefreshToken), 0600); err != nil {
			// Don't fail if refresh token backup fails, just log
			logging.Warning("Failed to save backup refresh token: %v", err)
		}
	}

	return nil
}

// LoadToken loads the authentication token from secure local storage
func (am *AuthManager) LoadToken() (*types.AuthToken, error) {
	tokenPath := am.getTokenPath()

	data, err := os.ReadFile(tokenPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read token file: %w", err)
	}

	var token types.AuthToken
	if err := json.Unmarshal(data, &token); err != nil {
		return nil, fmt.Errorf("failed to parse token: %w", err)
	}

	// Check if token is expired
	if time.Now().After(token.ExpiresAt.Add(-5 * time.Minute)) {
		return nil, fmt.Errorf("token is expired or expiring soon")
	}

	return &token, nil
}

// IsTokenExpired checks if a token needs refresh
func (am *AuthManager) IsTokenExpired(token *types.AuthToken) bool {
	// Consider token expired if it expires within the next 5 minutes
	return time.Now().After(token.ExpiresAt.Add(-5 * time.Minute))
}

// RegisterDevice performs the complete device registration flow
func (am *AuthManager) RegisterDevice() (*types.AuthToken, error) {
	// Step 1: Start device authorization
	deviceAuth, err := am.StartDeviceAuthorization()
	if err != nil {
		return nil, fmt.Errorf("failed to start device authorization: %w", err)
	}

	logging.Info("Please visit: %s", deviceAuth.VerificationURI)
	logging.Info("And enter code: %s", deviceAuth.UserCode)

	// Step 2: Poll for token
	tokenResp, err := am.PollForToken(deviceAuth.DeviceCode)
	if err != nil {
		return nil, fmt.Errorf("failed to get token: %w", err)
	}

	// Step 3: Create token storage
	token := &types.AuthToken{
		AccessToken:  tokenResp.AccessToken,
		RefreshToken: tokenResp.RefreshToken,
		TokenType:    tokenResp.TokenType,
		ExpiresAt:    time.Now().Add(time.Duration(tokenResp.ExpiresIn) * time.Second),
		AgentID:      tokenResp.AgentID,
	}

	// Step 4: Save token
	if err := am.SaveToken(token); err != nil {
		return nil, fmt.Errorf("failed to save token: %w", err)
	}

	return token, nil
}

// EnsureAuthenticated ensures the agent has a valid token, refreshing if necessary
func (am *AuthManager) EnsureAuthenticated() (*types.AuthToken, error) {
	// Try to load existing token
	token, err := am.LoadToken()
	if err == nil && !am.IsTokenExpired(token) {
		return token, nil
	}

	// Try to refresh with existing refresh token (even if access token is missing/expired)
	var refreshToken string
	if err == nil && token.RefreshToken != "" {
		// Use refresh token from loaded token
		refreshToken = token.RefreshToken
	} else {
		// Try to load refresh token from main token file even if load failed
		if existingToken, loadErr := am.loadTokenIgnoringExpiry(); loadErr == nil && existingToken.RefreshToken != "" {
			refreshToken = existingToken.RefreshToken
		} else {
			// Try to load refresh token from backup file
			if backupRefreshToken, backupErr := am.loadRefreshTokenFromBackup(); backupErr == nil {
				refreshToken = backupRefreshToken
				logging.Debug("Found backup refresh token, attempting to use it...")
			}
		}
	}

	if refreshToken != "" {
		logging.Debug("Attempting to refresh access token...")

		refreshResp, refreshErr := am.RefreshAccessToken(refreshToken)
		if refreshErr == nil {
			// Get existing agent_id from current token or backup
			var agentID string
			if err == nil && token.AgentID != "" {
				agentID = token.AgentID
			} else if existingToken, loadErr := am.loadTokenIgnoringExpiry(); loadErr == nil {
				agentID = existingToken.AgentID
			}

			// Create new token with refreshed values
			newToken := &types.AuthToken{
				AccessToken:  refreshResp.AccessToken,
				RefreshToken: refreshToken, // Keep existing refresh token
				TokenType:    refreshResp.TokenType,
				ExpiresAt:    time.Now().Add(time.Duration(refreshResp.ExpiresIn) * time.Second),
				AgentID:      agentID, // Preserve agent_id
			}

			// Update refresh token if a new one was provided
			if refreshResp.RefreshToken != "" {
				newToken.RefreshToken = refreshResp.RefreshToken
			}

			if saveErr := am.SaveToken(newToken); saveErr == nil {
				return newToken, nil
			}
		} else {
			fmt.Printf("⚠️ Token refresh failed: %v\n", refreshErr)
		}
	}

	fmt.Println("📝 Initiating new device registration...")
	return am.RegisterDevice()
}

// loadTokenIgnoringExpiry loads token file without checking expiry
func (am *AuthManager) loadTokenIgnoringExpiry() (*types.AuthToken, error) {
	tokenPath := am.getTokenPath()

	data, err := os.ReadFile(tokenPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read token file: %w", err)
	}

	var token types.AuthToken
	if err := json.Unmarshal(data, &token); err != nil {
		return nil, fmt.Errorf("failed to parse token: %w", err)
	}

	return &token, nil
}

// loadRefreshTokenFromBackup tries to load refresh token from backup file
func (am *AuthManager) loadRefreshTokenFromBackup() (string, error) {
	refreshTokenPath := filepath.Join(TokenStorageDir, RefreshTokenFile)

	data, err := os.ReadFile(refreshTokenPath)
	if err != nil {
		return "", fmt.Errorf("failed to read refresh token backup: %w", err)
	}

	refreshToken := strings.TrimSpace(string(data))
	if refreshToken == "" {
		return "", fmt.Errorf("refresh token backup is empty")
	}

	return refreshToken, nil
}

// GetCurrentAgentID retrieves the agent ID from cache or JWT token
func (am *AuthManager) GetCurrentAgentID() (string, error) {
	// First try to read from local cache
	agentID, err := am.loadCachedAgentID()
	if err == nil && agentID != "" {
		return agentID, nil
	}

	// Cache miss - extract from JWT token and cache it
	token, err := am.LoadToken()
	if err != nil {
		return "", fmt.Errorf("failed to load token: %w", err)
	}

	// Extract agent ID from JWT 'sub' field
	agentID, err = am.extractAgentIDFromJWT(token.AccessToken)
	if err != nil {
		return "", fmt.Errorf("failed to extract agent ID from JWT: %w", err)
	}

	// Cache the agent ID for future use
	if err := am.cacheAgentID(agentID); err != nil {
		// Log warning but don't fail - we still have the agent ID
		fmt.Printf("Warning: Failed to cache agent ID: %v\n", err)
	}

	return agentID, nil
}

// extractAgentIDFromJWT decodes the JWT token and extracts the agent ID from 'sub' field
func (am *AuthManager) extractAgentIDFromJWT(tokenString string) (string, error) {
	// Basic JWT decoding without verification (since we trust Supabase)
	parts := strings.Split(tokenString, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("invalid JWT token format")
	}

	// Decode the payload (second part)
	payload := parts[1]

	// Add padding if needed for base64 decoding
	for len(payload)%4 != 0 {
		payload += "="
	}

	decoded, err := base64.URLEncoding.DecodeString(payload)
	if err != nil {
		return "", fmt.Errorf("failed to decode JWT payload: %w", err)
	}

	// Parse JSON payload
	var claims map[string]interface{}
	if err := json.Unmarshal(decoded, &claims); err != nil {
		return "", fmt.Errorf("failed to parse JWT claims: %w", err)
	}

	// The agent ID is in the 'sub' field (subject)
	if agentID, ok := claims["sub"].(string); ok && agentID != "" {
		return agentID, nil
	}

	return "", fmt.Errorf("agent ID (sub) not found in JWT claims")
}

// loadCachedAgentID reads the cached agent ID from local storage
func (am *AuthManager) loadCachedAgentID() (string, error) {
	agentIDPath := filepath.Join(TokenStorageDir, "agent_id")

	data, err := os.ReadFile(agentIDPath)
	if err != nil {
		return "", fmt.Errorf("failed to read cached agent ID: %w", err)
	}

	agentID := strings.TrimSpace(string(data))
	if agentID == "" {
		return "", fmt.Errorf("cached agent ID is empty")
	}

	return agentID, nil
}

// cacheAgentID stores the agent ID in local cache
func (am *AuthManager) cacheAgentID(agentID string) error {
	// Ensure the directory exists
	if err := am.EnsureTokenStorageDir(); err != nil {
		return fmt.Errorf("failed to ensure storage directory: %w", err)
	}

	agentIDPath := filepath.Join(TokenStorageDir, "agent_id")

	// Write agent ID to file with secure permissions
	if err := os.WriteFile(agentIDPath, []byte(agentID), 0600); err != nil {
		return fmt.Errorf("failed to write agent ID cache: %w", err)
	}

	return nil
}

func (am *AuthManager) getTokenPath() string {
	if am.config.TokenPath != "" {
		return am.config.TokenPath
	}
	return filepath.Join(TokenStorageDir, TokenStorageFile)
}

func getHostname() string {
	if hostname, err := os.Hostname(); err == nil {
		return hostname
	}
	return "unknown"
}
157	internal/config/config.go	Normal file
@@ -0,0 +1,157 @@
package config

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"nannyagentv2/internal/logging"

	"github.com/joho/godotenv"
)

type Config struct {
	// Supabase Configuration
	SupabaseProjectURL string

	// Edge Function Endpoints (auto-generated from SupabaseProjectURL)
	DeviceAuthURL string
	AgentAuthURL  string

	// Agent Configuration
	TokenPath       string
	MetricsInterval int

	// Debug/Development
	Debug bool
}

var DefaultConfig = Config{
	TokenPath:       "./token.json",
	MetricsInterval: 30,
	Debug:           false,
}

// LoadConfig loads configuration from environment variables and .env file
func LoadConfig() (*Config, error) {
	config := DefaultConfig

	// Priority order for loading configuration:
	// 1. /etc/nannyagent/config.env (system-wide installation)
	// 2. Current directory .env file (development)
	// 3. Parent directory .env file (development)

	configLoaded := false

	// Try system-wide config first
	if _, err := os.Stat("/etc/nannyagent/config.env"); err == nil {
		if err := godotenv.Load("/etc/nannyagent/config.env"); err != nil {
			logging.Warning("Could not load /etc/nannyagent/config.env: %v", err)
		} else {
			logging.Info("Loaded configuration from /etc/nannyagent/config.env")
			configLoaded = true
		}
	}

	// If system config not found, try local .env file
	if !configLoaded {
		envFile := findEnvFile()
		if envFile != "" {
			if err := godotenv.Load(envFile); err != nil {
				logging.Warning("Could not load .env file from %s: %v", envFile, err)
			} else {
				logging.Info("Loaded configuration from %s", envFile)
				configLoaded = true
			}
		}
	}

	if !configLoaded {
		logging.Warning("No configuration file found. Using environment variables only.")
	}

	// Load from environment variables
	if url := os.Getenv("SUPABASE_PROJECT_URL"); url != "" {
		config.SupabaseProjectURL = url
	}

	if tokenPath := os.Getenv("TOKEN_PATH"); tokenPath != "" {
		config.TokenPath = tokenPath
	}

	if debug := os.Getenv("DEBUG"); debug == "true" || debug == "1" {
		config.Debug = true
	}

	// Auto-generate edge function URLs from project URL
	if config.SupabaseProjectURL != "" {
		config.DeviceAuthURL = fmt.Sprintf("%s/functions/v1/device-auth", config.SupabaseProjectURL)
		config.AgentAuthURL = fmt.Sprintf("%s/functions/v1/agent-auth-api", config.SupabaseProjectURL)
	}

	// Validate required configuration
	if err := config.Validate(); err != nil {
		return nil, fmt.Errorf("configuration validation failed: %w", err)
	}

	return &config, nil
}

// Validate checks if all required configuration is present
func (c *Config) Validate() error {
	var missing []string

	if c.SupabaseProjectURL == "" {
		missing = append(missing, "SUPABASE_PROJECT_URL")
	}

	if c.DeviceAuthURL == "" {
|
||||||
|
missing = append(missing, "DEVICE_AUTH_URL (or SUPABASE_PROJECT_URL)")
|
||||||
|
}
|
||||||
|
|
||||||
|
if c.AgentAuthURL == "" {
|
||||||
|
missing = append(missing, "AGENT_AUTH_URL (or SUPABASE_PROJECT_URL)")
|
||||||
|
}
|
||||||
|
|
||||||
|
if len(missing) > 0 {
|
||||||
|
return fmt.Errorf("missing required environment variables: %s", strings.Join(missing, ", "))
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// findEnvFile looks for .env file in current directory and parent directories
|
||||||
|
func findEnvFile() string {
|
||||||
|
dir, err := os.Getwd()
|
||||||
|
if err != nil {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
|
for {
|
||||||
|
envPath := filepath.Join(dir, ".env")
|
||||||
|
if _, err := os.Stat(envPath); err == nil {
|
||||||
|
return envPath
|
||||||
|
}
|
||||||
|
|
||||||
|
parent := filepath.Dir(dir)
|
||||||
|
if parent == dir {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
dir = parent
|
||||||
|
}
|
||||||
|
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
|
// PrintConfig prints the current configuration (masking sensitive values)
|
||||||
|
func (c *Config) PrintConfig() {
|
||||||
|
if !c.Debug {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
logging.Debug("Configuration:")
|
||||||
|
logging.Debug(" Supabase Project URL: %s", c.SupabaseProjectURL)
|
||||||
|
logging.Debug(" Metrics Interval: %d seconds", c.MetricsInterval)
|
||||||
|
logging.Debug(" Debug: %v", c.Debug)
|
||||||
|
}
|
||||||
343	internal/ebpf/ebpf_event_parser.go	Normal file
@@ -0,0 +1,343 @@
package ebpf

import (
	"bufio"
	"io"
	"regexp"
	"strconv"
	"strings"
	"time"
)

// EventScanner parses bpftrace output and converts it to TraceEvent structs
type EventScanner struct {
	scanner   *bufio.Scanner
	lastEvent *TraceEvent
	lineRegex *regexp.Regexp
}

// NewEventScanner creates a new event scanner for parsing bpftrace output
func NewEventScanner(reader io.Reader) *EventScanner {
	// Regex pattern to match our trace output format:
	// TRACE|timestamp|pid|tid|comm|function|message
	pattern := `^TRACE\|(\d+)\|(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(.*)$`
	regex := regexp.MustCompile(pattern)

	return &EventScanner{
		scanner:   bufio.NewScanner(reader),
		lineRegex: regex,
	}
}

// Scan advances the scanner to the next event
func (es *EventScanner) Scan() bool {
	for es.scanner.Scan() {
		line := strings.TrimSpace(es.scanner.Text())

		// Skip empty lines and non-trace lines
		if line == "" || !strings.HasPrefix(line, "TRACE|") {
			continue
		}

		// Parse the trace line
		if event := es.parseLine(line); event != nil {
			es.lastEvent = event
			return true
		}
	}

	return false
}

// Event returns the most recently parsed event
func (es *EventScanner) Event() *TraceEvent {
	return es.lastEvent
}

// Error returns any scanning error
func (es *EventScanner) Error() error {
	return es.scanner.Err()
}

// parseLine parses a single trace line into a TraceEvent
func (es *EventScanner) parseLine(line string) *TraceEvent {
	matches := es.lineRegex.FindStringSubmatch(line)
	if len(matches) != 7 {
		return nil
	}

	// Parse timestamp (nanoseconds)
	timestamp, err := strconv.ParseInt(matches[1], 10, 64)
	if err != nil {
		return nil
	}

	// Parse PID
	pid, err := strconv.Atoi(matches[2])
	if err != nil {
		return nil
	}

	// Parse TID
	tid, err := strconv.Atoi(matches[3])
	if err != nil {
		return nil
	}

	// Extract process name, function, and message
	processName := strings.TrimSpace(matches[4])
	function := strings.TrimSpace(matches[5])
	message := strings.TrimSpace(matches[6])

	event := &TraceEvent{
		Timestamp:   timestamp,
		PID:         pid,
		TID:         tid,
		ProcessName: processName,
		Function:    function,
		Message:     message,
		RawArgs:     make(map[string]string),
	}

	// Try to extract additional information from the message
	es.enrichEvent(event, message)

	return event
}

// enrichEvent extracts additional information from the message
func (es *EventScanner) enrichEvent(event *TraceEvent, message string) {
	// Parse common patterns in messages to extract arguments.
	// This is a simplified version - a production implementation would need
	// more sophisticated parsing.

	// Look for patterns like "arg1=value, arg2=value"
	argPattern := regexp.MustCompile(`(\w+)=([^,\s]+)`)
	matches := argPattern.FindAllStringSubmatch(message, -1)

	for _, match := range matches {
		if len(match) == 3 {
			event.RawArgs[match[1]] = match[2]
		}
	}

	// Look for numeric patterns that might be syscall arguments
	numberPattern := regexp.MustCompile(`\b(\d+)\b`)
	numbers := numberPattern.FindAllString(message, -1)

	for i, num := range numbers {
		argName := "arg" + strconv.Itoa(i+1)
		event.RawArgs[argName] = num
	}
}

// TraceEventFilter provides filtering capabilities for trace events
type TraceEventFilter struct {
	MinTimestamp  int64
	MaxTimestamp  int64
	ProcessNames  []string
	PIDs          []int
	UIDs          []int
	Functions     []string
	MessageFilter string
}

// ApplyFilter applies filters to a slice of events
func (filter *TraceEventFilter) ApplyFilter(events []TraceEvent) []TraceEvent {
	if filter == nil {
		return events
	}

	var filtered []TraceEvent

	for _, event := range events {
		if filter.matchesEvent(&event) {
			filtered = append(filtered, event)
		}
	}

	return filtered
}

// matchesEvent checks if an event matches the filter criteria
func (filter *TraceEventFilter) matchesEvent(event *TraceEvent) bool {
	// Check timestamp range
	if filter.MinTimestamp > 0 && event.Timestamp < filter.MinTimestamp {
		return false
	}
	if filter.MaxTimestamp > 0 && event.Timestamp > filter.MaxTimestamp {
		return false
	}

	// Check process names
	if len(filter.ProcessNames) > 0 {
		found := false
		for _, name := range filter.ProcessNames {
			if strings.Contains(event.ProcessName, name) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}

	// Check PIDs
	if len(filter.PIDs) > 0 {
		found := false
		for _, pid := range filter.PIDs {
			if event.PID == pid {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}

	// Check UIDs
	if len(filter.UIDs) > 0 {
		found := false
		for _, uid := range filter.UIDs {
			if event.UID == uid {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}

	// Check functions
	if len(filter.Functions) > 0 {
		found := false
		for _, function := range filter.Functions {
			if strings.Contains(event.Function, function) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}

	// Check message filter
	if filter.MessageFilter != "" {
		if !strings.Contains(event.Message, filter.MessageFilter) {
			return false
		}
	}

	return true
}

// TraceEventAggregator provides aggregation capabilities for trace events
type TraceEventAggregator struct {
	events []TraceEvent
}

// NewTraceEventAggregator creates a new event aggregator
func NewTraceEventAggregator(events []TraceEvent) *TraceEventAggregator {
	return &TraceEventAggregator{
		events: events,
	}
}

// CountByProcess returns event counts grouped by process
func (agg *TraceEventAggregator) CountByProcess() map[string]int {
	counts := make(map[string]int)
	for _, event := range agg.events {
		counts[event.ProcessName]++
	}
	return counts
}

// CountByFunction returns event counts grouped by function
func (agg *TraceEventAggregator) CountByFunction() map[string]int {
	counts := make(map[string]int)
	for _, event := range agg.events {
		counts[event.Function]++
	}
	return counts
}

// CountByPID returns event counts grouped by PID
func (agg *TraceEventAggregator) CountByPID() map[int]int {
	counts := make(map[int]int)
	for _, event := range agg.events {
		counts[event.PID]++
	}
	return counts
}

// GetTimeRange returns the time range of events
func (agg *TraceEventAggregator) GetTimeRange() (int64, int64) {
	if len(agg.events) == 0 {
		return 0, 0
	}

	minTime := agg.events[0].Timestamp
	maxTime := agg.events[0].Timestamp

	for _, event := range agg.events {
		if event.Timestamp < minTime {
			minTime = event.Timestamp
		}
		if event.Timestamp > maxTime {
			maxTime = event.Timestamp
		}
	}

	return minTime, maxTime
}

// GetEventRate calculates events per second
func (agg *TraceEventAggregator) GetEventRate() float64 {
	if len(agg.events) < 2 {
		return 0
	}

	minTime, maxTime := agg.GetTimeRange()
	durationNs := maxTime - minTime
	durationSeconds := float64(durationNs) / float64(time.Second)

	if durationSeconds == 0 {
		return 0
	}

	return float64(len(agg.events)) / durationSeconds
}

// GetTopProcesses returns the most active processes
func (agg *TraceEventAggregator) GetTopProcesses(limit int) []ProcessStat {
	counts := agg.CountByProcess()
	total := len(agg.events)

	var stats []ProcessStat
	for processName, count := range counts {
		percentage := float64(count) / float64(total) * 100
		stats = append(stats, ProcessStat{
			ProcessName: processName,
			EventCount:  count,
			Percentage:  percentage,
		})
	}

	// Simple sorting by event count (bubble sort for simplicity)
	for i := 0; i < len(stats); i++ {
		for j := i + 1; j < len(stats); j++ {
			if stats[j].EventCount > stats[i].EventCount {
				stats[i], stats[j] = stats[j], stats[i]
			}
		}
	}

	if limit > 0 && limit < len(stats) {
		stats = stats[:limit]
	}

	return stats
}
587	internal/ebpf/ebpf_trace_manager.go	Normal file
@@ -0,0 +1,587 @@
package ebpf

import (
	"context"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"
	"sync"
	"time"

	"nannyagentv2/internal/logging"
)

// TraceSpec represents a trace specification similar to BCC trace.py
type TraceSpec struct {
	// Probe type: "p" (kprobe), "r" (kretprobe), "t" (tracepoint), "u" (uprobe)
	ProbeType string `json:"probe_type"`

	// Target function/syscall/tracepoint
	Target string `json:"target"`

	// Library for userspace probes (empty for kernel)
	Library string `json:"library,omitempty"`

	// Format string for output (e.g., "read %d bytes", arg3)
	Format string `json:"format"`

	// Arguments to extract (e.g., ["arg1", "arg2", "retval"])
	Arguments []string `json:"arguments"`

	// Filter condition (e.g., "arg3 > 20000")
	Filter string `json:"filter,omitempty"`

	// Duration in seconds
	Duration int `json:"duration"`

	// Process ID filter (optional)
	PID int `json:"pid,omitempty"`

	// Thread ID filter (optional)
	TID int `json:"tid,omitempty"`

	// UID filter (optional)
	UID int `json:"uid,omitempty"`

	// Process name filter (optional)
	ProcessName string `json:"process_name,omitempty"`
}

// TraceEvent represents a captured event from eBPF
type TraceEvent struct {
	Timestamp   int64             `json:"timestamp"`
	PID         int               `json:"pid"`
	TID         int               `json:"tid"`
	UID         int               `json:"uid"`
	ProcessName string            `json:"process_name"`
	Function    string            `json:"function"`
	Message     string            `json:"message"`
	RawArgs     map[string]string `json:"raw_args"`
	CPU         int               `json:"cpu,omitempty"`
}

// TraceResult represents the results of a tracing session
type TraceResult struct {
	TraceID    string       `json:"trace_id"`
	Spec       TraceSpec    `json:"spec"`
	Events     []TraceEvent `json:"events"`
	EventCount int          `json:"event_count"`
	StartTime  time.Time    `json:"start_time"`
	EndTime    time.Time    `json:"end_time"`
	Summary    string       `json:"summary"`
	Statistics TraceStats   `json:"statistics"`
}

// TraceStats provides statistics about the trace
type TraceStats struct {
	TotalEvents     int            `json:"total_events"`
	EventsByProcess map[string]int `json:"events_by_process"`
	EventsByUID     map[int]int    `json:"events_by_uid"`
	EventsPerSecond float64        `json:"events_per_second"`
	TopProcesses    []ProcessStat  `json:"top_processes"`
}

// ProcessStat represents statistics for a process
type ProcessStat struct {
	ProcessName string  `json:"process_name"`
	PID         int     `json:"pid"`
	EventCount  int     `json:"event_count"`
	Percentage  float64 `json:"percentage"`
}

// BCCTraceManager implements advanced eBPF tracing similar to BCC trace.py
type BCCTraceManager struct {
	traces       map[string]*RunningTrace
	tracesLock   sync.RWMutex
	traceCounter int
	capabilities map[string]bool
}

// RunningTrace represents an active trace session
type RunningTrace struct {
	ID        string
	Spec      TraceSpec
	Process   *exec.Cmd
	Events    []TraceEvent
	StartTime time.Time
	Cancel    context.CancelFunc
	Context   context.Context
	Done      chan struct{} // Signal when trace monitoring is complete
}

// NewBCCTraceManager creates a new BCC-style trace manager
func NewBCCTraceManager() *BCCTraceManager {
	manager := &BCCTraceManager{
		traces:       make(map[string]*RunningTrace),
		capabilities: make(map[string]bool),
	}

	manager.testCapabilities()
	return manager
}

// testCapabilities checks what tracing capabilities are available
func (tm *BCCTraceManager) testCapabilities() {
	// Test if bpftrace is available
	if _, err := exec.LookPath("bpftrace"); err == nil {
		tm.capabilities["bpftrace"] = true
	} else {
		tm.capabilities["bpftrace"] = false
	}

	// Test if perf is available for fallback
	if _, err := exec.LookPath("perf"); err == nil {
		tm.capabilities["perf"] = true
	} else {
		tm.capabilities["perf"] = false
	}

	// Test root privileges (required for eBPF)
	tm.capabilities["root_access"] = os.Geteuid() == 0

	// Test kernel version
	cmd := exec.Command("uname", "-r")
	output, err := cmd.Output()
	if err == nil {
		version := strings.TrimSpace(string(output))
		// eBPF requires kernel 4.4+
		tm.capabilities["kernel_ebpf"] = !strings.HasPrefix(version, "3.")
	} else {
		tm.capabilities["kernel_ebpf"] = false
	}

	// Test if we can access debugfs
	if _, err := os.Stat("/sys/kernel/debug/tracing/available_events"); err == nil {
		tm.capabilities["debugfs_access"] = true
	} else {
		tm.capabilities["debugfs_access"] = false
	}

	logging.Debug("BCC Trace capabilities: %+v", tm.capabilities)
}

// GetCapabilities returns available tracing capabilities
func (tm *BCCTraceManager) GetCapabilities() map[string]bool {
	tm.tracesLock.RLock()
	defer tm.tracesLock.RUnlock()

	caps := make(map[string]bool)
	for k, v := range tm.capabilities {
		caps[k] = v
	}
	return caps
}

// StartTrace starts a new trace session based on the specification
func (tm *BCCTraceManager) StartTrace(spec TraceSpec) (string, error) {
	if !tm.capabilities["bpftrace"] {
		return "", fmt.Errorf("bpftrace not available - install bpftrace package")
	}

	if !tm.capabilities["root_access"] {
		return "", fmt.Errorf("root access required for eBPF tracing")
	}

	if !tm.capabilities["kernel_ebpf"] {
		return "", fmt.Errorf("kernel version does not support eBPF")
	}

	tm.tracesLock.Lock()
	defer tm.tracesLock.Unlock()

	// Generate trace ID
	tm.traceCounter++
	traceID := fmt.Sprintf("trace_%d", tm.traceCounter)

	// Generate bpftrace script
	script, err := tm.generateBpftraceScript(spec)
	if err != nil {
		return "", fmt.Errorf("failed to generate bpftrace script: %w", err)
	}

	// Debug: log the generated script
	logging.Debug("Generated bpftrace script for %s:\n%s", spec.Target, script)

	// Create context with timeout
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(spec.Duration)*time.Second)

	// Start bpftrace process
	cmd := exec.CommandContext(ctx, "bpftrace", "-e", script)

	// Create stdout pipe BEFORE starting
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		cancel()
		return "", fmt.Errorf("failed to create stdout pipe: %w", err)
	}

	trace := &RunningTrace{
		ID:        traceID,
		Spec:      spec,
		Process:   cmd,
		Events:    []TraceEvent{},
		StartTime: time.Now(),
		Cancel:    cancel,
		Context:   ctx,
		Done:      make(chan struct{}), // Initialize completion signal
	}

	// Start the trace
	if err := cmd.Start(); err != nil {
		cancel()
		return "", fmt.Errorf("failed to start bpftrace: %w", err)
	}

	tm.traces[traceID] = trace

	// Monitor the trace in a goroutine
	go tm.monitorTrace(traceID, stdout)

	logging.Debug("Started BCC-style trace %s for target %s", traceID, spec.Target)
	return traceID, nil
}

// generateBpftraceScript generates a bpftrace script based on the trace specification
func (tm *BCCTraceManager) generateBpftraceScript(spec TraceSpec) (string, error) {
	var script strings.Builder

	// Build probe specification
	var probe string
	switch spec.ProbeType {
	case "p", "": // kprobe (default); works for plain functions and for
		// syscall symbols like sys_* / __x64_sys_*
		probe = fmt.Sprintf("kprobe:%s", spec.Target)
	case "r": // kretprobe
		probe = fmt.Sprintf("kretprobe:%s", spec.Target)
	case "t": // tracepoint
		// If target already includes tracepoint prefix, use as-is
		if strings.HasPrefix(spec.Target, "tracepoint:") {
			probe = spec.Target
		} else {
			probe = fmt.Sprintf("tracepoint:%s", spec.Target)
		}
	case "u": // uprobe
		if spec.Library == "" {
			return "", fmt.Errorf("library required for uprobe")
		}
		probe = fmt.Sprintf("uprobe:%s:%s", spec.Library, spec.Target)
	default:
		return "", fmt.Errorf("unsupported probe type: %s", spec.ProbeType)
	}

	// Add BEGIN block
	script.WriteString("BEGIN {\n")
	script.WriteString(fmt.Sprintf("  printf(\"Starting trace for %s...\\n\");\n", spec.Target))
	script.WriteString("}\n\n")

	// Build the main probe
	script.WriteString(fmt.Sprintf("%s {\n", probe))

	// Add filters if specified
	if tm.needsFiltering(spec) {
		script.WriteString("  if (")
		filters := tm.buildFilters(spec)
		script.WriteString(strings.Join(filters, " && "))
		script.WriteString(") {\n")
	}

	// Build output format
	outputFormat := tm.buildOutputFormat(spec)
	script.WriteString(fmt.Sprintf("  printf(\"%s\\n\"", outputFormat))

	// Add arguments
	args := tm.buildArgumentList(spec)
	if len(args) > 0 {
		script.WriteString(", ")
		script.WriteString(strings.Join(args, ", "))
	}

	script.WriteString(");\n")

	// Close filter if block
	if tm.needsFiltering(spec) {
		script.WriteString("  }\n")
	}

	script.WriteString("}\n\n")

	// Add END block
	script.WriteString("END {\n")
	script.WriteString(fmt.Sprintf("  printf(\"Trace completed for %s\\n\");\n", spec.Target))
	script.WriteString("}\n")

	return script.String(), nil
}

// needsFiltering checks if any filters are needed
func (tm *BCCTraceManager) needsFiltering(spec TraceSpec) bool {
	return spec.PID != 0 || spec.TID != 0 || spec.UID > 0 ||
		spec.ProcessName != "" || spec.Filter != ""
}

// buildFilters builds the filter conditions
func (tm *BCCTraceManager) buildFilters(spec TraceSpec) []string {
	var filters []string

	if spec.PID != 0 {
		filters = append(filters, fmt.Sprintf("pid == %d", spec.PID))
	}

	if spec.TID != 0 {
		filters = append(filters, fmt.Sprintf("tid == %d", spec.TID))
	}

	// A zero value means "no UID filter"; filtering on root (uid 0) is not
	// expressible with this convention.
	if spec.UID > 0 {
		filters = append(filters, fmt.Sprintf("uid == %d", spec.UID))
	}

	if spec.ProcessName != "" {
		filters = append(filters, fmt.Sprintf("strncmp(comm, \"%s\", %d) == 0", spec.ProcessName, len(spec.ProcessName)))
	}

	// Add the custom filter as-is; it must already use bpftrace syntax
	// (e.g. "arg2 > 1024").
	if spec.Filter != "" {
		filters = append(filters, spec.Filter)
	}

	return filters
}

// buildOutputFormat creates the output format string
func (tm *BCCTraceManager) buildOutputFormat(spec TraceSpec) string {
	if spec.Format != "" {
		// Use custom format
		return fmt.Sprintf("TRACE|%%d|%%d|%%d|%%s|%s|%s", spec.Target, spec.Format)
	}

	// Default format
	return fmt.Sprintf("TRACE|%%d|%%d|%%d|%%s|%s|called", spec.Target)
}

// buildArgumentList creates the argument list for printf
func (tm *BCCTraceManager) buildArgumentList(spec TraceSpec) []string {
	// Always include timestamp, pid, tid, comm
	args := []string{"nsecs", "pid", "tid", "comm"}

	// Add custom arguments. Note: bpftrace numbers kprobe arguments from
	// arg0, while BCC trace.py numbers them from arg1; these names are
	// passed through to bpftrace unchanged.
	for _, arg := range spec.Arguments {
		switch arg {
		case "arg1", "arg2", "arg3", "arg4", "arg5", "arg6":
			args = append(args, arg)
		case "retval":
			args = append(args, "retval")
		case "cpu":
			args = append(args, "cpu")
		default:
			// Custom expression
			args = append(args, arg)
		}
	}

	return args
}
|
||||||
|
|
||||||
|
// monitorTrace monitors a running trace and collects events
|
||||||
|
func (tm *BCCTraceManager) monitorTrace(traceID string, stdout io.ReadCloser) {
|
||||||
|
tm.tracesLock.Lock()
|
||||||
|
trace, exists := tm.traces[traceID]
|
||||||
|
if !exists {
|
||||||
|
tm.tracesLock.Unlock()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
tm.tracesLock.Unlock()
|
||||||
|
|
||||||
|
// Start reading output in a goroutine
|
||||||
|
go func() {
|
||||||
|
scanner := NewEventScanner(stdout)
|
||||||
|
for scanner.Scan() {
|
||||||
|
event := scanner.Event()
|
||||||
|
if event != nil {
|
||||||
|
tm.tracesLock.Lock()
|
||||||
|
if t, exists := tm.traces[traceID]; exists {
|
||||||
|
t.Events = append(t.Events, *event)
|
||||||
|
}
|
||||||
|
tm.tracesLock.Unlock()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
stdout.Close()
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Wait for the process to complete
|
||||||
|
err := trace.Process.Wait()
|
||||||
|
|
||||||
|
// Clean up
|
||||||
|
trace.Cancel()
|
||||||
|
|
||||||
|
tm.tracesLock.Lock()
|
||||||
|
if err != nil && err.Error() != "signal: killed" {
|
||||||
|
logging.Warning("Trace %s completed with error: %v", traceID, err)
|
||||||
|
} else {
|
||||||
|
logging.Debug("Trace %s completed successfully with %d events",
|
||||||
|
traceID, len(trace.Events))
|
||||||
|
}
|
||||||
|
|
||||||
|
// Signal that monitoring is complete
|
||||||
|
close(trace.Done)
|
||||||
|
tm.tracesLock.Unlock()
|
||||||
|
}
|
||||||
|
|
||||||
|
// GetTraceResult returns the results of a completed trace
func (tm *BCCTraceManager) GetTraceResult(traceID string) (*TraceResult, error) {
	tm.tracesLock.RLock()
	trace, exists := tm.traces[traceID]
	if !exists {
		tm.tracesLock.RUnlock()
		return nil, fmt.Errorf("trace %s not found", traceID)
	}
	tm.tracesLock.RUnlock()

	// Wait for trace monitoring to complete
	select {
	case <-trace.Done:
		// Trace monitoring completed
	case <-time.After(5 * time.Second):
		// Timeout waiting for completion
		return nil, fmt.Errorf("timeout waiting for trace %s to complete", traceID)
	}

	// Now safely read the final results
	tm.tracesLock.RLock()
	defer tm.tracesLock.RUnlock()

	result := &TraceResult{
		TraceID:    traceID,
		Spec:       trace.Spec,
		Events:     make([]TraceEvent, len(trace.Events)),
		EventCount: len(trace.Events),
		StartTime:  trace.StartTime,
		EndTime:    time.Now(),
	}
	copy(result.Events, trace.Events)

	// Calculate statistics
	result.Statistics = tm.calculateStatistics(result.Events, result.EndTime.Sub(result.StartTime))

	// Generate summary
	result.Summary = tm.generateSummary(result)

	return result, nil
}
// calculateStatistics calculates statistics for the trace results
func (tm *BCCTraceManager) calculateStatistics(events []TraceEvent, duration time.Duration) TraceStats {
	stats := TraceStats{
		TotalEvents:     len(events),
		EventsByProcess: make(map[string]int),
		EventsByUID:     make(map[int]int),
	}

	if duration > 0 {
		stats.EventsPerSecond = float64(len(events)) / duration.Seconds()
	}

	// Calculate per-process and per-UID statistics
	for _, event := range events {
		stats.EventsByProcess[event.ProcessName]++
		stats.EventsByUID[event.UID]++
	}

	// Calculate top processes
	for processName, count := range stats.EventsByProcess {
		percentage := float64(count) / float64(len(events)) * 100
		stats.TopProcesses = append(stats.TopProcesses, ProcessStat{
			ProcessName: processName,
			EventCount:  count,
			Percentage:  percentage,
		})
	}

	// Order descending by event count so TopProcesses[0] really is the
	// busiest process: map iteration order is random in Go, and
	// generateSummary reads TopProcesses[0] as the top entry.
	for i := range stats.TopProcesses {
		for j := i + 1; j < len(stats.TopProcesses); j++ {
			if stats.TopProcesses[j].EventCount > stats.TopProcesses[i].EventCount {
				stats.TopProcesses[i], stats.TopProcesses[j] = stats.TopProcesses[j], stats.TopProcesses[i]
			}
		}
	}

	return stats
}
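The per-process aggregation above can be sketched standalone. This trimmed mirror (the `procStat` type and `topProcesses` helper are illustrative names, not package API) counts events per process, derives percentages, and sorts so index 0 is always the busiest process:

```go
package main

import (
	"fmt"
	"sort"
)

type procStat struct {
	Name    string
	Count   int
	Percent float64
}

// topProcesses counts events per process name, attaches percentages, and
// sorts descending by count.
func topProcesses(events []string) []procStat {
	counts := map[string]int{}
	for _, p := range events {
		counts[p]++
	}
	stats := make([]procStat, 0, len(counts))
	for name, c := range counts {
		stats = append(stats, procStat{name, c, float64(c) / float64(len(events)) * 100})
	}
	sort.Slice(stats, func(i, j int) bool { return stats[i].Count > stats[j].Count })
	return stats
}

func main() {
	s := topProcesses([]string{"nginx", "nginx", "sshd"})
	fmt.Printf("%s %d %.1f%%\n", s[0].Name, s[0].Count, s[0].Percent) // nginx 2 66.7%
}
```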
// generateSummary generates a human-readable summary
func (tm *BCCTraceManager) generateSummary(result *TraceResult) string {
	duration := result.EndTime.Sub(result.StartTime)

	summary := fmt.Sprintf("Traced %s for %v, captured %d events (%.2f events/sec)",
		result.Spec.Target, duration, result.EventCount, result.Statistics.EventsPerSecond)

	if len(result.Statistics.TopProcesses) > 0 {
		summary += fmt.Sprintf(", top process: %s (%d events)",
			result.Statistics.TopProcesses[0].ProcessName,
			result.Statistics.TopProcesses[0].EventCount)
	}

	return summary
}

// StopTrace stops an active trace
func (tm *BCCTraceManager) StopTrace(traceID string) error {
	tm.tracesLock.Lock()
	defer tm.tracesLock.Unlock()

	trace, exists := tm.traces[traceID]
	if !exists {
		return fmt.Errorf("trace %s not found", traceID)
	}

	if trace.Process.ProcessState == nil {
		// Process is still running, kill it
		if err := trace.Process.Process.Kill(); err != nil {
			return fmt.Errorf("failed to stop trace: %w", err)
		}
	}

	trace.Cancel()
	return nil
}

// ListActiveTraces returns a list of active trace IDs
func (tm *BCCTraceManager) ListActiveTraces() []string {
	tm.tracesLock.RLock()
	defer tm.tracesLock.RUnlock()

	var active []string
	for id, trace := range tm.traces {
		if trace.Process.ProcessState == nil {
			active = append(active, id)
		}
	}

	return active
}

// GetSummary returns a summary of the trace manager state
func (tm *BCCTraceManager) GetSummary() map[string]interface{} {
	tm.tracesLock.RLock()
	defer tm.tracesLock.RUnlock()

	activeCount := 0
	completedCount := 0
	var activeIDs []string

	for id, trace := range tm.traces {
		if trace.Process.ProcessState == nil {
			activeCount++
			activeIDs = append(activeIDs, id)
		} else {
			completedCount++
		}
	}

	return map[string]interface{}{
		"capabilities":     tm.capabilities,
		"active_traces":    activeCount,
		"completed_traces": completedCount,
		"total_traces":     len(tm.traces),
		// Computed inline rather than via ListActiveTraces, which would
		// re-acquire the read lock this method already holds.
		"active_trace_ids": activeIDs,
	}
}
internal/ebpf/ebpf_trace_specs.go (new file, 396 lines)
@@ -0,0 +1,396 @@
package ebpf

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TestTraceSpecs provides test trace specifications for unit testing the BCC-style tracing.
// These are used to validate the tracing functionality without requiring remote API calls.
var TestTraceSpecs = map[string]TraceSpec{
	// Basic system call tracing for testing
	"test_sys_open": {
		ProbeType: "p",
		Target:    "__x64_sys_openat",
		Format:    "opening file: %s",
		Arguments: []string{"arg2@user"}, // filename
		Duration:  5,                     // Short duration for testing
	},

	"test_sys_read": {
		ProbeType: "p",
		Target:    "__x64_sys_read",
		Format:    "read %d bytes from fd %d",
		Arguments: []string{"arg3", "arg1"}, // count, fd
		Filter:    "arg3 > 100",             // Only reads >100 bytes for testing
		Duration:  5,
	},

	"test_sys_write": {
		ProbeType: "p",
		Target:    "__x64_sys_write",
		Format:    "write %d bytes to fd %d",
		Arguments: []string{"arg3", "arg1"}, // count, fd
		Duration:  5,
	},

	"test_process_creation": {
		ProbeType: "p",
		Target:    "__x64_sys_execve",
		Format:    "exec: %s",
		Arguments: []string{"arg1@user"}, // filename
		Duration:  5,
	},

	// Test with different probe types
	"test_kretprobe": {
		ProbeType: "r",
		Target:    "__x64_sys_openat",
		Format:    "open returned: %d",
		Arguments: []string{"retval"},
		Duration:  5,
	},

	"test_with_filter": {
		ProbeType: "p",
		Target:    "__x64_sys_write",
		Format:    "stdout write: %d bytes",
		Arguments: []string{"arg3"},
		Filter:    "arg1 == 1", // Only stdout writes
		Duration:  5,
	},
}

// GetTestSpec returns a pre-defined test trace specification
func GetTestSpec(name string) (TraceSpec, bool) {
	spec, exists := TestTraceSpecs[name]
	return spec, exists
}

// ListTestSpecs returns all available test trace specifications
func ListTestSpecs() map[string]string {
	descriptions := map[string]string{
		"test_sys_open":         "Test file open operations",
		"test_sys_read":         "Test read operations (>100 bytes)",
		"test_sys_write":        "Test write operations",
		"test_process_creation": "Test process execution",
		"test_kretprobe":        "Test kretprobe on file open",
		"test_with_filter":      "Test filtered writes to stdout",
	}

	return descriptions
}

// TraceSpecBuilder helps build custom trace specifications
type TraceSpecBuilder struct {
	spec TraceSpec
}

// NewTraceSpecBuilder creates a new trace specification builder
func NewTraceSpecBuilder() *TraceSpecBuilder {
	return &TraceSpecBuilder{
		spec: TraceSpec{
			ProbeType: "p", // Default to kprobe
			Duration:  30,  // Default 30 seconds
		},
	}
}

// Kprobe sets up a kernel probe
func (b *TraceSpecBuilder) Kprobe(function string) *TraceSpecBuilder {
	b.spec.ProbeType = "p"
	b.spec.Target = function
	return b
}

// Kretprobe sets up a kernel return probe
func (b *TraceSpecBuilder) Kretprobe(function string) *TraceSpecBuilder {
	b.spec.ProbeType = "r"
	b.spec.Target = function
	return b
}

// Tracepoint sets up a tracepoint
func (b *TraceSpecBuilder) Tracepoint(category, name string) *TraceSpecBuilder {
	b.spec.ProbeType = "t"
	b.spec.Target = fmt.Sprintf("%s:%s", category, name)
	return b
}

// Uprobe sets up a userspace probe
func (b *TraceSpecBuilder) Uprobe(library, function string) *TraceSpecBuilder {
	b.spec.ProbeType = "u"
	b.spec.Library = library
	b.spec.Target = function
	return b
}

// Format sets the output format string
func (b *TraceSpecBuilder) Format(format string, args ...string) *TraceSpecBuilder {
	b.spec.Format = format
	b.spec.Arguments = args
	return b
}

// Filter adds a filter condition
func (b *TraceSpecBuilder) Filter(condition string) *TraceSpecBuilder {
	b.spec.Filter = condition
	return b
}

// Duration sets the trace duration in seconds
func (b *TraceSpecBuilder) Duration(seconds int) *TraceSpecBuilder {
	b.spec.Duration = seconds
	return b
}

// PID filters by process ID
func (b *TraceSpecBuilder) PID(pid int) *TraceSpecBuilder {
	b.spec.PID = pid
	return b
}

// UID filters by user ID
func (b *TraceSpecBuilder) UID(uid int) *TraceSpecBuilder {
	b.spec.UID = uid
	return b
}

// ProcessName filters by process name
func (b *TraceSpecBuilder) ProcessName(name string) *TraceSpecBuilder {
	b.spec.ProcessName = name
	return b
}

// Build returns the constructed trace specification
func (b *TraceSpecBuilder) Build() TraceSpec {
	return b.spec
}
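The builder above is a classic fluent-interface pattern: each setter mutates the embedded spec and returns the receiver so calls chain. A self-contained sketch of the same shape (this is a trimmed mirror of `TraceSpec`/`TraceSpecBuilder` for illustration, not the package itself):

```go
package main

import "fmt"

// Trimmed mirror of the package's TraceSpec and TraceSpecBuilder, with just
// enough fields and methods to demonstrate the fluent construction style.
type TraceSpec struct {
	ProbeType string
	Target    string
	Format    string
	Arguments []string
	Filter    string
	Duration  int
}

type TraceSpecBuilder struct{ spec TraceSpec }

func NewTraceSpecBuilder() *TraceSpecBuilder {
	return &TraceSpecBuilder{spec: TraceSpec{ProbeType: "p", Duration: 30}}
}

func (b *TraceSpecBuilder) Kprobe(fn string) *TraceSpecBuilder {
	b.spec.ProbeType, b.spec.Target = "p", fn
	return b
}

func (b *TraceSpecBuilder) Format(f string, args ...string) *TraceSpecBuilder {
	b.spec.Format, b.spec.Arguments = f, args
	return b
}

func (b *TraceSpecBuilder) Filter(c string) *TraceSpecBuilder { b.spec.Filter = c; return b }
func (b *TraceSpecBuilder) Duration(s int) *TraceSpecBuilder  { b.spec.Duration = s; return b }
func (b *TraceSpecBuilder) Build() TraceSpec                  { return b.spec }

func main() {
	spec := NewTraceSpecBuilder().
		Kprobe("__x64_sys_write").
		Format("write %d bytes", "arg3").
		Filter("arg1 == 1").
		Duration(5).
		Build()
	fmt.Printf("%s %q %q %d\n", spec.Target, spec.Format, spec.Filter, spec.Duration)
}
```

Because `Build` returns the spec by value, the builder can be reused or discarded without aliasing the result.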

// TraceSpecParser parses trace specifications from various formats
type TraceSpecParser struct{}

// NewTraceSpecParser creates a new parser
func NewTraceSpecParser() *TraceSpecParser {
	return &TraceSpecParser{}
}

// ParseFromBCCStyle parses BCC trace.py style specifications.
// Examples:
//
//	"sys_open"                         -> trace sys_open syscall
//	"p::do_sys_open"                   -> kprobe on do_sys_open
//	"r::do_sys_open"                   -> kretprobe on do_sys_open
//	"t:syscalls:sys_enter_open"        -> tracepoint
//	"sys_read (arg3 > 1024)"           -> with filter
//	"sys_read \"read %d bytes\", arg3" -> with format
func (p *TraceSpecParser) ParseFromBCCStyle(spec string) (TraceSpec, error) {
	result := TraceSpec{
		ProbeType: "p",
		Duration:  30,
	}

	// Split by quotes to separate format string
	parts := strings.Split(spec, "\"")

	var probeSpec string
	if len(parts) >= 1 {
		probeSpec = strings.TrimSpace(parts[0])
	}

	var formatPart string
	if len(parts) >= 2 {
		formatPart = parts[1]
	}

	var argsPart string
	if len(parts) >= 3 {
		argsPart = strings.TrimSpace(parts[2])
		if strings.HasPrefix(argsPart, ",") {
			argsPart = strings.TrimSpace(argsPart[1:])
		}
	}

	// Parse probe specification
	if err := p.parseProbeSpec(probeSpec, &result); err != nil {
		return result, err
	}

	// Parse format string
	if formatPart != "" {
		result.Format = formatPart
	}

	// Parse arguments
	if argsPart != "" {
		result.Arguments = p.parseArguments(argsPart)
	}

	return result, nil
}

// parseProbeSpec parses the probe specification part
func (p *TraceSpecParser) parseProbeSpec(spec string, result *TraceSpec) error {
	// Handle filter conditions in parentheses
	if idx := strings.Index(spec, "("); idx != -1 {
		filterEnd := strings.LastIndex(spec, ")")
		if filterEnd > idx {
			result.Filter = strings.TrimSpace(spec[idx+1 : filterEnd])
			spec = strings.TrimSpace(spec[:idx])
		}
	}

	// Parse probe type and target
	if strings.Contains(spec, ":") {
		parts := strings.SplitN(spec, ":", 3)

		if len(parts) >= 1 && parts[0] != "" {
			switch parts[0] {
			case "p":
				result.ProbeType = "p"
			case "r":
				result.ProbeType = "r"
			case "t":
				result.ProbeType = "t"
			case "u":
				result.ProbeType = "u"
			default:
				return fmt.Errorf("unsupported probe type: %s", parts[0])
			}
		}

		if len(parts) >= 2 {
			result.Library = parts[1]
		}

		if len(parts) >= 3 {
			result.Target = parts[2]
		} else if len(parts) == 2 {
			result.Target = parts[1]
			result.Library = ""
		}
	} else {
		// Simple function name
		result.Target = spec

		// Auto-detect syscall format
		if strings.HasPrefix(spec, "sys_") && !strings.HasPrefix(spec, "__x64_sys_") {
			result.Target = "__x64_sys_" + spec[4:]
		}
	}

	return nil
}

// parseArguments parses the arguments part
func (p *TraceSpecParser) parseArguments(args string) []string {
	var result []string

	// Split by comma and clean up
	parts := strings.Split(args, ",")
	for _, part := range parts {
		arg := strings.TrimSpace(part)
		if arg != "" {
			result = append(result, arg)
		}
	}

	return result
}

// ParseFromJSON parses trace specification from JSON
func (p *TraceSpecParser) ParseFromJSON(jsonData []byte) (TraceSpec, error) {
	var spec TraceSpec
	err := json.Unmarshal(jsonData, &spec)
	return spec, err
}
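The two most distinctive steps in `parseProbeSpec` — stripping a trailing `(filter)` clause and mapping bare `sys_*` names onto the `__x64_sys_*` kernel symbols used by kprobes on x86-64 — can be exercised in isolation. A standalone sketch (the `parseTarget` helper is an illustrative mirror, not package API):

```go
package main

import (
	"fmt"
	"strings"
)

// parseTarget strips a trailing "(filter)" clause from a BCC-style probe
// spec and rewrites bare sys_* names to their __x64_sys_* kernel symbols.
func parseTarget(spec string) (target, filter string) {
	if i := strings.Index(spec, "("); i != -1 {
		if j := strings.LastIndex(spec, ")"); j > i {
			filter = strings.TrimSpace(spec[i+1 : j])
			spec = strings.TrimSpace(spec[:i])
		}
	}
	target = spec
	if strings.HasPrefix(spec, "sys_") && !strings.HasPrefix(spec, "__x64_sys_") {
		target = "__x64_sys_" + spec[4:]
	}
	return target, filter
}

func main() {
	t, f := parseTarget("sys_read (arg3 > 1024)")
	fmt.Println(t, "|", f) // __x64_sys_read | arg3 > 1024
}
```

The `__x64_sys_` prefix matters because on modern x86-64 kernels the syscall entry points carry that prefix, so a kprobe on plain `sys_read` would not attach.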

// GetCommonSpec returns a pre-defined test trace specification (kept for backward compatibility)
func GetCommonSpec(name string) (TraceSpec, bool) {
	// Map old names to new test names for compatibility
	testName := name
	if strings.HasPrefix(name, "trace_") {
		testName = strings.Replace(name, "trace_", "test_", 1)
	}

	spec, exists := TestTraceSpecs[testName]
	return spec, exists
}

// ListCommonSpecs returns all available test trace specifications (kept for backward compatibility)
func ListCommonSpecs() map[string]string {
	return ListTestSpecs()
}

// ValidateTraceSpec validates a trace specification
func ValidateTraceSpec(spec TraceSpec) error {
	if spec.Target == "" {
		return fmt.Errorf("target function/syscall is required")
	}

	if spec.Duration <= 0 {
		return fmt.Errorf("duration must be positive")
	}

	if spec.Duration > 600 { // 10 minutes max
		return fmt.Errorf("duration too long (max 600 seconds)")
	}

	switch spec.ProbeType {
	case "p", "r", "t", "u":
		// Valid probe types
	case "":
		// Default to kprobe
	default:
		return fmt.Errorf("unsupported probe type: %s", spec.ProbeType)
	}

	if spec.ProbeType == "u" && spec.Library == "" {
		return fmt.Errorf("library required for userspace probes")
	}

	if spec.ProbeType == "t" && !strings.Contains(spec.Target, ":") {
		return fmt.Errorf("tracepoint requires format 'category:name'")
	}

	return nil
}

// SuggestSyscallTargets suggests syscall targets based on the issue description
func SuggestSyscallTargets(issueDescription string) []string {
	description := strings.ToLower(issueDescription)
	var suggestions []string

	// File I/O issues
	if strings.Contains(description, "file") || strings.Contains(description, "disk") || strings.Contains(description, "io") {
		suggestions = append(suggestions, "trace_sys_open", "trace_sys_read", "trace_sys_write", "trace_sys_unlink")
	}

	// Network issues
	if strings.Contains(description, "network") || strings.Contains(description, "socket") || strings.Contains(description, "connection") {
		suggestions = append(suggestions, "trace_sys_connect", "trace_sys_socket", "trace_sys_bind", "trace_sys_accept")
	}

	// Process issues
	if strings.Contains(description, "process") || strings.Contains(description, "crash") || strings.Contains(description, "exec") {
		suggestions = append(suggestions, "trace_sys_execve", "trace_sys_clone", "trace_sys_exit", "trace_sys_kill")
	}

	// Memory issues
	if strings.Contains(description, "memory") || strings.Contains(description, "malloc") || strings.Contains(description, "leak") {
		suggestions = append(suggestions, "trace_sys_mmap", "trace_sys_brk")
	}

	// Performance issues - trace common syscalls
	if strings.Contains(description, "slow") || strings.Contains(description, "performance") || strings.Contains(description, "hang") {
		suggestions = append(suggestions, "trace_sys_read", "trace_sys_write", "trace_sys_connect", "trace_sys_mmap")
	}

	// If no specific suggestions, provide general monitoring
	if len(suggestions) == 0 {
		suggestions = append(suggestions, "trace_sys_execve", "trace_sys_open", "trace_sys_connect")
	}

	return suggestions
}
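`SuggestSyscallTargets` is a keyword classifier with a general-purpose fallback. A trimmed standalone mirror with just two of its keyword groups (the `suggest` helper is illustrative, not package API) shows the shape:

```go
package main

import (
	"fmt"
	"strings"
)

// suggest keyword-matches the lowercased issue description and falls back
// to general-purpose probes when no category matches.
func suggest(issue string) []string {
	d := strings.ToLower(issue)
	var out []string
	if strings.Contains(d, "network") || strings.Contains(d, "socket") {
		out = append(out, "trace_sys_connect", "trace_sys_socket")
	}
	if strings.Contains(d, "memory") || strings.Contains(d, "leak") {
		out = append(out, "trace_sys_mmap", "trace_sys_brk")
	}
	if len(out) == 0 {
		out = []string{"trace_sys_execve", "trace_sys_open", "trace_sys_connect"}
	}
	return out
}

func main() {
	fmt.Println(suggest("intermittent network timeouts"))
	fmt.Println(suggest("nothing specific"))
}
```

Note the categories are not exclusive: a description mentioning both "network" and "memory" accumulates suggestions from both groups, which matches the accumulation style of the function above.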
internal/ebpf/ebpf_trace_test.go (new file, 921 lines)
@@ -0,0 +1,921 @@
|
|||||||
|
package ebpf
|
||||||
|
|
||||||
|
import (
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"os"
|
||||||
|
"strings"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// TestBCCTracing demonstrates and tests the new BCC-style tracing functionality
|
||||||
|
// This test documents the expected behavior and response format of the agent
|
||||||
|
func TestBCCTracing(t *testing.T) {
|
||||||
|
fmt.Println("=== BCC-Style eBPF Tracing Unit Tests ===")
|
||||||
|
fmt.Println()
|
||||||
|
|
||||||
|
// Test 1: List available test specifications
|
||||||
|
t.Run("ListTestSpecs", func(t *testing.T) {
|
||||||
|
specs := ListTestSpecs()
|
||||||
|
fmt.Printf("📋 Available Test Specifications:\n")
|
||||||
|
for name, description := range specs {
|
||||||
|
fmt.Printf(" - %s: %s\n", name, description)
|
||||||
|
}
|
||||||
|
fmt.Println()
|
||||||
|
|
||||||
|
if len(specs) == 0 {
|
||||||
|
t.Error("No test specifications available")
|
||||||
|
}
|
||||||
|
})
|
||||||
|
|
||||||
|
// Test 2: Parse BCC-style specifications
|
||||||
|
t.Run("ParseBCCStyle", func(t *testing.T) {
|
||||||
|
parser := NewTraceSpecParser()
|
||||||
|
|
||||||
|
testCases := []struct {
|
||||||
|
input string
|
||||||
|
expected string
|
||||||
|
}{
|
||||||
|
{
|
||||||
|
input: "sys_open",
|
||||||
|
expected: "__x64_sys_open",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
input: "p::do_sys_open",
|
||||||
|
expected: "do_sys_open",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
input: "r::sys_read",
|
||||||
|
expected: "sys_read",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
input: "sys_write (arg1 == 1)",
|
||||||
|
expected: "__x64_sys_write",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Printf("🔍 Testing BCC-style parsing:\n")
|
||||||
|
for _, tc := range testCases {
|
||||||
|
spec, err := parser.ParseFromBCCStyle(tc.input)
|
||||||
|
if err != nil {
|
||||||
|
t.Errorf("Failed to parse '%s': %v", tc.input, err)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Printf(" Input: '%s' -> Target: '%s', Type: '%s'\n",
|
||||||
|
tc.input, spec.Target, spec.ProbeType)
|
||||||
|
|
||||||
|
if spec.Target != tc.expected {
|
||||||
|
t.Errorf("Expected target '%s', got '%s'", tc.expected, spec.Target)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
fmt.Println()
|
||||||
|
})
|
||||||
|
|
||||||
|
// Test 3: Validate trace specifications
|
||||||
|
t.Run("ValidateSpecs", func(t *testing.T) {
|
||||||
|
fmt.Printf("✅ Testing trace specification validation:\n")
|
||||||
|
|
||||||
|
// Valid spec
|
||||||
|
validSpec := TraceSpec{
|
||||||
|
ProbeType: "p",
|
||||||
|
Target: "__x64_sys_openat",
|
||||||
|
Format: "opening file",
|
||||||
|
Duration: 5,
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := ValidateTraceSpec(validSpec); err != nil {
|
||||||
|
t.Errorf("Valid spec failed validation: %v", err)
|
||||||
|
} else {
|
||||||
|
fmt.Printf(" ✓ Valid specification passed\n")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Invalid spec - no target
|
||||||
|
invalidSpec := TraceSpec{
|
||||||
|
ProbeType: "p",
|
||||||
|
Duration: 5,
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := ValidateTraceSpec(invalidSpec); err == nil {
|
||||||
|
t.Error("Invalid spec (no target) should have failed validation")
|
||||||
|
} else {
|
||||||
|
fmt.Printf(" ✓ Invalid specification correctly rejected: %s\n", err.Error())
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Println()
|
||||||
|
})
|
||||||
|
|
||||||
|
// Test 4: Simulate agent response format
|
||||||
|
t.Run("SimulateAgentResponse", func(t *testing.T) {
|
||||||
|
fmt.Printf("🤖 Simulating agent response for BCC-style tracing:\n")
|
||||||
|
|
||||||
|
// Get a test specification
|
||||||
|
testSpec, exists := GetTestSpec("test_sys_open")
|
||||||
|
if !exists {
|
||||||
|
t.Fatal("test_sys_open specification not found")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Simulate what the agent would return
|
||||||
|
mockResponse := simulateTraceExecution(testSpec)
|
||||||
|
|
||||||
|
// Print the response format
|
||||||
|
responseJSON, _ := json.MarshalIndent(mockResponse, "", " ")
|
||||||
|
fmt.Printf(" Expected Response Format:\n%s\n", string(responseJSON))
|
||||||
|
|
||||||
|
// Validate response structure
|
||||||
|
if mockResponse["success"] != true {
|
||||||
|
t.Error("Expected successful trace execution")
|
||||||
|
}
|
||||||
|
|
||||||
|
if mockResponse["type"] != "bcc_trace" {
|
||||||
|
t.Error("Expected type to be 'bcc_trace'")
|
||||||
|
}
|
||||||
|
|
||||||
|
events, hasEvents := mockResponse["events"].([]TraceEvent)
|
||||||
|
if !hasEvents || len(events) == 0 {
|
||||||
|
t.Error("Expected trace events in response")
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Println()
|
||||||
|
})
|
||||||
|
|
||||||
|
// Test 5: Test different probe types
|
||||||
|
t.Run("TestProbeTypes", func(t *testing.T) {
|
||||||
|
fmt.Printf("🔬 Testing different probe types:\n")
|
||||||
|
|
||||||
|
probeTests := []struct {
|
||||||
|
specName string
|
||||||
|
expected string
|
||||||
|
}{
|
||||||
|
{"test_sys_open", "kprobe"},
|
||||||
|
{"test_kretprobe", "kretprobe"},
|
||||||
|
{"test_with_filter", "kprobe with filter"},
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, test := range probeTests {
|
||||||
|
spec, exists := GetTestSpec(test.specName)
|
||||||
|
if !exists {
|
||||||
|
t.Errorf("Test spec '%s' not found", test.specName)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
response := simulateTraceExecution(spec)
|
||||||
|
fmt.Printf(" %s -> %s: %d events captured\n",
|
||||||
|
test.specName, test.expected, response["event_count"])
|
||||||
|
}
|
||||||
|
fmt.Println()
|
||||||
|
})
|
||||||
|
|
||||||
|
// Test 6: Test trace spec builder
|
||||||
|
t.Run("TestTraceSpecBuilder", func(t *testing.T) {
|
||||||
|
fmt.Printf("🏗️ Testing trace specification builder:\n")
|
||||||
|
|
||||||
|
// Build a custom trace spec
|
||||||
|
spec := NewTraceSpecBuilder().
|
||||||
|
Kprobe("__x64_sys_write").
|
||||||
|
Format("write syscall: %d bytes", "arg3").
|
||||||
|
Filter("arg1 == 1").
|
||||||
|
Duration(3).
|
||||||
|
Build()
|
||||||
|
|
||||||
|
fmt.Printf(" Built spec: Target=%s, Format=%s, Filter=%s\n",
|
||||||
|
spec.Target, spec.Format, spec.Filter)
|
||||||
|
|
||||||
|
if spec.Target != "__x64_sys_write" {
|
||||||
|
t.Error("Builder failed to set target correctly")
|
||||||
|
}
|
||||||
|
|
||||||
|
if spec.ProbeType != "p" {
|
||||||
|
t.Error("Builder failed to set probe type correctly")
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Println()
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
// simulateTraceExecution simulates what the agent would return for a trace execution
|
||||||
|
// This documents the expected response format from the agent
|
||||||
|
func simulateTraceExecution(spec TraceSpec) map[string]interface{} {
|
||||||
|
// Simulate some trace events
|
||||||
|
events := []TraceEvent{
|
||||||
|
{
|
||||||
|
Timestamp: time.Now().Unix(),
|
||||||
|
PID: 1234,
|
||||||
|
TID: 1234,
|
||||||
|
ProcessName: "test_process",
|
||||||
|
Function: spec.Target,
|
||||||
|
Message: fmt.Sprintf(spec.Format, "test_file.txt"),
|
||||||
|
RawArgs: map[string]string{
|
||||||
|
"arg1": "5",
|
||||||
|
"arg2": "test_file.txt",
|
||||||
|
"arg3": "1024",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
Timestamp: time.Now().Unix(),
|
||||||
|
PID: 5678,
|
||||||
|
TID: 5678,
|
||||||
|
ProcessName: "another_process",
|
||||||
|
Function: spec.Target,
|
||||||
|
Message: fmt.Sprintf(spec.Format, "data.log"),
|
||||||
|
RawArgs: map[string]string{
|
||||||
|
"arg1": "3",
|
||||||
|
"arg2": "data.log",
|
||||||
|
"arg3": "512",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
// Simulate trace statistics
|
||||||
|
stats := TraceStats{
|
||||||
|
TotalEvents: len(events),
|
||||||
|
EventsByProcess: map[string]int{"test_process": 1, "another_process": 1},
|
||||||
|
EventsByUID: map[int]int{1000: 2},
|
||||||
|
EventsPerSecond: float64(len(events)) / float64(spec.Duration),
|
||||||
|
TopProcesses: []ProcessStat{
|
||||||
|
{ProcessName: "test_process", EventCount: 1, Percentage: 50.0},
|
||||||
|
{ProcessName: "another_process", EventCount: 1, Percentage: 50.0},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
// Return the expected agent response format
|
||||||
|
return map[string]interface{}{
|
||||||
|
"name": spec.Target,
|
||||||
|
"type": "bcc_trace",
|
||||||
|
"target": spec.Target,
|
||||||
|
"duration": spec.Duration,
|
||||||
|
"description": fmt.Sprintf("Traced %s for %d seconds", spec.Target, spec.Duration),
|
||||||
|
"status": "completed",
|
||||||
|
"success": true,
|
||||||
|
"event_count": len(events),
|
||||||
|
"events": events,
|
||||||
|
"statistics": stats,
|
||||||
|
"data_points": len(events),
|
||||||
|
"probe_type": spec.ProbeType,
|
||||||
|
"format": spec.Format,
|
||||||
|
"filter": spec.Filter,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestTraceManagerCapabilities tests the trace manager capabilities
|
||||||
|
func TestTraceManagerCapabilities(t *testing.T) {
|
||||||
|
fmt.Println("=== BCC Trace Manager Capabilities Test ===")
|
||||||
|
fmt.Println()
|
||||||
|
|
||||||
|
manager := NewBCCTraceManager()
|
||||||
|
caps := manager.GetCapabilities()
|
||||||
|
|
||||||
|
fmt.Printf("🔧 Trace Manager Capabilities:\n")
|
||||||
|
for capability, available := range caps {
|
||||||
|
status := "❌ Not Available"
|
||||||
|
if available {
|
||||||
|
status = "✅ Available"
|
||||||
|
}
|
||||||
|
fmt.Printf(" %s: %s\n", capability, status)
|
||||||
|
}
|
||||||
|
fmt.Println()
|
||||||
|
|
||||||
|
// Check essential capabilities
|
||||||
|
if !caps["kernel_ebpf"] {
|
||||||
|
fmt.Printf("⚠️ Warning: Kernel eBPF support not detected\n")
|
||||||
|
}
|
||||||
|
|
||||||
|
if !caps["bpftrace"] {
|
||||||
|
fmt.Printf("⚠️ Warning: bpftrace not available (install with: apt install bpftrace)\n")
|
||||||
|
}
|
||||||
|
|
||||||
|
if !caps["root_access"] {
|
||||||
|
fmt.Printf("⚠️ Warning: Root access required for eBPF tracing\n")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// BenchmarkTraceSpecParsing benchmarks the trace specification parsing
|
||||||
|
func BenchmarkTraceSpecParsing(b *testing.B) {
|
||||||
|
parser := NewTraceSpecParser()
|
||||||
|
testInput := "sys_open \"opening %s\", arg2@user"
|
||||||
|
|
||||||
|
b.ResetTimer()
|
||||||
|
for i := 0; i < b.N; i++ {
|
||||||
|
_, err := parser.ParseFromBCCStyle(testInput)
|
||||||
|
if err != nil {
|
||||||
|
b.Fatal(err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestSyscallSuggestions tests the syscall suggestion functionality
|
||||||
|
func TestSyscallSuggestions(t *testing.T) {
|
||||||
|
fmt.Println("=== Syscall Suggestion Test ===")
|
||||||
|
fmt.Println()
|
||||||
|
|
||||||
|
testCases := []struct {
|
||||||
|
issue string
|
||||||
|
expected int // minimum expected suggestions
|
||||||
|
description string
|
||||||
|
}{
|
||||||
|
{
|
||||||
|
issue: "file not found error",
|
||||||
|
expected: 1,
|
||||||
|
description: "File I/O issue should suggest file-related syscalls",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
issue: "network connection timeout",
|
||||||
|
expected: 1,
|
||||||
|
description: "Network issue should suggest network syscalls",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
issue: "process crashes randomly",
|
||||||
|
expected: 1,
|
||||||
|
description: "Process issue should suggest process-related syscalls",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
issue: "memory leak detected",
|
||||||
|
expected: 1,
|
||||||
|
description: "Memory issue should suggest memory syscalls",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
issue: "application is slow",
|
||||||
|
expected: 1,
|
||||||
|
description: "Performance issue should suggest monitoring syscalls",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
fmt.Printf("💡 Testing syscall suggestions:\n")
|
||||||
|
for _, tc := range testCases {
|
||||||
|
suggestions := SuggestSyscallTargets(tc.issue)
|
||||||
|
fmt.Printf(" Issue: '%s' -> %d suggestions: %v\n",
|
||||||
|
tc.issue, len(suggestions), suggestions)
|
||||||
|
|
||||||
|
if len(suggestions) < tc.expected {
|
||||||
|
t.Errorf("Expected at least %d suggestions for '%s', got %d",
|
||||||
|
tc.expected, tc.issue, len(suggestions))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
fmt.Println()
|
||||||
|
}
|
||||||
|
|
||||||
|
// TestMain runs the tests and provides a summary
func TestMain(m *testing.M) {
	fmt.Println("🚀 Starting BCC-Style eBPF Tracing Tests")
	fmt.Println("========================================")
	fmt.Println()

	// Run capability check first
	manager := NewBCCTraceManager()
	caps := manager.GetCapabilities()

	if !caps["kernel_ebpf"] {
		fmt.Println("⚠️ Kernel eBPF support not detected - some tests may be limited")
	}
	if !caps["bpftrace"] {
		fmt.Println("⚠️ bpftrace not available - install with: sudo apt install bpftrace")
	}
	if !caps["root_access"] {
		fmt.Println("⚠️ Root access required for actual eBPF tracing")
	}

	fmt.Println()

	// Run the tests
	code := m.Run()

	fmt.Println()
	fmt.Println("========================================")
	if code == 0 {
		fmt.Println("✅ All BCC-Style eBPF Tracing Tests Passed!")
	} else {
		fmt.Println("❌ Some tests failed")
	}

	os.Exit(code)
}

// TestBCCTraceManagerRootTest tests the actual BCC trace manager with root privileges
// This test requires root access and will only run meaningful tests when root
func TestBCCTraceManagerRootTest(t *testing.T) {
	fmt.Println("=== BCC Trace Manager Root Test ===")

	// Check if running as root
	if os.Geteuid() != 0 {
		t.Skip("⚠️ Skipping root test - not running as root (use: sudo go test -run TestBCCTraceManagerRootTest)")
		return
	}

	fmt.Println("✅ Running as root - can test actual eBPF functionality")

	// Test 1: Create BCC trace manager and check capabilities
	manager := NewBCCTraceManager()
	caps := manager.GetCapabilities()

	fmt.Printf("🔍 BCC Trace Manager Capabilities:\n")
	for cap, available := range caps {
		status := "❌"
		if available {
			status = "✅"
		}
		fmt.Printf("   %s %s: %v\n", status, cap, available)
	}

	// Require essential capabilities
	if !caps["bpftrace"] {
		t.Fatal("❌ bpftrace not available - install bpftrace package")
	}

	if !caps["root_access"] {
		t.Fatal("❌ Root access not detected")
	}

	// Test 2: Create and execute a simple trace
	fmt.Println("\n🔬 Testing actual eBPF trace execution...")

	spec := TraceSpec{
		ProbeType: "t", // tracepoint
		Target:    "syscalls:sys_enter_openat",
		Format:    "file access",
		Arguments: []string{}, // Remove invalid arg2@user for tracepoints
		Duration:  3,          // 3 seconds
	}

	fmt.Printf("📝 Starting trace: %s for %d seconds\n", spec.Target, spec.Duration)

	traceID, err := manager.StartTrace(spec)
	if err != nil {
		t.Fatalf("❌ Failed to start trace: %v", err)
	}

	fmt.Printf("🚀 Trace started with ID: %s\n", traceID)

	// Generate some file access to capture
	go func() {
		time.Sleep(1 * time.Second)
		// Create some file operations to trace
		for i := 0; i < 3; i++ {
			testFile := fmt.Sprintf("/tmp/bcc_test_%d.txt", i)

			// This will trigger sys_openat syscalls
			if file, err := os.Create(testFile); err == nil {
				file.WriteString("BCC trace test")
				file.Close()
				os.Remove(testFile)
			}
			time.Sleep(500 * time.Millisecond)
		}
	}()

	// Wait for trace to complete
	time.Sleep(time.Duration(spec.Duration+1) * time.Second)

	// Get results
	result, err := manager.GetTraceResult(traceID)
	if err != nil {
		// Try to stop the trace if it's still running
		manager.StopTrace(traceID)
		t.Fatalf("❌ Failed to get trace results: %v", err)
	}

	fmt.Printf("\n📊 Trace Results Summary:\n")
	fmt.Printf("   • Trace ID: %s\n", result.TraceID)
	fmt.Printf("   • Target: %s\n", result.Spec.Target)
	fmt.Printf("   • Duration: %v\n", result.EndTime.Sub(result.StartTime))
	fmt.Printf("   • Events captured: %d\n", result.EventCount)
	fmt.Printf("   • Events per second: %.2f\n", result.Statistics.EventsPerSecond)
	fmt.Printf("   • Summary: %s\n", result.Summary)

	if len(result.Events) > 0 {
		fmt.Printf("\n📝 Sample Events (first 3):\n")
		for i, event := range result.Events {
			if i >= 3 {
				break
			}
			fmt.Printf("   %d. PID:%d TID:%d Process:%s Message:%s\n",
				i+1, event.PID, event.TID, event.ProcessName, event.Message)
		}

		if len(result.Events) > 3 {
			fmt.Printf("   ... and %d more events\n", len(result.Events)-3)
		}
	}

	// Test 3: Validate the trace produced real data
	if result.EventCount == 0 {
		fmt.Println("⚠️ Warning: No events captured - this might be normal for a quiet system")
	} else {
		fmt.Printf("✅ Successfully captured %d real eBPF events!\n", result.EventCount)
	}

	fmt.Println("\n🧪 Testing comprehensive system tracing (Network, Disk, CPU, Memory, Userspace)...")

	testSpecs := []TraceSpec{
		// === SYSCALL TRACING ===
		{
			ProbeType: "p", // kprobe
			Target:    "__x64_sys_write",
			Format:    "write: fd=%d count=%d",
			Arguments: []string{"arg1", "arg3"},
			Duration:  2,
		},
		{
			ProbeType: "p", // kprobe
			Target:    "__x64_sys_read",
			Format:    "read: fd=%d count=%d",
			Arguments: []string{"arg1", "arg3"},
			Duration:  2,
		},
		{
			ProbeType: "p", // kprobe
			Target:    "__x64_sys_connect",
			Format:    "network connect: fd=%d",
			Arguments: []string{"arg1"},
			Duration:  2,
		},
		{
			ProbeType: "p", // kprobe
			Target:    "__x64_sys_accept",
			Format:    "network accept: fd=%d",
			Arguments: []string{"arg1"},
			Duration:  2,
		},
		// === BLOCK I/O TRACING ===
		{
			ProbeType: "t", // tracepoint
			Target:    "block:block_io_start",
			Format:    "block I/O start",
			Arguments: []string{},
			Duration:  2,
		},
		{
			ProbeType: "t", // tracepoint
			Target:    "block:block_io_done",
			Format:    "block I/O complete",
			Arguments: []string{},
			Duration:  2,
		},
		// === CPU SCHEDULER TRACING ===
		{
			ProbeType: "t", // tracepoint
			Target:    "sched:sched_migrate_task",
			Format:    "task migration",
			Arguments: []string{},
			Duration:  2,
		},
		{
			ProbeType: "t", // tracepoint
			Target:    "sched:sched_pi_setprio",
			Format:    "priority change",
			Arguments: []string{},
			Duration:  2,
		},
		// === MEMORY MANAGEMENT ===
		{
			ProbeType: "t", // tracepoint
			Target:    "syscalls:sys_enter_brk",
			Format:    "memory allocation: brk",
			Arguments: []string{},
			Duration:  2,
		},
		// === KERNEL MEMORY TRACING ===
		{
			ProbeType: "t", // tracepoint
			Target:    "kmem:kfree",
			Format:    "kernel memory free",
			Arguments: []string{},
			Duration:  2,
		},
	}

	for i, testSpec := range testSpecs {
		category := "unknown"
		if strings.Contains(testSpec.Target, "sys_write") || strings.Contains(testSpec.Target, "sys_read") {
			category = "filesystem"
		} else if strings.Contains(testSpec.Target, "sys_connect") || strings.Contains(testSpec.Target, "sys_accept") {
			category = "network"
		} else if strings.Contains(testSpec.Target, "block:") {
			category = "disk I/O"
		} else if strings.Contains(testSpec.Target, "sched:") {
			category = "CPU/scheduler"
		} else if strings.Contains(testSpec.Target, "sys_brk") || strings.Contains(testSpec.Target, "kmem:") {
			category = "memory"
		}

		fmt.Printf("\n   🔍 Test %d: [%s] Tracing %s for %d seconds\n", i+1, category, testSpec.Target, testSpec.Duration)

		testTraceID, err := manager.StartTrace(testSpec)
		if err != nil {
			fmt.Printf("   ❌ Failed to start: %v\n", err)
			continue
		}

		// Generate activity specific to this trace type
		go func(target, probeType string) {
			time.Sleep(500 * time.Millisecond)
			switch {
			case strings.Contains(target, "sys_write") || strings.Contains(target, "sys_read"):
				// Generate file I/O
				for j := 0; j < 3; j++ {
					testFile := fmt.Sprintf("/tmp/io_test_%d.txt", j)
					if file, err := os.Create(testFile); err == nil {
						file.WriteString("BCC tracing test data for I/O operations")
						file.Sync()
						file.Close()

						// Read the file back
						if readFile, err := os.Open(testFile); err == nil {
							buffer := make([]byte, 1024)
							readFile.Read(buffer)
							readFile.Close()
						}
						os.Remove(testFile)
					}
					time.Sleep(200 * time.Millisecond)
				}
			case strings.Contains(target, "block:"):
				// Generate disk I/O to trigger block layer events
				for j := 0; j < 3; j++ {
					testFile := fmt.Sprintf("/tmp/block_test_%d.txt", j)
					if file, err := os.Create(testFile); err == nil {
						// Write substantial data to trigger block I/O
						data := make([]byte, 1024*4) // 4KB
						for k := range data {
							data[k] = byte(k % 256)
						}
						file.Write(data)
						file.Sync() // Force write to disk
						file.Close()
					}
					os.Remove(testFile)
					time.Sleep(300 * time.Millisecond)
				}
			case strings.Contains(target, "sched:"):
				// Generate CPU activity to trigger scheduler events
				go func() {
					for j := 0; j < 100; j++ {
						// Create short-lived goroutines to trigger scheduler activity
						go func() {
							time.Sleep(time.Millisecond * 1)
						}()
						time.Sleep(time.Millisecond * 10)
					}
				}()
			case strings.Contains(target, "sys_brk") || strings.Contains(target, "kmem:"):
				// Generate memory allocation activity
				for j := 0; j < 5; j++ {
					// Allocate and free memory to trigger memory management
					data := make([]byte, 1024*1024) // 1MB
					for k := range data {
						data[k] = byte(k % 256)
					}
					data = nil // Allow GC
					time.Sleep(200 * time.Millisecond)
				}
			case strings.Contains(target, "sys_connect") || strings.Contains(target, "sys_accept"):
				// Network operations (these may not generate events in a test environment)
				fmt.Printf("   Note: Network syscalls may not trigger events without actual network activity\n")
			default:
				// Generic activity
				for j := 0; j < 3; j++ {
					testFile := fmt.Sprintf("/tmp/generic_test_%d.txt", j)
					if file, err := os.Create(testFile); err == nil {
						file.WriteString("Generic test activity")
						file.Close()
					}
					os.Remove(testFile)
					time.Sleep(300 * time.Millisecond)
				}
			}
		}(testSpec.Target, testSpec.ProbeType)

		// Wait for trace completion
		time.Sleep(time.Duration(testSpec.Duration+1) * time.Second)

		testResult, err := manager.GetTraceResult(testTraceID)
		if err != nil {
			manager.StopTrace(testTraceID)
			fmt.Printf("   ⚠️ Result error: %v\n", err)
			continue
		}

		fmt.Printf("   📊 Results for %s:\n", testSpec.Target)
		fmt.Printf("      • Total events: %d\n", testResult.EventCount)
		fmt.Printf("      • Events/sec: %.2f\n", testResult.Statistics.EventsPerSecond)
		fmt.Printf("      • Duration: %v\n", testResult.EndTime.Sub(testResult.StartTime))

		// Show process breakdown
		if len(testResult.Statistics.TopProcesses) > 0 {
			fmt.Printf("      • Top processes:\n")
			for j, proc := range testResult.Statistics.TopProcesses {
				if j >= 3 { // Show top 3
					break
				}
				fmt.Printf("        - %s: %d events (%.1f%%)\n",
					proc.ProcessName, proc.EventCount, proc.Percentage)
			}
		}

		// Show sample events with PIDs, counts, etc.
		if len(testResult.Events) > 0 {
			fmt.Printf("      • Sample events:\n")
			for j, event := range testResult.Events {
				if j >= 5 { // Show first 5 events
					break
				}
				fmt.Printf("        [%d] PID:%d TID:%d Process:%s Message:%s\n",
					j+1, event.PID, event.TID, event.ProcessName, event.Message)
			}
			if len(testResult.Events) > 5 {
				fmt.Printf("        ... and %d more events\n", len(testResult.Events)-5)
			}
		}

		if testResult.EventCount > 0 {
			fmt.Printf("   ✅ Success: Captured %d real syscall events!\n", testResult.EventCount)
		} else {
			fmt.Printf("   ⚠️ No events captured (may be normal for this syscall)\n")
		}
	}

	fmt.Println("\n🎉 BCC Trace Manager Root Test Complete!")
	fmt.Println("✅ Real eBPF tracing is working and ready for production use!")
}

// TestAgentEBPFIntegration tests the agent's integration with BCC-style eBPF tracing
// This demonstrates the complete flow from agent to eBPF results
func TestAgentEBPFIntegration(t *testing.T) {
	if os.Geteuid() != 0 {
		t.Skip("⚠️ Skipping agent integration test - requires root access")
		return
	}

	fmt.Println("\n=== Agent eBPF Integration Test ===")
	fmt.Println("This test demonstrates the complete agent flow with BCC-style tracing")

	// Create eBPF manager directly for testing
	manager := NewBCCTraceManager()

	// Test multiple syscalls that would be sent by the remote API
	testEBPFRequests := []struct {
		Name        string            `json:"name"`
		Type        string            `json:"type"`
		Target      string            `json:"target"`
		Duration    int               `json:"duration"`
		Description string            `json:"description"`
		Filters     map[string]string `json:"filters"`
	}{
		{
			Name:        "file_operations",
			Type:        "syscall",
			Target:      "sys_openat", // Will be converted to __x64_sys_openat
			Duration:    3,
			Description: "trace file open operations",
			Filters:     map[string]string{},
		},
		{
			Name:        "network_operations",
			Type:        "syscall",
			Target:      "__x64_sys_connect",
			Duration:    2,
			Description: "trace network connections",
			Filters:     map[string]string{},
		},
		{
			Name:        "io_operations",
			Type:        "syscall",
			Target:      "sys_write",
			Duration:    2,
			Description: "trace write operations",
			Filters:     map[string]string{},
		},
	}

	fmt.Printf("🚀 Testing eBPF manager with %d eBPF programs...\n\n", len(testEBPFRequests))

	// Convert to trace specs and execute using the manager directly
	var traceSpecs []TraceSpec
	for _, req := range testEBPFRequests {
		// Normalize syscall names to kernel symbols, avoiding a double
		// prefix when the request already uses the __x64_ form
		target := req.Target
		if !strings.HasPrefix(target, "__x64_") {
			target = "__x64_" + target
		}
		spec := TraceSpec{
			ProbeType: "p", // kprobe
			Target:    target,
			Format:    req.Description,
			Duration:  req.Duration,
		}
		traceSpecs = append(traceSpecs, spec)
	}

	// Execute traces sequentially for testing
	var results []map[string]interface{}
	for i, spec := range traceSpecs {
		fmt.Printf("Starting trace %d: %s\n", i+1, spec.Target)

		traceID, err := manager.StartTrace(spec)
		if err != nil {
			fmt.Printf("Failed to start trace: %v\n", err)
			continue
		}

		// Wait for trace duration
		time.Sleep(time.Duration(spec.Duration) * time.Second)

		traceResult, err := manager.GetTraceResult(traceID)
		if err != nil {
			fmt.Printf("Failed to get results: %v\n", err)
			continue
		}

		result := map[string]interface{}{
			"name":        testEBPFRequests[i].Name,
			"type":        testEBPFRequests[i].Type,
			"target":      spec.Target,
			"duration":    spec.Duration,
			"description": testEBPFRequests[i].Description,
			"status":      "completed",
			"success":     true,
			"event_count": traceResult.EventCount,
			"summary":     traceResult.Summary,
		}
		results = append(results, result)
	}

	fmt.Printf("📊 Agent eBPF Execution Results:\n")
	fmt.Printf("=" + strings.Repeat("=", 50) + "\n\n")

	for i, result := range results {
		fmt.Printf("🔍 Program %d: %s\n", i+1, result["name"])
		fmt.Printf("   Target: %s\n", result["target"])
		fmt.Printf("   Type: %s\n", result["type"])
		fmt.Printf("   Status: %s\n", result["status"])
		fmt.Printf("   Success: %v\n", result["success"])

		if result["success"].(bool) {
			if eventCount, ok := result["event_count"].(int); ok {
				fmt.Printf("   Events captured: %d\n", eventCount)
			}
			if dataPoints, ok := result["data_points"].(int); ok {
				fmt.Printf("   Data points: %d\n", dataPoints)
			}
			if summary, ok := result["summary"].(string); ok {
				fmt.Printf("   Summary: %s\n", summary)
			}

			// Show events if available
			if events, ok := result["events"].([]TraceEvent); ok && len(events) > 0 {
				fmt.Printf("   Sample events:\n")
				for j, event := range events {
					if j >= 3 { // Show first 3
						break
					}
					fmt.Printf("     [%d] PID:%d Process:%s Message:%s\n",
						j+1, event.PID, event.ProcessName, event.Message)
				}
				if len(events) > 3 {
					fmt.Printf("     ... and %d more events\n", len(events)-3)
				}
			}

			// Show statistics if available
			if stats, ok := result["statistics"].(TraceStats); ok {
				fmt.Printf("   Statistics:\n")
				fmt.Printf("     - Events/sec: %.2f\n", stats.EventsPerSecond)
				fmt.Printf("     - Total processes: %d\n", len(stats.EventsByProcess))
				if len(stats.TopProcesses) > 0 {
					fmt.Printf("     - Top process: %s (%d events)\n",
						stats.TopProcesses[0].ProcessName, stats.TopProcesses[0].EventCount)
				}
			}
		} else {
			if errMsg, ok := result["error"].(string); ok {
				fmt.Printf("   Error: %s\n", errMsg)
			}
		}
		fmt.Println()
	}

	// Validate expected agent response format
	t.Run("ValidateAgentResponseFormat", func(t *testing.T) {
		for i, result := range results {
			// Check required fields
			requiredFields := []string{"name", "type", "target", "duration", "description", "status", "success"}
			for _, field := range requiredFields {
				if _, exists := result[field]; !exists {
					t.Errorf("Result %d missing required field: %s", i, field)
				}
			}

			// If successful, check for data fields
			if success, ok := result["success"].(bool); ok && success {
				// Should have either event_count or data_points
				hasEventCount := false
				hasDataPoints := false

				if _, ok := result["event_count"]; ok {
					hasEventCount = true
				}
				if _, ok := result["data_points"]; ok {
					hasDataPoints = true
				}

				if !hasEventCount && !hasDataPoints {
					t.Errorf("Successful result %d should have event_count or data_points", i)
				}
			}
		}
	})

	fmt.Println("✅ Agent eBPF Integration Test Complete!")
	fmt.Println("📈 The agent correctly processes eBPF requests and returns detailed syscall data!")
}

@@ -1,4 +1,4 @@
-package main
+package executor
 
 import (
 	"context"
@@ -6,6 +6,8 @@ import (
 	"os/exec"
 	"strings"
 	"time"
+
+	"nannyagentv2/internal/types"
 )
 
 // CommandExecutor handles safe execution of diagnostic commands
@@ -21,8 +23,8 @@ func NewCommandExecutor(timeout time.Duration) *CommandExecutor {
 }
 
 // Execute executes a command safely with timeout and validation
-func (ce *CommandExecutor) Execute(cmd Command) CommandResult {
+func (ce *CommandExecutor) Execute(cmd types.Command) types.CommandResult {
-	result := CommandResult{
+	result := types.CommandResult{
 		ID:      cmd.ID,
 		Command: cmd.Command,
 	}
183	internal/logging/logger.go	Normal file
@@ -0,0 +1,183 @@
package logging

import (
	"fmt"
	"log"
	"log/syslog"
	"os"
	"strings"
)

// LogLevel defines the logging level
type LogLevel int

const (
	LevelDebug LogLevel = iota
	LevelInfo
	LevelWarning
	LevelError
)

func (l LogLevel) String() string {
	switch l {
	case LevelDebug:
		return "DEBUG"
	case LevelInfo:
		return "INFO"
	case LevelWarning:
		return "WARN"
	case LevelError:
		return "ERROR"
	default:
		return "INFO"
	}
}

// Logger provides structured logging with configurable levels
type Logger struct {
	syslogWriter *syslog.Writer
	level        LogLevel
	showEmoji    bool
}

var defaultLogger *Logger

func init() {
	defaultLogger = NewLogger()
}

// NewLogger creates a new logger with default configuration
func NewLogger() *Logger {
	return NewLoggerWithLevel(getLogLevelFromEnv())
}

// NewLoggerWithLevel creates a logger with the specified level
func NewLoggerWithLevel(level LogLevel) *Logger {
	l := &Logger{
		level:     level,
		showEmoji: os.Getenv("LOG_NO_EMOJI") != "true",
	}

	// Try to connect to syslog
	if writer, err := syslog.New(syslog.LOG_INFO|syslog.LOG_DAEMON, "nannyagentv2"); err == nil {
		l.syslogWriter = writer
	}

	return l
}

// getLogLevelFromEnv parses the log level from the environment variable
func getLogLevelFromEnv() LogLevel {
	level := strings.ToUpper(os.Getenv("LOG_LEVEL"))
	switch level {
	case "DEBUG":
		return LevelDebug
	case "INFO", "":
		return LevelInfo
	case "WARN", "WARNING":
		return LevelWarning
	case "ERROR":
		return LevelError
	default:
		return LevelInfo
	}
}

// logMessage handles the actual logging
func (l *Logger) logMessage(level LogLevel, format string, args ...interface{}) {
	if level < l.level {
		return
	}

	msg := fmt.Sprintf(format, args...)
	prefix := fmt.Sprintf("[%s]", level.String())

	// Add emoji prefix if enabled
	if l.showEmoji {
		switch level {
		case LevelDebug:
			prefix = "🔍 " + prefix
		case LevelInfo:
			prefix = "ℹ️ " + prefix
		case LevelWarning:
			prefix = "⚠️ " + prefix
		case LevelError:
			prefix = "❌ " + prefix
		}
	}

	// Log to syslog if available
	if l.syslogWriter != nil {
		switch level {
		case LevelDebug:
			l.syslogWriter.Debug(msg)
		case LevelInfo:
			l.syslogWriter.Info(msg)
		case LevelWarning:
			l.syslogWriter.Warning(msg)
		case LevelError:
			l.syslogWriter.Err(msg)
		}
	}

	log.Printf("%s %s", prefix, msg)
}

func (l *Logger) Debug(format string, args ...interface{}) {
	l.logMessage(LevelDebug, format, args...)
}

func (l *Logger) Info(format string, args ...interface{}) {
	l.logMessage(LevelInfo, format, args...)
}

func (l *Logger) Warning(format string, args ...interface{}) {
	l.logMessage(LevelWarning, format, args...)
}

func (l *Logger) Error(format string, args ...interface{}) {
	l.logMessage(LevelError, format, args...)
}

// SetLevel changes the logging level
func (l *Logger) SetLevel(level LogLevel) {
	l.level = level
}

// GetLevel returns the current logging level
func (l *Logger) GetLevel() LogLevel {
	return l.level
}

func (l *Logger) Close() {
	if l.syslogWriter != nil {
		l.syslogWriter.Close()
	}
}

// Global logging functions
func Debug(format string, args ...interface{}) {
	defaultLogger.Debug(format, args...)
}

func Info(format string, args ...interface{}) {
	defaultLogger.Info(format, args...)
}

func Warning(format string, args ...interface{}) {
	defaultLogger.Warning(format, args...)
}

func Error(format string, args ...interface{}) {
	defaultLogger.Error(format, args...)
}

// SetLevel sets the global logger level
func SetLevel(level LogLevel) {
	defaultLogger.SetLevel(level)
}

// GetLevel gets the global logger level
func GetLevel() LogLevel {
	return defaultLogger.GetLevel()
}

318	internal/metrics/collector.go	Normal file
@@ -0,0 +1,318 @@
package metrics
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"crypto/sha256"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"io"
|
||||||
|
"math"
|
||||||
|
"net/http"
|
||||||
|
"strings"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"github.com/shirou/gopsutil/v3/cpu"
|
||||||
|
"github.com/shirou/gopsutil/v3/disk"
|
||||||
|
"github.com/shirou/gopsutil/v3/host"
|
||||||
|
"github.com/shirou/gopsutil/v3/load"
|
||||||
|
"github.com/shirou/gopsutil/v3/mem"
|
||||||
|
psnet "github.com/shirou/gopsutil/v3/net"
|
||||||
|
|
||||||
|
"nannyagentv2/internal/types"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Collector handles system metrics collection
|
||||||
|
type Collector struct {
|
||||||
|
agentVersion string
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewCollector creates a new metrics collector
|
||||||
|
func NewCollector(agentVersion string) *Collector {
|
||||||
|
return &Collector{
|
||||||
|
agentVersion: agentVersion,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// GatherSystemMetrics collects comprehensive system metrics
|
||||||
|
func (c *Collector) GatherSystemMetrics() (*types.SystemMetrics, error) {
|
||||||
|
metrics := &types.SystemMetrics{
|
||||||
|
Timestamp: time.Now(),
|
||||||
|
}
|
||||||
|
|
||||||
|
// System Information
|
||||||
|
if hostInfo, err := host.Info(); err == nil {
|
||||||
|
metrics.Hostname = hostInfo.Hostname
|
||||||
|
metrics.Platform = hostInfo.Platform
|
||||||
|
metrics.PlatformFamily = hostInfo.PlatformFamily
|
||||||
|
metrics.PlatformVersion = hostInfo.PlatformVersion
|
||||||
|
metrics.KernelVersion = hostInfo.KernelVersion
|
||||||
|
metrics.KernelArch = hostInfo.KernelArch
|
||||||
|
}
|
||||||
|
|
||||||
|
// CPU Metrics
|
||||||
|
if percentages, err := cpu.Percent(time.Second, false); err == nil && len(percentages) > 0 {
|
||||||
|
metrics.CPUUsage = math.Round(percentages[0]*100) / 100
|
||||||
|
}
|
||||||
|
|
||||||
|
if cpuInfo, err := cpu.Info(); err == nil && len(cpuInfo) > 0 {
|
||||||
|
metrics.CPUCores = len(cpuInfo)
|
||||||
|
metrics.CPUModel = cpuInfo[0].ModelName
|
||||||
|
}
|
||||||
|
|
||||||
|
// Memory Metrics
|
||||||
|
	if memInfo, err := mem.VirtualMemory(); err == nil {
		metrics.MemoryUsage = math.Round(float64(memInfo.Used)/(1024*1024)*100) / 100 // MB
		metrics.MemoryTotal = memInfo.Total
		metrics.MemoryUsed = memInfo.Used
		metrics.MemoryFree = memInfo.Free
		metrics.MemoryAvailable = memInfo.Available
	}

	if swapInfo, err := mem.SwapMemory(); err == nil {
		metrics.SwapTotal = swapInfo.Total
		metrics.SwapUsed = swapInfo.Used
		metrics.SwapFree = swapInfo.Free
	}

	// Disk Metrics
	if diskInfo, err := disk.Usage("/"); err == nil {
		metrics.DiskUsage = math.Round(diskInfo.UsedPercent*100) / 100
		metrics.DiskTotal = diskInfo.Total
		metrics.DiskUsed = diskInfo.Used
		metrics.DiskFree = diskInfo.Free
	}

	// Load Averages
	if loadAvg, err := load.Avg(); err == nil {
		metrics.LoadAvg1 = math.Round(loadAvg.Load1*100) / 100
		metrics.LoadAvg5 = math.Round(loadAvg.Load5*100) / 100
		metrics.LoadAvg15 = math.Round(loadAvg.Load15*100) / 100
	}

	// Process Count (simplified - using a constant for now)
	// Note: gopsutil doesn't have host.Processes(), would need process.Processes()
	metrics.ProcessCount = 0 // Placeholder

	// Network Metrics
	netIn, netOut := c.getNetworkStats()
	metrics.NetworkInKbps = netIn
	metrics.NetworkOutKbps = netOut

	if netIOCounters, err := psnet.IOCounters(false); err == nil && len(netIOCounters) > 0 {
		netIO := netIOCounters[0]
		metrics.NetworkInBytes = netIO.BytesRecv
		metrics.NetworkOutBytes = netIO.BytesSent
	}

	// IP Address and Location
	metrics.IPAddress = c.getIPAddress()
	metrics.Location = c.getLocation() // Placeholder

	// Filesystem Information
	metrics.FilesystemInfo = c.getFilesystemInfo()

	// Block Devices
	metrics.BlockDevices = c.getBlockDevices()

	return metrics, nil
}

// getNetworkStats returns network input/output rates in Kbps
func (c *Collector) getNetworkStats() (float64, float64) {
	netIOCounters, err := psnet.IOCounters(false)
	if err != nil || len(netIOCounters) == 0 {
		return 0.0, 0.0
	}

	// Use the first entry for aggregate stats (IOCounters(false) returns one summed entry)
	netIO := netIOCounters[0]

	// Convert bytes to kilobits (simplified - cumulative byte counters, not a true per-second rate)
	netInKbps := float64(netIO.BytesRecv) * 8 / 1024
	netOutKbps := float64(netIO.BytesSent) * 8 / 1024

	return netInKbps, netOutKbps
}
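As the comment notes, the conversion above turns cumulative byte counters into kilobits, not a true rate, so the value grows without bound. A minimal sketch of a real average rate from two successive samples of the same counters (the `rateKbps` helper and its inputs are illustrative, not part of this codebase):

```go
package main

import "fmt"

// rateKbps converts two cumulative byte counters sampled elapsedSec apart
// into an average kilobits-per-second rate over that interval.
func rateKbps(prevBytes, currBytes uint64, elapsedSec float64) float64 {
	if elapsedSec <= 0 || currBytes < prevBytes {
		return 0 // bad interval or counter reset (e.g. interface bounce)
	}
	return float64(currBytes-prevBytes) * 8 / 1024 / elapsedSec
}

func main() {
	// 1 MiB received over 2 seconds: 8192 kilobits / 2 s = 4096 Kbps
	fmt.Println(rateKbps(0, 1048576, 2.0))
}
```

In practice the collector would keep the previous sample and timestamp between polls and call something like this on each tick.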
// getIPAddress returns the primary IP address of the system
func (c *Collector) getIPAddress() string {
	interfaces, err := psnet.Interfaces()
	if err != nil {
		return "unknown"
	}

	for _, iface := range interfaces {
		if len(iface.Addrs) > 0 && !strings.Contains(iface.Addrs[0].Addr, "127.0.0.1") {
			return strings.Split(iface.Addrs[0].Addr, "/")[0] // Remove CIDR if present
		}
	}

	return "unknown"
}

// getLocation returns basic location information (placeholder)
func (c *Collector) getLocation() string {
	return "unknown" // Would integrate with a GeoIP service
}

// getFilesystemInfo returns information about mounted filesystems
func (c *Collector) getFilesystemInfo() []types.FilesystemInfo {
	partitions, err := disk.Partitions(false)
	if err != nil {
		return []types.FilesystemInfo{}
	}

	var filesystems []types.FilesystemInfo
	for _, partition := range partitions {
		usage, err := disk.Usage(partition.Mountpoint)
		if err != nil {
			continue
		}

		fs := types.FilesystemInfo{
			Mountpoint:   partition.Mountpoint,
			Fstype:       partition.Fstype,
			Total:        usage.Total,
			Used:         usage.Used,
			Free:         usage.Free,
			UsagePercent: math.Round(usage.UsedPercent*100) / 100,
		}
		filesystems = append(filesystems, fs)
	}

	return filesystems
}

// getBlockDevices returns information about block devices
func (c *Collector) getBlockDevices() []types.BlockDevice {
	partitions, err := disk.Partitions(true)
	if err != nil {
		return []types.BlockDevice{}
	}

	var devices []types.BlockDevice
	deviceMap := make(map[string]bool)

	for _, partition := range partitions {
		// Only include actual block devices
		if strings.HasPrefix(partition.Device, "/dev/") {
			deviceName := partition.Device
			if !deviceMap[deviceName] {
				deviceMap[deviceName] = true

				device := types.BlockDevice{
					Name:         deviceName,
					Model:        "unknown",
					Size:         0,
					SerialNumber: "unknown",
				}
				devices = append(devices, device)
			}
		}
	}

	return devices
}

// SendMetrics sends system metrics to the agent-auth-api endpoint
func (c *Collector) SendMetrics(agentAuthURL, accessToken, agentID string, metrics *types.SystemMetrics) error {
	// Create a flattened metrics request for agent-auth-api
	metricsReq := c.CreateMetricsRequest(agentID, metrics)

	return c.sendMetricsRequest(agentAuthURL, accessToken, metricsReq)
}

// CreateMetricsRequest converts SystemMetrics to the flattened format expected by agent-auth-api
func (c *Collector) CreateMetricsRequest(agentID string, systemMetrics *types.SystemMetrics) *types.MetricsRequest {
	return &types.MetricsRequest{
		AgentID:           agentID,
		CPUUsage:          systemMetrics.CPUUsage,
		MemoryUsage:       systemMetrics.MemoryUsage,
		DiskUsage:         systemMetrics.DiskUsage,
		NetworkInKbps:     systemMetrics.NetworkInKbps,
		NetworkOutKbps:    systemMetrics.NetworkOutKbps,
		IPAddress:         systemMetrics.IPAddress,
		Location:          systemMetrics.Location,
		AgentVersion:      c.agentVersion,
		KernelVersion:     systemMetrics.KernelVersion,
		DeviceFingerprint: c.generateDeviceFingerprint(systemMetrics),
		LoadAverages: map[string]float64{
			"load1":  systemMetrics.LoadAvg1,
			"load5":  systemMetrics.LoadAvg5,
			"load15": systemMetrics.LoadAvg15,
		},
		OSInfo: map[string]string{
			"cpu_cores":        fmt.Sprintf("%d", systemMetrics.CPUCores),
			"memory":           fmt.Sprintf("%.1fGi", float64(systemMetrics.MemoryTotal)/(1024*1024*1024)),
			"uptime":           "unknown", // Will be calculated by the server or client
			"platform":         systemMetrics.Platform,
			"platform_family":  systemMetrics.PlatformFamily,
			"platform_version": systemMetrics.PlatformVersion,
			"kernel_version":   systemMetrics.KernelVersion,
			"kernel_arch":      systemMetrics.KernelArch,
		},
		FilesystemInfo: systemMetrics.FilesystemInfo,
		BlockDevices:   systemMetrics.BlockDevices,
		NetworkStats: map[string]uint64{
			"bytes_sent":  systemMetrics.NetworkOutBytes,
			"bytes_recv":  systemMetrics.NetworkInBytes,
			"total_bytes": systemMetrics.NetworkInBytes + systemMetrics.NetworkOutBytes,
		},
	}
}

// sendMetricsRequest sends the metrics request to the agent-auth-api
func (c *Collector) sendMetricsRequest(agentAuthURL, accessToken string, metricsReq *types.MetricsRequest) error {
	// Wrap metrics in the expected payload structure
	payload := map[string]interface{}{
		"metrics":   metricsReq,
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	}

	jsonData, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("failed to marshal metrics: %w", err)
	}

	// Send to the /metrics endpoint
	metricsURL := fmt.Sprintf("%s/metrics", agentAuthURL)
	req, err := http.NewRequest("POST", metricsURL, bytes.NewBuffer(jsonData))
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}

	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", accessToken))

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to send metrics: %w", err)
	}
	defer resp.Body.Close()

	// Read response
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("failed to read response: %w", err)
	}

	// Check response status
	if resp.StatusCode == http.StatusUnauthorized {
		return fmt.Errorf("unauthorized")
	}

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("metrics request failed with status %d: %s", resp.StatusCode, string(body))
	}

	return nil
}

// generateDeviceFingerprint creates a unique device identifier
func (c *Collector) generateDeviceFingerprint(metrics *types.SystemMetrics) string {
	fingerprint := fmt.Sprintf("%s-%s-%s", metrics.Hostname, metrics.Platform, metrics.KernelVersion)
	hasher := sha256.New()
	hasher.Write([]byte(fingerprint))
	return fmt.Sprintf("%x", hasher.Sum(nil))[:16]
}
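The fingerprint above is the first 16 hex characters (64 bits) of a SHA-256 digest over hostname, platform, and kernel version, so the same machine always reports the same ID. A standalone sketch of the same pattern (the identity values here are illustrative):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fingerprint joins the identity fields, hashes them, and keeps a short
// hex prefix - mirroring generateDeviceFingerprint above.
func fingerprint(hostname, platform, kernel string) string {
	h := sha256.New()
	h.Write([]byte(fmt.Sprintf("%s-%s-%s", hostname, platform, kernel)))
	return fmt.Sprintf("%x", h.Sum(nil))[:16]
}

func main() {
	fmt.Println(fingerprint("web-01", "ubuntu", "6.8.0-45-generic"))
}
```

Truncating to 64 bits keeps the identifier compact; collisions are unlikely across a fleet of this size but not impossible, which is acceptable for a reporting key.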
529
internal/server/investigation_server.go
Normal file
@@ -0,0 +1,529 @@
package server

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"

	"nannyagentv2/internal/auth"
	"nannyagentv2/internal/logging"
	"nannyagentv2/internal/metrics"
	"nannyagentv2/internal/types"

	"github.com/sashabaranov/go-openai"
)

// InvestigationRequest represents a request from Supabase to start an investigation
type InvestigationRequest struct {
	InvestigationID  string            `json:"investigation_id"`
	ApplicationGroup string            `json:"application_group"`
	Issue            string            `json:"issue"`
	Context          map[string]string `json:"context"`
	Priority         string            `json:"priority"`
	InitiatedBy      string            `json:"initiated_by"`
}

// InvestigationResponse represents the agent's response to an investigation
type InvestigationResponse struct {
	AgentID         string                `json:"agent_id"`
	InvestigationID string                `json:"investigation_id"`
	Status          string                `json:"status"`
	Commands        []types.CommandResult `json:"commands,omitempty"`
	AIResponse      string                `json:"ai_response,omitempty"`
	EpisodeID       string                `json:"episode_id,omitempty"`
	Timestamp       time.Time             `json:"timestamp"`
	Error           string                `json:"error,omitempty"`
}

// InvestigationServer handles reverse investigation requests from Supabase
type InvestigationServer struct {
	agent            types.DiagnosticAgent // Original agent for direct user interactions
	applicationAgent types.DiagnosticAgent // Separate agent for application-initiated investigations
	port             string
	agentID          string
	metricsCollector *metrics.Collector
	authManager      *auth.AuthManager
	startTime        time.Time
	supabaseURL      string
}

// NewInvestigationServer creates a new investigation server
func NewInvestigationServer(agent types.DiagnosticAgent, authManager *auth.AuthManager) *InvestigationServer {
	port := os.Getenv("AGENT_PORT")
	if port == "" {
		port = "1234"
	}

	// Get the agent ID from the authentication system
	var agentID string
	if authManager != nil {
		if id, err := authManager.GetCurrentAgentID(); err == nil {
			agentID = id
		} else {
			logging.Error("Failed to get agent ID from auth manager: %v", err)
		}
	}

	// Fall back to the environment variable, or generate one if auth fails
	if agentID == "" {
		agentID = os.Getenv("AGENT_ID")
		if agentID == "" {
			agentID = fmt.Sprintf("agent-%d", time.Now().Unix())
		}
	}

	// Create metrics collector
	metricsCollector := metrics.NewCollector("v2.0.0")

	// TODO: Fix application agent creation - use the main agent for now.
	// Create a separate agent for application-initiated investigations:
	//   applicationAgent := NewLinuxDiagnosticAgent()
	// Override the model to use the application-specific function:
	//   applicationAgent.model = "tensorzero::function_name::diagnose_and_heal_application"

	return &InvestigationServer{
		agent:            agent,
		applicationAgent: agent, // Use the same agent for now
		port:             port,
		agentID:          agentID,
		metricsCollector: metricsCollector,
		authManager:      authManager,
		startTime:        time.Now(),
		supabaseURL:      os.Getenv("SUPABASE_PROJECT_URL"),
	}
}

// DiagnoseIssueForApplication handles diagnostic requests initiated from the application/portal
func (s *InvestigationServer) DiagnoseIssueForApplication(issue, episodeID string) error {
	// Set the episode ID on the application agent for continuity
	// TODO: Fix episode ID handling with the interface
	// s.applicationAgent.episodeID = episodeID
	return s.applicationAgent.DiagnoseIssue(issue)
}

// Start starts the HTTP server and realtime polling for investigation requests
func (s *InvestigationServer) Start() error {
	mux := http.NewServeMux()

	// Health check endpoint
	mux.HandleFunc("/health", s.handleHealth)

	// Investigation endpoint
	mux.HandleFunc("/investigate", s.handleInvestigation)

	// Agent status endpoint
	mux.HandleFunc("/status", s.handleStatus)

	// Start realtime polling for backend-initiated investigations
	if s.supabaseURL != "" && s.authManager != nil {
		go s.startRealtimePolling()
		logging.Info("Realtime investigation polling enabled")
	} else {
		logging.Warning("Realtime investigation polling disabled (missing Supabase config or auth)")
	}

	server := &http.Server{
		Addr:         ":" + s.port,
		Handler:      mux,
		ReadTimeout:  30 * time.Second,
		WriteTimeout: 30 * time.Second,
	}

	logging.Info("Investigation server started on port %s (Agent ID: %s)", s.port, s.agentID)
	return server.ListenAndServe()
}

// handleHealth responds to health check requests
func (s *InvestigationServer) handleHealth(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	response := map[string]interface{}{
		"status":    "healthy",
		"agent_id":  s.agentID,
		"timestamp": time.Now(),
		"version":   "v2.0.0",
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

// handleStatus responds with agent status and capabilities
func (s *InvestigationServer) handleStatus(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	// Collect current system metrics
	systemMetrics, err := s.metricsCollector.GatherSystemMetrics()
	if err != nil {
		http.Error(w, fmt.Sprintf("Failed to collect metrics: %v", err), http.StatusInternalServerError)
		return
	}

	// Convert to the metrics request format for a consistent data structure
	metricsReq := s.metricsCollector.CreateMetricsRequest(s.agentID, systemMetrics)

	response := map[string]interface{}{
		"agent_id":     s.agentID,
		"status":       "ready",
		"capabilities": []string{"system_diagnostics", "ebpf_monitoring", "command_execution", "ai_analysis"},
		"system_info": map[string]interface{}{
			"os":           fmt.Sprintf("%s %s", metricsReq.OSInfo["platform"], metricsReq.OSInfo["platform_version"]),
			"kernel":       metricsReq.KernelVersion,
			"architecture": metricsReq.OSInfo["kernel_arch"],
			"cpu_cores":    metricsReq.OSInfo["cpu_cores"],
			"memory":       metricsReq.MemoryUsage,
			"private_ips":  metricsReq.IPAddress,
			"load_average": fmt.Sprintf("%.2f, %.2f, %.2f",
				metricsReq.LoadAverages["load1"],
				metricsReq.LoadAverages["load5"],
				metricsReq.LoadAverages["load15"]),
			"disk_usage": fmt.Sprintf("Root: %.0fG/%.0fG (%.0f%% used)",
				float64(metricsReq.FilesystemInfo[0].Used)/1024/1024/1024,
				float64(metricsReq.FilesystemInfo[0].Total)/1024/1024/1024,
				metricsReq.DiskUsage),
		},
		"uptime":       time.Since(s.startTime),
		"last_contact": time.Now(),
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}

// sendCommandResultsToTensorZero sends command results back to TensorZero and continues the conversation
func (s *InvestigationServer) sendCommandResultsToTensorZero(diagnosticResp types.DiagnosticResponse, commandResults []types.CommandResult) (interface{}, error) {
	// Build conversation history as in agent.go
	messages := []openai.ChatCompletionMessage{
		// Add the original diagnostic response as an assistant message
		{
			Role: openai.ChatMessageRoleAssistant,
			Content: fmt.Sprintf(`{"response_type":"diagnostic","reasoning":"%s","commands":%s}`,
				diagnosticResp.Reasoning,
				mustMarshalJSON(diagnosticResp.Commands)),
		},
	}

	// Add command results as a user message (same as agent.go does)
	resultsJSON, err := json.MarshalIndent(commandResults, "", "  ")
	if err != nil {
		return nil, fmt.Errorf("failed to marshal command results: %w", err)
	}

	messages = append(messages, openai.ChatCompletionMessage{
		Role:    openai.ChatMessageRoleUser,
		Content: string(resultsJSON),
	})

	// Send to TensorZero via the application agent's SendRequest method
	logging.Debug("Sending command results to TensorZero for analysis")
	response, err := s.applicationAgent.SendRequest(messages)
	if err != nil {
		return nil, fmt.Errorf("failed to send request to TensorZero: %w", err)
	}

	if len(response.Choices) == 0 {
		return nil, fmt.Errorf("no choices in TensorZero response")
	}

	content := response.Choices[0].Message.Content
	logging.Debug("TensorZero continued analysis: %s", content)

	// Try to parse the response to determine whether it is diagnostic or resolution
	var diagnosticNextResp types.DiagnosticResponse
	var resolutionResp types.ResolutionResponse

	// Check if it's another diagnostic response
	if err := json.Unmarshal([]byte(content), &diagnosticNextResp); err == nil && diagnosticNextResp.ResponseType == "diagnostic" {
		logging.Debug("TensorZero requests %d more commands", len(diagnosticNextResp.Commands))
		return map[string]interface{}{
			"type":     "diagnostic",
			"response": diagnosticNextResp,
			"raw":      content,
		}, nil
	}

	// Check if it's a resolution response
	if err := json.Unmarshal([]byte(content), &resolutionResp); err == nil && resolutionResp.ResponseType == "resolution" {
		return map[string]interface{}{
			"type":     "resolution",
			"response": resolutionResp,
			"raw":      content,
		}, nil
	}

	// Return the raw response if we can't parse it
	return map[string]interface{}{
		"type": "unknown",
		"raw":  content,
	}, nil
}

// mustMarshalJSON marshals v to JSON, ignoring marshal errors
func mustMarshalJSON(v interface{}) string {
	data, _ := json.Marshal(v)
	return string(data)
}

// handleInvestigation handles the actual investigation using TensorZero.
// This endpoint receives either:
//  1. DiagnosticResponse - commands and eBPF programs to execute
//  2. ResolutionResponse - final resolution (no execution needed)
func (s *InvestigationServer) handleInvestigation(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "Method not allowed - only POST accepted", http.StatusMethodNotAllowed)
		return
	}

	// Parse the request body to determine what type of response this is
	var requestBody map[string]interface{}
	if err := json.NewDecoder(r.Body).Decode(&requestBody); err != nil {
		http.Error(w, fmt.Sprintf("Invalid JSON: %v", err), http.StatusBadRequest)
		return
	}

	// Check the response_type field to determine how to handle this
	responseType, ok := requestBody["response_type"].(string)
	if !ok {
		http.Error(w, "Missing or invalid response_type field", http.StatusBadRequest)
		return
	}

	logging.Debug("Received investigation payload with response_type: %s", responseType)

	switch responseType {
	case "diagnostic":
		// This is a DiagnosticResponse with commands to execute
		response := s.handleDiagnosticExecution(requestBody)
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(response)

	case "resolution":
		// This is a ResolutionResponse - final result, just acknowledge
		fmt.Printf("📋 Received final resolution from backend\n")
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]interface{}{
			"success":  true,
			"message":  "Resolution received and acknowledged",
			"agent_id": s.agentID,
		})

	default:
		http.Error(w, fmt.Sprintf("Unknown response_type: %s", responseType), http.StatusBadRequest)
	}
}
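The dispatch above keys entirely on the top-level `response_type` field, so any payload must carry it as a string before the body is parsed further. A small standalone sketch of that detection step (the helper name `classifyPayload` and the payload values are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// classifyPayload extracts response_type the same way handleInvestigation does:
// decode into a generic map, then type-assert the field to a string.
func classifyPayload(raw []byte) (string, error) {
	var body map[string]interface{}
	if err := json.Unmarshal(raw, &body); err != nil {
		return "", err
	}
	rt, ok := body["response_type"].(string)
	if !ok {
		return "", fmt.Errorf("missing or invalid response_type field")
	}
	return rt, nil
}

func main() {
	payload := []byte(`{"response_type":"diagnostic","reasoning":"check disk","commands":[{"id":"c1","command":"df -h"}]}`)
	rt, err := classifyPayload(payload)
	fmt.Println(rt, err) // diagnostic <nil>
}
```

Decoding into a map first lets the handler branch without committing to a concrete struct; the `diagnostic` branch then re-marshals and parses into `types.DiagnosticResponse`.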
// handleDiagnosticExecution executes commands from a DiagnosticResponse
func (s *InvestigationServer) handleDiagnosticExecution(requestBody map[string]interface{}) map[string]interface{} {
	// Parse as DiagnosticResponse
	var diagnosticResp types.DiagnosticResponse

	// Convert the map back to JSON and then parse it properly
	jsonData, err := json.Marshal(requestBody)
	if err != nil {
		return map[string]interface{}{
			"success":  false,
			"error":    fmt.Sprintf("Failed to re-marshal request: %v", err),
			"agent_id": s.agentID,
		}
	}

	if err := json.Unmarshal(jsonData, &diagnosticResp); err != nil {
		return map[string]interface{}{
			"success":  false,
			"error":    fmt.Sprintf("Failed to parse DiagnosticResponse: %v", err),
			"agent_id": s.agentID,
		}
	}

	fmt.Printf("📋 Executing %d commands from backend\n", len(diagnosticResp.Commands))

	// Execute all commands
	commandResults := make([]types.CommandResult, 0, len(diagnosticResp.Commands))

	for _, cmd := range diagnosticResp.Commands {
		fmt.Printf("⚙️ Executing command '%s': %s\n", cmd.ID, cmd.Command)

		// Use the agent's executor to run the command
		result := s.agent.ExecuteCommand(cmd)
		commandResults = append(commandResults, result)

		if result.Error != "" {
			fmt.Printf("⚠️ Command '%s' had error: %s\n", cmd.ID, result.Error)
		}
	}

	// Send command results back to TensorZero for continued analysis
	fmt.Printf("🔄 Sending %d command results back to TensorZero for continued analysis\n", len(commandResults))

	nextResponse, err := s.sendCommandResultsToTensorZero(diagnosticResp, commandResults)
	if err != nil {
		return map[string]interface{}{
			"success":         false,
			"error":           fmt.Sprintf("Failed to continue TensorZero conversation: %v", err),
			"agent_id":        s.agentID,
			"command_results": commandResults, // Still return the results
		}
	}

	// Return both the command results and the next response from TensorZero
	return map[string]interface{}{
		"success":           true,
		"agent_id":          s.agentID,
		"command_results":   commandResults,
		"commands_executed": len(commandResults),
		"next_response":     nextResponse,
		"timestamp":         time.Now().Format(time.RFC3339),
	}
}

// PendingInvestigation represents a pending investigation from the database
type PendingInvestigation struct {
	ID                string                 `json:"id"`
	InvestigationID   string                 `json:"investigation_id"`
	AgentID           string                 `json:"agent_id"`
	DiagnosticPayload map[string]interface{} `json:"diagnostic_payload"`
	EpisodeID         *string                `json:"episode_id"`
	Status            string                 `json:"status"`
	CreatedAt         time.Time              `json:"created_at"`
}

// startRealtimePolling begins polling for pending investigations
func (s *InvestigationServer) startRealtimePolling() {
	fmt.Printf("🔄 Starting realtime investigation polling for agent %s\n", s.agentID)

	ticker := time.NewTicker(5 * time.Second) // Poll every 5 seconds
	defer ticker.Stop()

	for range ticker.C {
		s.checkForPendingInvestigations()
	}
}

// checkForPendingInvestigations checks for new pending investigations
func (s *InvestigationServer) checkForPendingInvestigations() {
	url := fmt.Sprintf("%s/rest/v1/pending_investigations?agent_id=eq.%s&status=eq.pending&order=created_at.desc",
		s.supabaseURL, s.agentID)

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return // Silent fail for polling
	}

	// Get token from the auth manager
	authToken, err := s.authManager.LoadToken()
	if err != nil {
		return // Silent fail for polling
	}

	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", authToken.AccessToken))
	req.Header.Set("Accept", "application/json")

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return // Silent fail for polling
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return // Silent fail for polling
	}

	var investigations []PendingInvestigation
	if err := json.NewDecoder(resp.Body).Decode(&investigations); err != nil {
		return // Silent fail for polling
	}

	for _, investigation := range investigations {
		fmt.Printf("🔍 Found pending investigation: %s\n", investigation.ID)
		go s.handlePendingInvestigation(investigation)
	}
}

// handlePendingInvestigation processes a single pending investigation
func (s *InvestigationServer) handlePendingInvestigation(investigation PendingInvestigation) {
	fmt.Printf("🚀 Processing realtime investigation %s\n", investigation.InvestigationID)

	// Mark as executing
	if err := s.updateInvestigationStatus(investigation.ID, "executing", nil, nil); err != nil {
		fmt.Printf("❌ Failed to mark investigation as executing: %v\n", err)
		return
	}

	// Execute diagnostic commands using the existing handleDiagnosticExecution method
	results := s.handleDiagnosticExecution(investigation.DiagnosticPayload)

	// Mark as completed with results
	if err := s.updateInvestigationStatus(investigation.ID, "completed", results, nil); err != nil {
		fmt.Printf("❌ Failed to mark investigation as completed: %v\n", err)
		return
	}
}

// updateInvestigationStatus updates the status of a pending investigation
func (s *InvestigationServer) updateInvestigationStatus(id, status string, results map[string]interface{}, errorMsg *string) error {
	updateData := map[string]interface{}{
		"status": status,
	}

	if status == "executing" {
		updateData["started_at"] = time.Now().UTC().Format(time.RFC3339)
	} else if status == "completed" {
		updateData["completed_at"] = time.Now().UTC().Format(time.RFC3339)
		if results != nil {
			updateData["command_results"] = results
		}
	} else if status == "failed" && errorMsg != nil {
		updateData["error_message"] = *errorMsg
		updateData["completed_at"] = time.Now().UTC().Format(time.RFC3339)
	}

	jsonData, err := json.Marshal(updateData)
	if err != nil {
		return fmt.Errorf("failed to marshal update data: %v", err)
	}

	url := fmt.Sprintf("%s/rest/v1/pending_investigations?id=eq.%s", s.supabaseURL, id)
	req, err := http.NewRequest("PATCH", url, strings.NewReader(string(jsonData)))
	if err != nil {
		return fmt.Errorf("failed to create request: %v", err)
	}

	// Get token from the auth manager
	authToken, err := s.authManager.LoadToken()
	if err != nil {
		return fmt.Errorf("failed to load auth token: %v", err)
	}

	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", authToken.AccessToken))
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to update investigation: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("supabase update error: %d", resp.StatusCode)
	}

	return nil
}
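Both the poll and the status update talk to Supabase's PostgREST layer, where row filters are encoded as `column=op.value` query parameters. A small sketch of building the pending-investigations query with proper URL encoding (the base URL and agent ID are illustrative; the handwritten `fmt.Sprintf` above produces an equivalent URL when the values need no escaping):

```go
package main

import (
	"fmt"
	"net/url"
)

// pendingURL builds a PostgREST query selecting this agent's pending rows,
// newest first - the same shape checkForPendingInvestigations uses.
func pendingURL(base, agentID string) string {
	q := url.Values{}
	q.Set("agent_id", "eq."+agentID) // PostgREST equality filter
	q.Set("status", "eq.pending")
	q.Set("order", "created_at.desc")
	return fmt.Sprintf("%s/rest/v1/pending_investigations?%s", base, q.Encode())
}

func main() {
	fmt.Println(pendingURL("https://example.supabase.co", "agent-123"))
}
```

Using `url.Values` guards against agent IDs containing characters that would break the handwritten query string.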
|
||||||
@@ -1,4 +1,4 @@
-package main
+package system
 
 import (
 	"fmt"
@@ -6,6 +6,9 @@ import (
 	"runtime"
 	"strings"
 	"time"
+
+	"nannyagentv2/internal/executor"
+	"nannyagentv2/internal/types"
 )
 
 // SystemInfo represents basic system information
@@ -25,42 +28,42 @@ type SystemInfo struct {
 // GatherSystemInfo collects basic system information
 func GatherSystemInfo() *SystemInfo {
 	info := &SystemInfo{}
-	executor := NewCommandExecutor(5 * time.Second)
+	executor := executor.NewCommandExecutor(5 * time.Second)
 
 	// Basic system info
-	if result := executor.Execute(Command{ID: "hostname", Command: "hostname"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "hostname", Command: "hostname"}); result.ExitCode == 0 {
 		info.Hostname = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "os", Command: "lsb_release -d 2>/dev/null | cut -f2 || cat /etc/os-release | grep PRETTY_NAME | cut -d'=' -f2 | tr -d '\"'"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "os", Command: "lsb_release -d 2>/dev/null | cut -f2 || cat /etc/os-release | grep PRETTY_NAME | cut -d'=' -f2 | tr -d '\"'"}); result.ExitCode == 0 {
 		info.OS = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "kernel", Command: "uname -r"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "kernel", Command: "uname -r"}); result.ExitCode == 0 {
 		info.Kernel = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "arch", Command: "uname -m"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "arch", Command: "uname -m"}); result.ExitCode == 0 {
 		info.Architecture = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "cores", Command: "nproc"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "cores", Command: "nproc"}); result.ExitCode == 0 {
 		info.CPUCores = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "memory", Command: "free -h | grep Mem | awk '{print $2}'"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "memory", Command: "free -h | grep Mem | awk '{print $2}'"}); result.ExitCode == 0 {
 		info.Memory = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "uptime", Command: "uptime -p"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "uptime", Command: "uptime -p"}); result.ExitCode == 0 {
 		info.Uptime = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "load", Command: "uptime | awk -F'load average:' '{print $2}' | xargs"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "load", Command: "uptime | awk -F'load average:' '{print $2}' | xargs"}); result.ExitCode == 0 {
 		info.LoadAverage = strings.TrimSpace(result.Output)
 	}
 
-	if result := executor.Execute(Command{ID: "disk", Command: "df -h / | tail -1 | awk '{print \"Root: \" $3 \"/\" $2 \" (\" $5 \" used)\"}'"}); result.ExitCode == 0 {
+	if result := executor.Execute(types.Command{ID: "disk", Command: "df -h / | tail -1 | awk '{print \"Root: \" $3 \"/\" $2 \" (\" $5 \" used)\"}'"}); result.ExitCode == 0 {
 		info.DiskUsage = strings.TrimSpace(result.Output)
 	}
 
@@ -152,50 +155,3 @@ ISSUE DESCRIPTION:`,
 		info.PrivateIPs,
 		runtime.Version())
 }
-
-// FormatSystemInfoWithEBPFForPrompt formats system information including eBPF capabilities
-func FormatSystemInfoWithEBPFForPrompt(info *SystemInfo, ebpfManager EBPFManagerInterface) string {
-	baseInfo := FormatSystemInfoForPrompt(info)
-
-	if ebpfManager == nil {
-		return baseInfo + "\neBPF CAPABILITIES: Not available\n"
-	}
-
-	capabilities := ebpfManager.GetCapabilities()
-	summary := ebpfManager.GetSummary()
-
-	ebpfInfo := fmt.Sprintf(`
-eBPF MONITORING CAPABILITIES:
-- System Call Tracing: %v
-- Network Activity Tracing: %v
-- Process Monitoring: %v
-- File System Monitoring: %v
-- Performance Monitoring: %v
-- Security Event Monitoring: %v
-
-eBPF INTEGRATION GUIDE:
-To request eBPF monitoring during diagnosis, include these fields in your JSON response:
-{
-  "response_type": "diagnostic",
-  "reasoning": "explanation of why eBPF monitoring is needed",
-  "commands": [regular diagnostic commands],
-  "ebpf_capabilities": ["syscall_trace", "network_trace", "process_trace"],
-  "ebpf_duration_seconds": 15,
-  "ebpf_filters": {"pid": "process_id", "comm": "process_name", "path": "/specific/path"}
-}
-
-Available eBPF capabilities: %v
-eBPF Status: %v
-
-`,
-		capabilities["tracepoint"],
-		capabilities["kprobe"],
-		capabilities["kernel_support"],
-		capabilities["tracepoint"],
-		capabilities["kernel_support"],
-		capabilities["bpftrace_available"],
-		capabilities,
-		summary)
-
-	return baseInfo + ebpfInfo
-}
290
internal/types/types.go
Normal file
@@ -0,0 +1,290 @@
package types

import (
	"time"

	"nannyagentv2/internal/ebpf"

	"github.com/sashabaranov/go-openai"
)

// SystemMetrics represents comprehensive system performance metrics
type SystemMetrics struct {
	// System Information
	Hostname        string `json:"hostname"`
	Platform        string `json:"platform"`
	PlatformFamily  string `json:"platform_family"`
	PlatformVersion string `json:"platform_version"`
	KernelVersion   string `json:"kernel_version"`
	KernelArch      string `json:"kernel_arch"`

	// CPU Metrics
	CPUUsage float64 `json:"cpu_usage"`
	CPUCores int     `json:"cpu_cores"`
	CPUModel string  `json:"cpu_model"`

	// Memory Metrics
	MemoryUsage     float64 `json:"memory_usage"`
	MemoryTotal     uint64  `json:"memory_total"`
	MemoryUsed      uint64  `json:"memory_used"`
	MemoryFree      uint64  `json:"memory_free"`
	MemoryAvailable uint64  `json:"memory_available"`
	SwapTotal       uint64  `json:"swap_total"`
	SwapUsed        uint64  `json:"swap_used"`
	SwapFree        uint64  `json:"swap_free"`

	// Disk Metrics
	DiskUsage float64 `json:"disk_usage"`
	DiskTotal uint64  `json:"disk_total"`
	DiskUsed  uint64  `json:"disk_used"`
	DiskFree  uint64  `json:"disk_free"`

	// Network Metrics
	NetworkInKbps   float64 `json:"network_in_kbps"`
	NetworkOutKbps  float64 `json:"network_out_kbps"`
	NetworkInBytes  uint64  `json:"network_in_bytes"`
	NetworkOutBytes uint64  `json:"network_out_bytes"`

	// System Load
	LoadAvg1  float64 `json:"load_avg_1"`
	LoadAvg5  float64 `json:"load_avg_5"`
	LoadAvg15 float64 `json:"load_avg_15"`

	// Process Information
	ProcessCount int `json:"process_count"`

	// Network Information
	IPAddress string `json:"ip_address"`
	Location  string `json:"location"`

	// Filesystem Information
	FilesystemInfo []FilesystemInfo `json:"filesystem_info"`
	BlockDevices   []BlockDevice    `json:"block_devices"`

	// Timestamp
	Timestamp time.Time `json:"timestamp"`
}

// FilesystemInfo represents filesystem information
type FilesystemInfo struct {
	Device       string  `json:"device"`
	Mountpoint   string  `json:"mountpoint"`
	Type         string  `json:"type"`
	Fstype       string  `json:"fstype"`
	Total        uint64  `json:"total"`
	Used         uint64  `json:"used"`
	Free         uint64  `json:"free"`
	Usage        float64 `json:"usage"`
	UsagePercent float64 `json:"usage_percent"`
}

// BlockDevice represents a block device
type BlockDevice struct {
	Name         string `json:"name"`
	Size         uint64 `json:"size"`
	Type         string `json:"type"`
	Model        string `json:"model,omitempty"`
	SerialNumber string `json:"serial_number"`
}

// NetworkStats represents network interface statistics
type NetworkStats struct {
	Interface   string `json:"interface"`
	BytesRecv   uint64 `json:"bytes_recv"`
	BytesSent   uint64 `json:"bytes_sent"`
	PacketsRecv uint64 `json:"packets_recv"`
	PacketsSent uint64 `json:"packets_sent"`
	ErrorsIn    uint64 `json:"errors_in"`
	ErrorsOut   uint64 `json:"errors_out"`
	DropsIn     uint64 `json:"drops_in"`
	DropsOut    uint64 `json:"drops_out"`
}

// AuthToken represents an authentication token
type AuthToken struct {
	AccessToken  string    `json:"access_token"`
	RefreshToken string    `json:"refresh_token"`
	TokenType    string    `json:"token_type"`
	ExpiresAt    time.Time `json:"expires_at"`
	AgentID      string    `json:"agent_id"`
}

// DeviceAuthRequest represents the device authorization request
type DeviceAuthRequest struct {
	ClientID string `json:"client_id"`
	Scope    string `json:"scope,omitempty"`
}

// DeviceAuthResponse represents the device authorization response
type DeviceAuthResponse struct {
	DeviceCode      string `json:"device_code"`
	UserCode        string `json:"user_code"`
	VerificationURI string `json:"verification_uri"`
	ExpiresIn       int    `json:"expires_in"`
	Interval        int    `json:"interval"`
}

// TokenRequest represents the token request for device flow
type TokenRequest struct {
	GrantType    string `json:"grant_type"`
	DeviceCode   string `json:"device_code,omitempty"`
	RefreshToken string `json:"refresh_token,omitempty"`
	ClientID     string `json:"client_id,omitempty"`
}

// TokenResponse represents the token response
type TokenResponse struct {
	AccessToken      string `json:"access_token"`
	RefreshToken     string `json:"refresh_token"`
	TokenType        string `json:"token_type"`
	ExpiresIn        int    `json:"expires_in"`
	AgentID          string `json:"agent_id,omitempty"`
	Error            string `json:"error,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
}

// HeartbeatRequest represents the agent heartbeat request
type HeartbeatRequest struct {
	AgentID string        `json:"agent_id"`
	Status  string        `json:"status"`
	Metrics SystemMetrics `json:"metrics"`
}

// MetricsRequest represents the flattened metrics payload expected by agent-auth-api
type MetricsRequest struct {
	// Agent identification
	AgentID string `json:"agent_id"`

	// Basic metrics
	CPUUsage    float64 `json:"cpu_usage"`
	MemoryUsage float64 `json:"memory_usage"`
	DiskUsage   float64 `json:"disk_usage"`

	// Network metrics
	NetworkInKbps  float64 `json:"network_in_kbps"`
	NetworkOutKbps float64 `json:"network_out_kbps"`

	// System information
	IPAddress         string `json:"ip_address"`
	Location          string `json:"location"`
	AgentVersion      string `json:"agent_version"`
	KernelVersion     string `json:"kernel_version"`
	DeviceFingerprint string `json:"device_fingerprint"`

	// Structured data (JSON fields in database)
	LoadAverages   map[string]float64 `json:"load_averages"`
	OSInfo         map[string]string  `json:"os_info"`
	FilesystemInfo []FilesystemInfo   `json:"filesystem_info"`
	BlockDevices   []BlockDevice      `json:"block_devices"`
	NetworkStats   map[string]uint64  `json:"network_stats"`
}

// Agent types for TensorZero integration
type DiagnosticResponse struct {
	ResponseType string    `json:"response_type"`
	Reasoning    string    `json:"reasoning"`
	Commands     []Command `json:"commands"`
}

// ResolutionResponse represents a resolution response
type ResolutionResponse struct {
	ResponseType   string `json:"response_type"`
	RootCause      string `json:"root_cause"`
	ResolutionPlan string `json:"resolution_plan"`
	Confidence     string `json:"confidence"`
}

// Command represents a command to execute
type Command struct {
	ID          string `json:"id"`
	Command     string `json:"command"`
	Description string `json:"description"`
}

// CommandResult represents the result of an executed command
type CommandResult struct {
	ID          string `json:"id"`
	Command     string `json:"command"`
	Description string `json:"description"`
	Output      string `json:"output"`
	ExitCode    int    `json:"exit_code"`
	Error       string `json:"error,omitempty"`
}

// EBPFRequest represents an eBPF trace request from external API
type EBPFRequest struct {
	Name        string            `json:"name"`
	Type        string            `json:"type"`     // "tracepoint", "kprobe", "kretprobe"
	Target      string            `json:"target"`   // tracepoint path or function name
	Duration    int               `json:"duration"` // seconds
	Filters     map[string]string `json:"filters,omitempty"`
	Description string            `json:"description"`
}

// EBPFEnhancedDiagnosticResponse represents enhanced diagnostic response with eBPF
type EBPFEnhancedDiagnosticResponse struct {
	ResponseType string        `json:"response_type"`
	Reasoning    string        `json:"reasoning"`
	Commands     []string      `json:"commands"` // Changed to []string to match current prompt format
	EBPFPrograms []EBPFRequest `json:"ebpf_programs"`
	NextActions  []string      `json:"next_actions,omitempty"`
}

// TensorZeroRequest represents a request to TensorZero
type TensorZeroRequest struct {
	Model     string                   `json:"model"`
	Messages  []map[string]interface{} `json:"messages"`
	EpisodeID string                   `json:"tensorzero::episode_id,omitempty"`
}

// TensorZeroResponse represents a response from TensorZero
type TensorZeroResponse struct {
	Choices   []map[string]interface{} `json:"choices"`
	EpisodeID string                   `json:"episode_id"`
}

// SystemInfo represents system information (for compatibility)
type SystemInfo struct {
	Hostname      string              `json:"hostname"`
	Platform      string              `json:"platform"`
	PlatformInfo  map[string]string   `json:"platform_info"`
	KernelVersion string              `json:"kernel_version"`
	Uptime        string              `json:"uptime"`
	LoadAverage   []float64           `json:"load_average"`
	CPUInfo       map[string]string   `json:"cpu_info"`
	MemoryInfo    map[string]string   `json:"memory_info"`
	DiskInfo      []map[string]string `json:"disk_info"`
}

// AgentConfig represents agent configuration
type AgentConfig struct {
	TensorZeroAPIKey string `json:"tensorzero_api_key"`
	APIURL           string `json:"api_url"`
	Timeout          int    `json:"timeout"`
	Debug            bool   `json:"debug"`
	MaxRetries       int    `json:"max_retries"`
	BackoffFactor    int    `json:"backoff_factor"`
	EpisodeID        string `json:"episode_id,omitempty"`
}

// PendingInvestigation represents a pending investigation from the database
type PendingInvestigation struct {
	ID                string                 `json:"id"`
	InvestigationID   string                 `json:"investigation_id"`
	AgentID           string                 `json:"agent_id"`
	DiagnosticPayload map[string]interface{} `json:"diagnostic_payload"`
	EpisodeID         *string                `json:"episode_id"`
	Status            string                 `json:"status"`
	CreatedAt         time.Time              `json:"created_at"`
}

// DiagnosticAgent interface for agent functionality needed by other packages
type DiagnosticAgent interface {
	DiagnoseIssue(issue string) error
	// Exported method names to match what websocket client calls
	ConvertEBPFProgramsToTraceSpecs(ebpfRequests []EBPFRequest) []ebpf.TraceSpec
	ExecuteEBPFTraces(traceSpecs []ebpf.TraceSpec) []map[string]interface{}
	SendRequestWithEpisode(messages []openai.ChatCompletionMessage, episodeID string) (*openai.ChatCompletionResponse, error)
	SendRequest(messages []openai.ChatCompletionMessage) (*openai.ChatCompletionResponse, error)
	ExecuteCommand(cmd Command) CommandResult
}
842
internal/websocket/websocket_client.go
Normal file
@@ -0,0 +1,842 @@
|
|||||||
|
package websocket
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"net"
|
||||||
|
"net/http"
|
||||||
|
"os"
|
||||||
|
"os/exec"
|
||||||
|
"strings"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"nannyagentv2/internal/auth"
|
||||||
|
"nannyagentv2/internal/logging"
|
||||||
|
"nannyagentv2/internal/metrics"
|
||||||
|
"nannyagentv2/internal/types"
|
||||||
|
|
||||||
|
"github.com/gorilla/websocket"
|
||||||
|
"github.com/sashabaranov/go-openai"
|
||||||
|
)
|
||||||
|
|
||||||
|
// Helper function for minimum of two integers
|
||||||
|
|
||||||
|
// WebSocketMessage represents a message sent over WebSocket
|
||||||
|
type WebSocketMessage struct {
|
||||||
|
Type string `json:"type"`
|
||||||
|
Data interface{} `json:"data"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// InvestigationTask represents a task sent to the agent
|
||||||
|
type InvestigationTask struct {
|
||||||
|
TaskID string `json:"task_id"`
|
||||||
|
InvestigationID string `json:"investigation_id"`
|
||||||
|
AgentID string `json:"agent_id"`
|
||||||
|
DiagnosticPayload map[string]interface{} `json:"diagnostic_payload"`
|
||||||
|
EpisodeID string `json:"episode_id,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// TaskResult represents the result of a completed task
|
||||||
|
type TaskResult struct {
|
||||||
|
TaskID string `json:"task_id"`
|
||||||
|
Success bool `json:"success"`
|
||||||
|
CommandResults map[string]interface{} `json:"command_results,omitempty"`
|
||||||
|
Error string `json:"error,omitempty"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// HeartbeatData represents heartbeat information
|
||||||
|
type HeartbeatData struct {
|
||||||
|
AgentID string `json:"agent_id"`
|
||||||
|
Timestamp time.Time `json:"timestamp"`
|
||||||
|
Version string `json:"version"`
|
||||||
|
}
|
||||||
|
|
||||||
|
// WebSocketClient handles WebSocket connection to Supabase backend
|
||||||
|
type WebSocketClient struct {
|
||||||
|
agent types.DiagnosticAgent // DiagnosticAgent interface
|
||||||
|
conn *websocket.Conn
|
||||||
|
agentID string
|
||||||
|
authManager *auth.AuthManager
|
||||||
|
metricsCollector *metrics.Collector
|
||||||
|
supabaseURL string
|
||||||
|
token string
|
||||||
|
ctx context.Context
|
||||||
|
cancel context.CancelFunc
|
||||||
|
consecutiveFailures int // Track consecutive connection failures
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewWebSocketClient creates a new WebSocket client
|
||||||
|
func NewWebSocketClient(agent types.DiagnosticAgent, authManager *auth.AuthManager) *WebSocketClient {
|
||||||
|
// Get agent ID from authentication system
|
||||||
|
var agentID string
|
||||||
|
if authManager != nil {
|
||||||
|
if id, err := authManager.GetCurrentAgentID(); err == nil {
|
||||||
|
agentID = id
|
||||||
|
// Agent ID retrieved successfully
|
||||||
|
} else {
|
||||||
|
logging.Error("Failed to get agent ID from auth manager: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fallback to environment variable or generate one if auth fails
|
||||||
|
if agentID == "" {
|
||||||
|
agentID = os.Getenv("AGENT_ID")
|
||||||
|
if agentID == "" {
|
||||||
|
agentID = fmt.Sprintf("agent-%d", time.Now().Unix())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
supabaseURL := os.Getenv("SUPABASE_PROJECT_URL")
|
||||||
|
if supabaseURL == "" {
|
||||||
|
log.Fatal("❌ SUPABASE_PROJECT_URL environment variable is required")
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create metrics collector
|
||||||
|
metricsCollector := metrics.NewCollector("v2.0.0")
|
||||||
|
|
||||||
|
ctx, cancel := context.WithCancel(context.Background())
|
||||||
|
|
||||||
|
return &WebSocketClient{
|
||||||
|
agent: agent,
|
||||||
|
agentID: agentID,
|
||||||
|
authManager: authManager,
|
||||||
|
metricsCollector: metricsCollector,
|
||||||
|
supabaseURL: supabaseURL,
|
||||||
|
ctx: ctx,
|
||||||
|
cancel: cancel,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Start starts the WebSocket connection and message handling
|
||||||
|
func (w *WebSocketClient) Start() error {
|
||||||
|
// Starting WebSocket client
|
||||||
|
|
||||||
|
if err := w.connect(); err != nil {
|
||||||
|
return fmt.Errorf("failed to establish WebSocket connection: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Start message reading loop
|
||||||
|
go w.handleMessages()
|
||||||
|
|
||||||
|
// Start heartbeat
|
||||||
|
go w.startHeartbeat()
|
||||||
|
|
||||||
|
// Start database polling for pending investigations
|
||||||
|
go w.pollPendingInvestigations()
|
||||||
|
|
||||||
|
// WebSocket client started
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Stop closes the WebSocket connection
|
||||||
|
func (c *WebSocketClient) Stop() {
|
||||||
|
c.cancel()
|
||||||
|
if c.conn != nil {
|
||||||
|
c.conn.Close()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// getAuthToken retrieves authentication token
|
||||||
|
func (c *WebSocketClient) getAuthToken() error {
|
||||||
|
if c.authManager == nil {
|
||||||
|
return fmt.Errorf("auth manager not available")
|
||||||
|
}
|
||||||
|
|
||||||
|
token, err := c.authManager.EnsureAuthenticated()
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("authentication failed: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
c.token = token.AccessToken
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// connect establishes WebSocket connection
|
||||||
|
func (c *WebSocketClient) connect() error {
|
||||||
|
// Get fresh auth token
|
||||||
|
if err := c.getAuthToken(); err != nil {
|
||||||
|
return fmt.Errorf("failed to get auth token: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Convert HTTP URL to WebSocket URL
|
||||||
|
wsURL := strings.Replace(c.supabaseURL, "https://", "wss://", 1)
|
||||||
|
wsURL = strings.Replace(wsURL, "http://", "ws://", 1)
|
||||||
|
wsURL += "/functions/v1/websocket-agent-handler"
|
||||||
|
|
||||||
|
// Connecting to WebSocket
|
||||||
|
|
||||||
|
// Set up headers
|
||||||
|
headers := http.Header{}
|
||||||
|
headers.Set("Authorization", "Bearer "+c.token)
|
||||||
|
|
||||||
|
// Connect
|
||||||
|
dialer := websocket.Dialer{
|
||||||
|
HandshakeTimeout: 10 * time.Second,
|
||||||
|
}
|
||||||
|
|
||||||
|
conn, resp, err := dialer.Dial(wsURL, headers)
|
||||||
|
if err != nil {
|
||||||
|
c.consecutiveFailures++
|
||||||
|
if c.consecutiveFailures >= 5 && resp != nil {
|
||||||
|
logging.Error("WebSocket handshake failed with status: %d (failure #%d)", resp.StatusCode, c.consecutiveFailures)
|
||||||
|
}
|
||||||
|
return fmt.Errorf("websocket connection failed: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
c.conn = conn
|
||||||
|
// WebSocket client connected
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// handleMessages processes incoming WebSocket messages
|
||||||
|
func (c *WebSocketClient) handleMessages() {
|
||||||
|
defer func() {
|
||||||
|
if c.conn != nil {
|
||||||
|
// Closing WebSocket connection
|
||||||
|
c.conn.Close()
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
|
||||||
|
// Started WebSocket message listener
|
||||||
|
connectionStart := time.Now()
|
||||||
|
|
||||||
|
for {
|
||||||
|
select {
|
||||||
|
case <-c.ctx.Done():
|
||||||
|
// Only log context cancellation if there have been failures
|
||||||
|
if c.consecutiveFailures >= 5 {
|
||||||
|
logging.Debug("Context cancelled after %v, stopping message handler", time.Since(connectionStart))
|
||||||
|
}
|
||||||
|
return
|
||||||
|
default:
|
||||||
|
// Set read deadline to detect connection issues
|
||||||
|
c.conn.SetReadDeadline(time.Now().Add(90 * time.Second))
|
||||||
|
|
||||||
|
var message WebSocketMessage
|
||||||
|
readStart := time.Now()
|
||||||
|
err := c.conn.ReadJSON(&message)
|
||||||
|
readDuration := time.Since(readStart)
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
connectionDuration := time.Since(connectionStart)
|
||||||
|
|
||||||
|
// Only log specific errors after failure threshold
|
||||||
|
if c.consecutiveFailures >= 5 {
|
||||||
|
if websocket.IsCloseError(err, websocket.CloseNormalClosure, websocket.CloseGoingAway) {
|
||||||
|
logging.Debug("WebSocket closed normally after %v: %v", connectionDuration, err)
|
||||||
|
} else if websocket.IsUnexpectedCloseError(err, websocket.CloseGoingAway, websocket.CloseAbnormalClosure) {
|
||||||
|
logging.Error("ABNORMAL CLOSE after %v (code 1006 = server-side timeout/kill): %v", connectionDuration, err)
|
||||||
|
logging.Debug("Last read took %v, connection lived %v", readDuration, connectionDuration)
|
||||||
|
} else if netErr, ok := err.(net.Error); ok && netErr.Timeout() {
|
||||||
|
logging.Warning("READ TIMEOUT after %v: %v", connectionDuration, err)
|
||||||
|
} else {
|
||||||
|
logging.Error("WebSocket error after %v: %v", connectionDuration, err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Track consecutive failures for diagnostic threshold
|
||||||
|
c.consecutiveFailures++
|
||||||
|
|
||||||
|
// Only show diagnostics after multiple failures
|
||||||
|
if c.consecutiveFailures >= 5 {
|
||||||
|
logging.Debug("DIAGNOSTIC - Connection failed #%d after %v", c.consecutiveFailures, connectionDuration)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Attempt reconnection instead of returning immediately
|
||||||
|
go c.attemptReconnection()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Received WebSocket message successfully - reset failure counter
|
||||||
|
c.consecutiveFailures = 0
|
||||||
|
|
||||||
|
switch message.Type {
|
||||||
|
case "connection_ack":
|
||||||
|
// Connection acknowledged
|
||||||
|
|
||||||
|
case "heartbeat_ack":
|
||||||
|
// Heartbeat acknowledged
|
||||||
|
|
||||||
|
case "investigation_task":
|
||||||
|
// Received investigation task - processing
|
||||||
|
go c.handleInvestigationTask(message.Data)
|
||||||
|
|
||||||
|
case "task_result_ack":
|
||||||
|
// Task result acknowledged
|
||||||
|
|
||||||
|
default:
|
||||||
|
logging.Warning("Unknown message type: %s", message.Type)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// handleInvestigationTask processes investigation tasks from the backend
|
||||||
|
func (c *WebSocketClient) handleInvestigationTask(data interface{}) {
|
||||||
|
// Parse task data
|
||||||
|
taskBytes, err := json.Marshal(data)
|
||||||
|
if err != nil {
|
||||||
|
logging.Error("Error marshaling task data: %v", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
var task InvestigationTask
|
||||||
|
err = json.Unmarshal(taskBytes, &task)
|
||||||
|
if err != nil {
|
||||||
|
logging.Error("Error unmarshaling investigation task: %v", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// Processing investigation task
|
||||||
|
|
||||||
|
// Execute diagnostic commands
|
||||||
|
results, err := c.executeDiagnosticCommands(task.DiagnosticPayload)
|
||||||
|
|
||||||
|
// Prepare task result
|
||||||
|
taskResult := TaskResult{
|
||||||
|
TaskID: task.TaskID,
|
||||||
|
Success: err == nil,
|
||||||
|
}
|
||||||
|
|
||||||
|
if err != nil {
|
||||||
|
taskResult.Error = err.Error()
|
||||||
|
logging.Error("Task execution failed: %v", err)
|
||||||
|
} else {
|
||||||
|
taskResult.CommandResults = results
|
||||||
|
// Task executed successfully
|
||||||
|
}
|
||||||
|
|
||||||
|
// Send result back
|
||||||
|
c.sendTaskResult(taskResult)
|
||||||
|
}
|
||||||
|
|
||||||
|
// executeDiagnosticCommands executes the commands from a diagnostic response
|
||||||
|
func (c *WebSocketClient) executeDiagnosticCommands(diagnosticPayload map[string]interface{}) (map[string]interface{}, error) {
	results := map[string]interface{}{
		"agent_id":        c.agentID,
		"execution_time":  time.Now().UTC().Format(time.RFC3339),
		"command_results": []map[string]interface{}{},
	}

	// Extract commands from diagnostic payload
	commands, ok := diagnosticPayload["commands"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("no commands found in diagnostic payload")
	}

	var commandResults []map[string]interface{}

	for _, cmd := range commands {
		cmdMap, ok := cmd.(map[string]interface{})
		if !ok {
			continue
		}

		id, _ := cmdMap["id"].(string)
		command, _ := cmdMap["command"].(string)
		description, _ := cmdMap["description"].(string)

		if command == "" {
			continue
		}

		// Execute the command
		output, exitCode, err := c.executeCommand(command)

		result := map[string]interface{}{
			"id":          id,
			"command":     command,
			"description": description,
			"output":      output,
			"exit_code":   exitCode,
			"success":     err == nil && exitCode == 0,
		}

		if err != nil {
			result["error"] = err.Error()
			logging.Warning("Command [%s] failed: %v (exit code: %d)", id, err, exitCode)
		}

		commandResults = append(commandResults, result)
	}

	results["command_results"] = commandResults
	results["total_commands"] = len(commandResults)
	results["successful_commands"] = c.countSuccessfulCommands(commandResults)

	// Execute eBPF programs if present
	ebpfPrograms, hasEBPF := diagnosticPayload["ebpf_programs"].([]interface{})
	if hasEBPF && len(ebpfPrograms) > 0 {
		ebpfResults := c.executeEBPFPrograms(ebpfPrograms)
		results["ebpf_results"] = ebpfResults
		results["total_ebpf_programs"] = len(ebpfPrograms)
	}

	return results, nil
}
// executeEBPFPrograms executes eBPF monitoring programs using the real eBPF manager
func (c *WebSocketClient) executeEBPFPrograms(ebpfPrograms []interface{}) []map[string]interface{} {
	var ebpfRequests []types.EBPFRequest

	// Convert interface{} values to EBPFRequest structs
	for _, prog := range ebpfPrograms {
		progMap, ok := prog.(map[string]interface{})
		if !ok {
			continue
		}

		name, _ := progMap["name"].(string)
		progType, _ := progMap["type"].(string)
		target, _ := progMap["target"].(string)
		duration, _ := progMap["duration"].(float64)
		description, _ := progMap["description"].(string)

		if name == "" || progType == "" || target == "" {
			continue
		}

		ebpfRequests = append(ebpfRequests, types.EBPFRequest{
			Name:        name,
			Type:        progType,
			Target:      target,
			Duration:    int(duration),
			Description: description,
		})
	}

	// Execute eBPF programs using the agent's BCC concurrent execution logic
	traceSpecs := c.agent.ConvertEBPFProgramsToTraceSpecs(ebpfRequests)
	return c.agent.ExecuteEBPFTraces(traceSpecs)
}
// executeCommandsFromPayload executes commands from a payload and returns results
func (c *WebSocketClient) executeCommandsFromPayload(commands []interface{}) []map[string]interface{} {
	var commandResults []map[string]interface{}

	for _, cmd := range commands {
		cmdMap, ok := cmd.(map[string]interface{})
		if !ok {
			continue
		}

		id, _ := cmdMap["id"].(string)
		command, _ := cmdMap["command"].(string)
		description, _ := cmdMap["description"].(string)

		if command == "" {
			continue
		}

		// Execute the command
		output, exitCode, err := c.executeCommand(command)

		result := map[string]interface{}{
			"id":          id,
			"command":     command,
			"description": description,
			"output":      output,
			"exit_code":   exitCode,
			"success":     err == nil && exitCode == 0,
		}

		if err != nil {
			result["error"] = err.Error()
			logging.Warning("Command [%s] failed: %v (exit code: %d)", id, err, exitCode)
		}

		commandResults = append(commandResults, result)
	}

	return commandResults
}
// executeCommand executes a shell command and returns output, exit code, and error
func (c *WebSocketClient) executeCommand(command string) (string, int, error) {
	// Split the command on whitespace; quoted arguments are not supported
	parts := strings.Fields(command)
	if len(parts) == 0 {
		return "", -1, fmt.Errorf("empty command")
	}

	// Create command with a 30-second timeout
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, parts[0], parts[1:]...)
	cmd.Env = os.Environ()

	output, err := cmd.CombinedOutput()
	exitCode := 0

	if err != nil {
		if exitError, ok := err.(*exec.ExitError); ok {
			exitCode = exitError.ExitCode()
		} else {
			exitCode = -1
		}
	}

	return string(output), exitCode, err
}
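`executeCommand` leans on Go's `*exec.ExitError` to recover the process exit status. A standalone sketch of that extraction, assuming a POSIX `sh` is available on the host:

```go
package main

import (
	"fmt"
	"os/exec"
)

// exitCodeOf runs a command and extracts its exit status the same way
// executeCommand does: 0 on success, the process status when the error is
// an *exec.ExitError, and -1 when the process never ran at all.
func exitCodeOf(name string, args ...string) int {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return 0
	}
	if exitErr, ok := err.(*exec.ExitError); ok {
		return exitErr.ExitCode() // process ran but exited non-zero
	}
	return -1 // e.g. binary not found
}

func main() {
	fmt.Println(exitCodeOf("sh", "-c", "exit 3")) // → 3
	fmt.Println(exitCodeOf("sh", "-c", "true"))   // → 0
}
```

Since the agent splits commands with `strings.Fields`, quoted arguments are not preserved; a command that needs shell quoting would have to be routed through something like `sh -c` explicitly.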
// countSuccessfulCommands counts the number of successful commands
func (c *WebSocketClient) countSuccessfulCommands(results []map[string]interface{}) int {
	count := 0
	for _, result := range results {
		if success, ok := result["success"].(bool); ok && success {
			count++
		}
	}
	return count
}
// sendTaskResult sends a task result back to the backend
func (c *WebSocketClient) sendTaskResult(result TaskResult) {
	message := WebSocketMessage{
		Type: "task_result",
		Data: result,
	}

	err := c.conn.WriteJSON(message)
	if err != nil {
		logging.Error("Error sending task result: %v", err)
	}
}
// startHeartbeat sends periodic heartbeat messages
func (c *WebSocketClient) startHeartbeat() {
	ticker := time.NewTicker(30 * time.Second) // Heartbeat every 30 seconds
	defer ticker.Stop()

	for {
		select {
		case <-c.ctx.Done():
			logging.Debug("Heartbeat stopped due to context cancellation")
			return
		case <-ticker.C:
			heartbeat := WebSocketMessage{
				Type: "heartbeat",
				Data: HeartbeatData{
					AgentID:   c.agentID,
					Timestamp: time.Now(),
					Version:   "v2.0.0",
				},
			}

			err := c.conn.WriteJSON(heartbeat)
			if err != nil {
				logging.Error("Error sending heartbeat: %v", err)
				logging.Debug("Heartbeat failed, connection likely dead")
				return
			}
		}
	}
}
// pollPendingInvestigations polls the database for pending investigations
func (c *WebSocketClient) pollPendingInvestigations() {
	ticker := time.NewTicker(5 * time.Second) // Poll every 5 seconds
	defer ticker.Stop()

	for {
		select {
		case <-c.ctx.Done():
			return
		case <-ticker.C:
			c.checkForPendingInvestigations()
		}
	}
}
// checkForPendingInvestigations checks the database for new pending investigations via proxy
func (c *WebSocketClient) checkForPendingInvestigations() {
	// Use the Edge Function proxy instead of direct database access
	url := fmt.Sprintf("%s/functions/v1/agent-database-proxy/pending-investigations", c.supabaseURL)

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return
	}

	// Only the JWT token is needed for the proxy - no API keys exposed
	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", c.token))
	req.Header.Set("Accept", "application/json")

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 {
		return
	}

	var investigations []types.PendingInvestigation
	err = json.NewDecoder(resp.Body).Decode(&investigations)
	if err != nil {
		return
	}

	for _, investigation := range investigations {
		go c.handlePendingInvestigation(investigation)
	}
}
// handlePendingInvestigation processes a pending investigation from database polling
func (c *WebSocketClient) handlePendingInvestigation(investigation types.PendingInvestigation) {
	// Mark as executing
	err := c.updateInvestigationStatus(investigation.ID, "executing", nil, nil)
	if err != nil {
		return
	}

	// Execute diagnostic commands
	results, err := c.executeDiagnosticCommands(investigation.DiagnosticPayload)

	// Prepare the base results map we'll send to the database
	resultsForDB := map[string]interface{}{
		"agent_id":        c.agentID,
		"execution_time":  time.Now().UTC().Format(time.RFC3339),
		"command_results": results,
	}

	// If command execution failed, mark the investigation as failed
	if err != nil {
		errorMsg := err.Error()
		// Include partial results when possible
		if results != nil {
			resultsForDB["command_results"] = results
		}
		c.updateInvestigationStatus(investigation.ID, "failed", resultsForDB, &errorMsg)
		return
	}

	// Continue the TensorZero conversation by sending command results back.
	// Build messages: assistant = diagnostic payload, user = command results.
	diagJSON, _ := json.Marshal(investigation.DiagnosticPayload)
	commandsJSON, _ := json.MarshalIndent(results, "", "  ")

	messages := []openai.ChatCompletionMessage{
		{
			Role:    openai.ChatMessageRoleAssistant,
			Content: string(diagJSON),
		},
		{
			Role:    openai.ChatMessageRoleUser,
			Content: string(commandsJSON),
		},
	}

	// Use the episode ID from the investigation to maintain conversation continuity
	episodeID := ""
	if investigation.EpisodeID != nil {
		episodeID = *investigation.EpisodeID
	}

	// Continue the conversation until resolution (same as the agent)
	var finalAIContent string
	for {
		tzResp, tzErr := c.agent.SendRequestWithEpisode(messages, episodeID)
		if tzErr != nil {
			logging.Warning("TensorZero continuation failed: %v", tzErr)
			// Fall back to marking completed with command results only
			c.updateInvestigationStatus(investigation.ID, "completed", resultsForDB, nil)
			return
		}

		if len(tzResp.Choices) == 0 {
			logging.Warning("No choices in TensorZero response")
			c.updateInvestigationStatus(investigation.ID, "completed", resultsForDB, nil)
			return
		}

		aiContent := tzResp.Choices[0].Message.Content
		if len(aiContent) <= 300 {
			logging.Debug("AI Response: %s", aiContent)
		}

		// Check if this is a resolution response (final)
		var resolutionResp struct {
			ResponseType   string `json:"response_type"`
			RootCause      string `json:"root_cause"`
			ResolutionPlan string `json:"resolution_plan"`
			Confidence     string `json:"confidence"`
		}

		logging.Debug("Analyzing AI response type...")

		if err := json.Unmarshal([]byte(aiContent), &resolutionResp); err == nil && resolutionResp.ResponseType == "resolution" {
			// This is the final resolution - show the summary and complete
			logging.Info("=== DIAGNOSIS COMPLETE ===")
			logging.Info("Root Cause: %s", resolutionResp.RootCause)
			logging.Info("Resolution Plan: %s", resolutionResp.ResolutionPlan)
			logging.Info("Confidence: %s", resolutionResp.Confidence)
			finalAIContent = aiContent
			break
		}

		// Check if this is another diagnostic response requiring more commands
		var diagnosticResp struct {
			ResponseType string        `json:"response_type"`
			Commands     []interface{} `json:"commands"`
			EBPFPrograms []interface{} `json:"ebpf_programs"`
		}

		if err := json.Unmarshal([]byte(aiContent), &diagnosticResp); err == nil && diagnosticResp.ResponseType == "diagnostic" {
			logging.Debug("AI requested additional diagnostics, executing...")

			// Execute additional commands if any
			additionalResults := map[string]interface{}{
				"command_results": []map[string]interface{}{},
			}

			if len(diagnosticResp.Commands) > 0 {
				logging.Debug("Executing %d additional diagnostic commands", len(diagnosticResp.Commands))
				commandResults := c.executeCommandsFromPayload(diagnosticResp.Commands)
				additionalResults["command_results"] = commandResults
			}

			// Execute additional eBPF programs if any
			if len(diagnosticResp.EBPFPrograms) > 0 {
				ebpfResults := c.executeEBPFPrograms(diagnosticResp.EBPFPrograms)
				additionalResults["ebpf_results"] = ebpfResults
			}

			// Add the AI response and additional results to the conversation
			messages = append(messages, openai.ChatCompletionMessage{
				Role:    openai.ChatMessageRoleAssistant,
				Content: aiContent,
			})

			additionalResultsJSON, _ := json.MarshalIndent(additionalResults, "", "  ")
			messages = append(messages, openai.ChatCompletionMessage{
				Role:    openai.ChatMessageRoleUser,
				Content: string(additionalResultsJSON),
			})

			continue
		}

		// If neither resolution nor diagnostic, treat as the final response
		logging.Warning("Unknown response type - treating as final response")
		finalAIContent = aiContent
		break
	}

	// Attach the final AI response to the DB results and mark as completed_with_analysis
	resultsForDB["ai_response"] = finalAIContent
	c.updateInvestigationStatus(investigation.ID, "completed_with_analysis", resultsForDB, nil)
}
// updateInvestigationStatus updates the status of a pending investigation
func (c *WebSocketClient) updateInvestigationStatus(id, status string, results map[string]interface{}, errorMsg *string) error {
	updateData := map[string]interface{}{
		"status": status,
	}

	if status == "executing" {
		updateData["started_at"] = time.Now().UTC().Format(time.RFC3339)
	} else if status == "completed" {
		updateData["completed_at"] = time.Now().UTC().Format(time.RFC3339)
		if results != nil {
			updateData["command_results"] = results
		}
	} else if status == "failed" && errorMsg != nil {
		updateData["error_message"] = *errorMsg
		updateData["completed_at"] = time.Now().UTC().Format(time.RFC3339)
	}

	jsonData, err := json.Marshal(updateData)
	if err != nil {
		return fmt.Errorf("failed to marshal update data: %v", err)
	}

	url := fmt.Sprintf("%s/functions/v1/agent-database-proxy/pending-investigations/%s", c.supabaseURL, id)
	req, err := http.NewRequest("PATCH", url, strings.NewReader(string(jsonData)))
	if err != nil {
		return fmt.Errorf("failed to create request: %v", err)
	}

	// Only the JWT token is needed for the proxy - no API keys exposed
	req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", c.token))
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to update investigation: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 && resp.StatusCode != 204 {
		return fmt.Errorf("supabase update error: %d", resp.StatusCode)
	}

	return nil
}
// attemptReconnection attempts to reconnect the WebSocket with backoff
func (c *WebSocketClient) attemptReconnection() {
	backoffDurations := []time.Duration{
		2 * time.Second,
		5 * time.Second,
		10 * time.Second,
		20 * time.Second,
		30 * time.Second,
	}

	for i, backoff := range backoffDurations {
		select {
		case <-c.ctx.Done():
			return
		default:
			c.consecutiveFailures++

			// Only show messages after 5 consecutive failures
			if c.consecutiveFailures >= 5 {
				logging.Info("Attempting WebSocket reconnection (attempt %d/%d) - %d consecutive failures", i+1, len(backoffDurations), c.consecutiveFailures)
			}

			time.Sleep(backoff)

			if err := c.connect(); err != nil {
				if c.consecutiveFailures >= 5 {
					logging.Warning("Reconnection attempt %d failed: %v", i+1, err)
				}
				continue
			}

			// Successfully reconnected - reset the failure counter
			if c.consecutiveFailures >= 5 {
				logging.Info("WebSocket reconnected successfully after %d failures", c.consecutiveFailures)
			}
			c.consecutiveFailures = 0
			go c.handleMessages() // Restart message handling
			return
		}
	}

	logging.Error("Failed to reconnect after %d attempts, giving up", len(backoffDurations))
}
265
main.go
@@ -2,6 +2,7 @@ package main

 import (
 	"bufio"
+	"flag"
 	"fmt"
 	"log"
 	"os"
@@ -9,26 +10,74 @@ import (
 	"strconv"
 	"strings"
 	"syscall"
+	"time"
+
+	"nannyagentv2/internal/auth"
+	"nannyagentv2/internal/config"
+	"nannyagentv2/internal/logging"
+	"nannyagentv2/internal/metrics"
+	"nannyagentv2/internal/types"
+	"nannyagentv2/internal/websocket"
 )
+
+const Version = "0.0.1"
+
+// showVersion displays the version information
+func showVersion() {
+	fmt.Printf("nannyagent version %s\n", Version)
+	fmt.Println("Linux diagnostic agent with eBPF capabilities")
+	os.Exit(0)
+}
+
+// showHelp displays the help information
+func showHelp() {
+	fmt.Println("NannyAgent - Linux Diagnostic Agent with eBPF Monitoring")
+	fmt.Printf("Version: %s\n\n", Version)
+	fmt.Println("USAGE:")
+	fmt.Printf("  sudo %s [OPTIONS]\n\n", os.Args[0])
+	fmt.Println("OPTIONS:")
+	fmt.Println("  --version, -v    Show version information")
+	fmt.Println("  --help, -h       Show this help message")
+	fmt.Println()
+	fmt.Println("DESCRIPTION:")
+	fmt.Println("  NannyAgent is an AI-powered Linux diagnostic tool that uses eBPF")
+	fmt.Println("  for deep system monitoring and analysis. It requires root privileges")
+	fmt.Println("  to run for eBPF functionality.")
+	fmt.Println()
+	fmt.Println("REQUIREMENTS:")
+	fmt.Println("  - Linux kernel 5.x or higher")
+	fmt.Println("  - Root privileges (sudo)")
+	fmt.Println("  - bpftrace and bpfcc-tools installed")
+	fmt.Println("  - Network connectivity to Supabase")
+	fmt.Println()
+	fmt.Println("CONFIGURATION:")
+	fmt.Println("  Configuration file: /etc/nannyagent/config.env")
+	fmt.Println("  Data directory: /var/lib/nannyagent")
+	fmt.Println()
+	fmt.Println("EXAMPLES:")
+	fmt.Printf("  # Run the agent\n")
+	fmt.Printf("  sudo %s\n\n", os.Args[0])
+	fmt.Printf("  # Show version (no sudo required)\n")
+	fmt.Printf("  %s --version\n\n", os.Args[0])
+	fmt.Println("For more information, visit: https://github.com/yourusername/nannyagent")
+	os.Exit(0)
+}
+
 // checkRootPrivileges ensures the program is running as root
 func checkRootPrivileges() {
 	if os.Geteuid() != 0 {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: This program must be run as root for eBPF functionality.\n")
-		fmt.Fprintf(os.Stderr, "Please run with: sudo %s\n", os.Args[0])
-		fmt.Fprintf(os.Stderr, "Reason: eBPF programs require root privileges to:\n")
-		fmt.Fprintf(os.Stderr, " - Load programs into the kernel\n")
-		fmt.Fprintf(os.Stderr, " - Attach to kernel functions and tracepoints\n")
-		fmt.Fprintf(os.Stderr, " - Access kernel memory maps\n")
+		logging.Error("This program must be run as root for eBPF functionality")
+		logging.Error("Please run with: sudo %s", os.Args[0])
+		logging.Error("Reason: eBPF programs require root privileges to:\n - Load programs into the kernel\n - Attach to kernel functions and tracepoints\n - Access kernel memory maps")
 		os.Exit(1)
 	}
 }

-// checkKernelVersionCompatibility ensures kernel version is 4.4 or higher
+// checkKernelVersionCompatibility ensures kernel version is 5.x or higher
 func checkKernelVersionCompatibility() {
 	output, err := exec.Command("uname", "-r").Output()
 	if err != nil {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: Cannot determine kernel version: %v\n", err)
+		logging.Error("Cannot determine kernel version: %v", err)
 		os.Exit(1)
 	}

@@ -37,81 +86,51 @@ func checkKernelVersionCompatibility() {
 	// Parse version (e.g., "5.15.0-56-generic" -> major=5, minor=15)
 	parts := strings.Split(kernelVersion, ".")
 	if len(parts) < 2 {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: Cannot parse kernel version: %s\n", kernelVersion)
+		logging.Error("Cannot parse kernel version: %s", kernelVersion)
 		os.Exit(1)
 	}

 	major, err := strconv.Atoi(parts[0])
 	if err != nil {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: Cannot parse major kernel version: %s\n", parts[0])
+		logging.Error("Cannot parse major kernel version: %s", parts[0])
 		os.Exit(1)
 	}

-	minor, err := strconv.Atoi(parts[1])
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: Cannot parse minor kernel version: %s\n", parts[1])
+	// Check if kernel is 5.x or higher
+	if major < 5 {
+		logging.Error("Kernel version %s is not supported", kernelVersion)
+		logging.Error("Required: Linux kernel 5.x or higher")
+		logging.Error("Current: %s (major version: %d)", kernelVersion, major)
+		logging.Error("Reason: NannyAgent requires modern kernel features:\n - Advanced eBPF capabilities\n - BTF (BPF Type Format) support\n - Enhanced security and stability")
 		os.Exit(1)
 	}

-	// Check if kernel is 4.4 or higher
-	if major < 4 || (major == 4 && minor < 4) {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: Kernel version %s is too old for eBPF.\n", kernelVersion)
-		fmt.Fprintf(os.Stderr, "Required: Linux kernel 4.4 or higher\n")
-		fmt.Fprintf(os.Stderr, "Current: %s\n", kernelVersion)
-		fmt.Fprintf(os.Stderr, "Reason: eBPF requires kernel features introduced in 4.4+:\n")
-		fmt.Fprintf(os.Stderr, " - BPF system call support\n")
-		fmt.Fprintf(os.Stderr, " - eBPF program types (kprobe, tracepoint)\n")
-		fmt.Fprintf(os.Stderr, " - BPF maps and helper functions\n")
-		os.Exit(1)
-	}
-
-	fmt.Printf("✅ Kernel version %s is compatible with eBPF\n", kernelVersion)
 }

 // checkEBPFSupport validates eBPF subsystem availability
 func checkEBPFSupport() {
 	// Check if /sys/kernel/debug/tracing exists (debugfs mounted)
 	if _, err := os.Stat("/sys/kernel/debug/tracing"); os.IsNotExist(err) {
-		fmt.Fprintf(os.Stderr, "⚠️ WARNING: debugfs not mounted. Some eBPF features may not work.\n")
-		fmt.Fprintf(os.Stderr, "To fix: sudo mount -t debugfs debugfs /sys/kernel/debug\n")
+		logging.Warning("debugfs not mounted. Some eBPF features may not work")
+		logging.Info("To fix: sudo mount -t debugfs debugfs /sys/kernel/debug")
 	}

 	// Check if we can access BPF syscall
 	fd, _, errno := syscall.Syscall(321, 0, 0, 0) // BPF syscall number on x86_64
 	if errno != 0 && errno != syscall.EINVAL {
-		fmt.Fprintf(os.Stderr, "❌ ERROR: BPF syscall not available (errno: %v)\n", errno)
-		fmt.Fprintf(os.Stderr, "This may indicate:\n")
-		fmt.Fprintf(os.Stderr, " - Kernel compiled without BPF support\n")
-		fmt.Fprintf(os.Stderr, " - BPF syscall disabled in kernel config\n")
+		logging.Error("BPF syscall not available (errno: %v)", errno)
+		logging.Error("This may indicate:\n - Kernel compiled without BPF support\n - BPF syscall disabled in kernel config")
 		os.Exit(1)
 	}
 	if fd > 0 {
 		syscall.Close(int(fd))
 	}
-
-	fmt.Printf("✅ eBPF syscall is available\n")
 }

-func main() {
-	fmt.Println("🔍 Linux eBPF-Enhanced Diagnostic Agent")
-	fmt.Println("=======================================")
-
-	// Perform system compatibility checks
-	fmt.Println("Performing system compatibility checks...")
-
-	checkRootPrivileges()
-	checkKernelVersionCompatibility()
-	checkEBPFSupport()
-
-	fmt.Println("✅ All system checks passed")
-	fmt.Println("")
-
-	// Initialize the agent
-	agent := NewLinuxDiagnosticAgent()
-
-	// Start the interactive session
-	fmt.Println("Linux Diagnostic Agent Started")
-	fmt.Println("Enter a system issue description (or 'quit' to exit):")
+// runInteractiveDiagnostics starts the interactive diagnostic session
+func runInteractiveDiagnostics(agent *LinuxDiagnosticAgent) {
+	logging.Info("=== Linux eBPF-Enhanced Diagnostic Agent ===")
+	logging.Info("Linux Diagnostic Agent Started")
+	logging.Info("Enter a system issue description (or 'quit' to exit):")

 	scanner := bufio.NewScanner(os.Stdin)
 	for {
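The new version gate above reduces to a small pure function over the `uname -r` string, which makes the 5.x cutoff easy to check in isolation; a sketch:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelMajor extracts the major version from a `uname -r` release string
// such as "5.15.0-56-generic", mirroring checkKernelVersionCompatibility.
func kernelMajor(release string) (int, error) {
	parts := strings.Split(release, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("cannot parse kernel version: %s", release)
	}
	return strconv.Atoi(parts[0])
}

func main() {
	major, err := kernelMajor("5.15.0-56-generic")
	fmt.Println(major, err == nil, major >= 5) // → 5 true true
}
```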
@@ -129,9 +148,9 @@ func main() {
 			continue
 		}

-		// Process the issue with eBPF capabilities
-		if err := agent.DiagnoseWithEBPF(input); err != nil {
-			fmt.Printf("Error: %v\n", err)
+		// Process the issue with AI capabilities via TensorZero
+		if err := agent.DiagnoseIssue(input); err != nil {
+			logging.Error("Diagnosis failed: %v", err)
 		}
 	}

@@ -139,5 +158,133 @@ func main() {
 		log.Fatal(err)
 	}

-	fmt.Println("Goodbye!")
+	logging.Info("Goodbye!")
+}
+
+func main() {
+	// Define flags with both long and short versions
+	versionFlag := flag.Bool("version", false, "Show version information")
+	versionFlagShort := flag.Bool("v", false, "Show version information (short)")
+	helpFlag := flag.Bool("help", false, "Show help information")
+	helpFlagShort := flag.Bool("h", false, "Show help information (short)")
+	flag.Parse()
+
+	// Handle --version or -v flag (no root required)
+	if *versionFlag || *versionFlagShort {
+		showVersion()
+	}
+
+	// Handle --help or -h flag (no root required)
+	if *helpFlag || *helpFlagShort {
+		showHelp()
+	}
+
+	logging.Info("NannyAgent v%s starting...", Version)
+
+	// Perform system compatibility checks first
+	logging.Info("Performing system compatibility checks...")
+	checkRootPrivileges()
+	checkKernelVersionCompatibility()
+	checkEBPFSupport()
+	logging.Info("All system checks passed")
+
+	// Load configuration
+	cfg, err := config.LoadConfig()
+	if err != nil {
+		log.Fatalf("❌ Failed to load configuration: %v", err)
+	}
+
+	cfg.PrintConfig()
+
+	// Initialize components
+	authManager := auth.NewAuthManager(cfg)
+	metricsCollector := metrics.NewCollector(Version)
+
+	// Ensure authentication
+	token, err := authManager.EnsureAuthenticated()
+	if err != nil {
+		log.Fatalf("❌ Authentication failed: %v", err)
+	}
+
+	logging.Info("Authentication successful!")
+
+	// Initialize the diagnostic agent for interactive CLI use with authentication
+	agent := NewLinuxDiagnosticAgentWithAuth(authManager)
+
+	// Initialize a separate agent for WebSocket investigations using the application model
+	applicationAgent := NewLinuxDiagnosticAgent()
+	applicationAgent.model = "tensorzero::function_name::diagnose_and_heal_application"
+
+	// Start WebSocket client for backend communications and investigations
+	wsClient := websocket.NewWebSocketClient(applicationAgent, authManager)
+	go func() {
+		if err := wsClient.Start(); err != nil {
+			logging.Error("WebSocket client error: %v", err)
+		}
+	}()
+
+	// Start background metrics collection in a goroutine
+	go func() {
+		logging.Debug("Starting background metrics collection and heartbeat...")
+
+		ticker := time.NewTicker(time.Duration(cfg.MetricsInterval) * time.Second)
+		defer ticker.Stop()
+
+		// Send initial heartbeat
+		if err := sendHeartbeat(cfg, token, metricsCollector); err != nil {
+			logging.Warning("Initial heartbeat failed: %v", err)
+		}
+
+		// Main heartbeat loop
+		for range ticker.C {
+			// Check if token needs refresh
+			if authManager.IsTokenExpired(token) {
+				logging.Debug("Token expiring soon, refreshing...")
+				newToken, refreshErr := authManager.EnsureAuthenticated()
+				if refreshErr != nil {
+					logging.Warning("Token refresh failed: %v", refreshErr)
+					continue
+				}
+				token = newToken
+				logging.Debug("Token refreshed successfully")
+			}
+
+			// Send heartbeat
+			if err := sendHeartbeat(cfg, token, metricsCollector); err != nil {
+				logging.Warning("Heartbeat failed: %v", err)
+
+				// If unauthorized, try to refresh token
+				if err.Error() == "unauthorized" {
+					logging.Debug("Unauthorized, attempting token refresh...")
+					newToken, refreshErr := authManager.EnsureAuthenticated()
+					if refreshErr != nil {
+						logging.Warning("Token refresh failed: %v", refreshErr)
+						continue
+					}
+					token = newToken
+
+					// Retry heartbeat with new token (silently)
+					if retryErr := sendHeartbeat(cfg, token, metricsCollector); retryErr != nil {
+						logging.Warning("Retry heartbeat failed: %v", retryErr)
+					}
+				}
+			}
+			// No logging for successful heartbeats - they should be silent
+		}
+	}()
+
+	// Start the interactive diagnostic session (blocking)
+	runInteractiveDiagnostics(agent)
+}
+
+// sendHeartbeat collects metrics and sends heartbeat to the server
+func sendHeartbeat(cfg *config.Config, token *types.AuthToken, collector *metrics.Collector) error {
+	// Collect system metrics
+	systemMetrics, err := collector.GatherSystemMetrics()
+	if err != nil {
+		return fmt.Errorf("failed to gather system metrics: %w", err)
+	}
+
+	// Send metrics using the collector with correct agent_id from token
|
return collector.SendMetrics(cfg.AgentAuthURL, token.AccessToken, token.AgentID, systemMetrics)
|
||||||
}
|
}
|
||||||
|