diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 0000000..473a0f4 diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..d7b060e --- /dev/null +++ b/Makefile @@ -0,0 +1,53 @@ +.PHONY: build run clean test install + +# Build the application +build: + go build -o nanny-agent . + +# Run the application +run: build + ./nanny-agent + +# Clean build artifacts +clean: + rm -f nanny-agent + +# Run tests +test: + go test ./... + +# Install dependencies +install: + go mod tidy + go mod download + +# Build for production with optimizations +build-prod: + CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags '-w -s' -o nanny-agent . + +# Install system-wide (requires sudo) +install-system: build-prod + sudo cp nanny-agent /usr/local/bin/ + sudo chmod +x /usr/local/bin/nanny-agent + +# Format code +fmt: + go fmt ./... + +# Run linter (if golangci-lint is installed) +lint: + golangci-lint run + +# Show help +help: + @echo "Available commands:" + @echo " build - Build the application" + @echo " run - Build and run the application" + @echo " clean - Clean build artifacts" + @echo " test - Run tests" + @echo " install - Install dependencies" + @echo " build-prod - Build for production" + @echo " install-system- Install system-wide (requires sudo)" + @echo " fmt - Format code" + @echo " lint - Run linter" + @echo " help - Show this help" diff --git a/README.md b/README.md index ffaf065..7f48ec1 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,199 @@ -# nannyagent +# Linux Diagnostic Agent -nannyagent is a Linux AI diagnostic agent built on OpenAPI specifications relying on Tensorzero gateway \ No newline at end of file +A Go-based AI agent that diagnoses Linux system issues using the NannyAPI gateway with OpenAI-compatible SDK. 
+ +## Features + +- Interactive command-line interface for submitting system issues +- **Automatic system information gathering** - Includes OS, kernel, CPU, memory, network info +- Integrates with NannyAPI using OpenAI-compatible Go SDK +- Executes diagnostic commands safely and collects output +- Provides step-by-step resolution plans +- **Comprehensive integration tests** with realistic Linux problem scenarios + +## Setup + +1. Clone this repository +2. Copy `.env.example` to `.env` and configure your NannyAPI endpoint: + ```bash + cp .env.example .env + ``` +3. Install dependencies: + ```bash + go mod tidy + ``` +4. Build and run: + ```bash + make build + ./nanny-agent + ``` + +## Configuration + +The agent can be configured using environment variables: + +- `NANNYAPI_ENDPOINT`: The NannyAPI endpoint (default: `http://nannyapi.local:3000/openai/v1`) +- `NANNYAPI_MODEL`: The model identifier (default: `nannyapi::function_name::diagnose_and_heal`) + +## Installation on Linux VM + +### Direct Installation + +1. **Install Go** (if not already installed): + ```bash + # For Ubuntu/Debian + sudo apt update + sudo apt install golang-go + + # For RHEL/CentOS/Fedora + sudo dnf install golang + # or + sudo yum install golang + ``` + +2. **Clone and build the agent**: + ```bash + git clone + cd nannyagentv2 + go mod tidy + make build + ``` + +3. **Install as system service** (optional): + ```bash + sudo cp nanny-agent /usr/local/bin/ + sudo chmod +x /usr/local/bin/nanny-agent + ``` + +4. **Set environment variables**: + ```bash + export NANNYAPI_ENDPOINT="http://your-nannyapi-endpoint:3000/openai/v1" + export NANNYAPI_MODEL="your-model-identifier" + ``` + +## Usage + +1. Start the agent: + ```bash + ./nanny-agent + ``` + +2. Enter a system issue description when prompted: + ``` + > On /var filesystem I cannot create any file but df -h shows 30% free space available. + ``` + +3. 
The agent will: + - Send the issue to the AI via NannyAPI using OpenAI SDK + - Execute diagnostic commands as suggested by the AI + - Provide command outputs back to the AI + - Display the final diagnosis and resolution plan + +4. Type `quit` or `exit` to stop the agent + +## How It Works + +1. **System Information Gathering**: Agent automatically collects system details (OS, kernel, CPU, memory, network, etc.) +2. **Initial Issue**: User describes a Linux system problem +3. **Enhanced Prompt**: AI receives both the issue description and comprehensive system information +4. **Diagnostic Phase**: AI responds with diagnostic commands to run +5. **Command Execution**: Agent safely executes read-only commands +6. **Iterative Analysis**: AI analyzes command outputs and may request more commands +7. **Resolution Phase**: AI provides root cause analysis and step-by-step resolution plan + +## Testing & Integration Tests + +The agent includes comprehensive integration tests that simulate realistic Linux problems: + +### Available Test Scenarios: +1. **Disk Space Issues** - Inode exhaustion scenarios +2. **Memory Problems** - OOM killer and memory pressure +3. **Network Issues** - DNS resolution problems +4. **Performance Issues** - High load averages and I/O bottlenecks +5. **Web Server Problems** - Permission and configuration issues +6. **Hardware/Boot Issues** - Kernel module and device problems +7. **Database Performance** - Slow queries and I/O contention +8. 
**Service Failures** - Startup and configuration problems + +### Run Integration Tests: +```bash +# Interactive test scenarios +./test-examples.sh + +# Automated integration tests +./integration-tests.sh + +# Function discovery (find valid NannyAPI functions) +./discover-functions.sh +``` + +## Safety + +- Only read-only commands are executed automatically +- Commands that modify the system (rm, mv, dd, redirection) are blocked by validation +- The resolution plan is provided for manual execution by the operator +- All commands have execution timeouts to prevent hanging + +## API Integration + +The agent uses the `github.com/sashabaranov/go-openai` SDK to communicate with NannyAPI's OpenAI-compatible API endpoint. This provides: + +- Robust HTTP client with retries and timeouts +- Structured request/response handling +- Automatic JSON marshaling/unmarshaling +- Error handling and validation + +## Example Session + +``` +Linux Diagnostic Agent Started +Enter a system issue description (or 'quit' to exit): +> Cannot create files in /var but df shows space available + +Diagnosing issue: Cannot create files in /var but df shows space available +Gathering system information... + +AI Response: +{ + "response_type": "diagnostic", + "reasoning": "The 'No space left on device' error despite available disk space suggests inode exhaustion...", + "commands": [ + {"id": "check_inodes", "command": "df -i /var", "description": "Check inode usage..."} + ] +} + +Executing command 'check_inodes': df -i /var +Output: +Filesystem Inodes IUsed IFree IUse% Mounted on +/dev/sda1 1000000 999999 1 100% /var + +=== DIAGNOSIS COMPLETE === +Root Cause: The /var filesystem has exhausted all available inodes +Resolution Plan: 1. Find and remove unnecessary files... 
+Confidence: High +``` + +Note: The AI receives comprehensive system information including: +- Hostname, OS version, kernel version +- CPU cores, memory, system uptime +- Network interfaces and private IPs +- Current load average and disk usage + +## Available Make Commands + +- `make build` - Build the application +- `make run` - Build and run the application +- `make clean` - Clean build artifacts +- `make test` - Run unit tests +- `make install` - Install dependencies +- `make build-prod` - Build for production +- `make install-system` - Install system-wide (requires sudo) +- `make fmt` - Format code +- `make help` - Show available commands + +## Testing Commands + +- `./test-examples.sh` - Show interactive test scenarios +- `./integration-tests.sh` - Run automated integration tests +- `./discover-functions.sh` - Find available NannyAPI functions +- `./install.sh` - Installation script for Linux VMs diff --git a/agent.go b/agent.go new file mode 100644 index 0000000..a1f0088 --- /dev/null +++ b/agent.go @@ -0,0 +1,270 @@ +package main + +import ( + "bytes" + "context" + "encoding/json" + "fmt" + "io" + "net/http" + "os" + "time" + + "github.com/sashabaranov/go-openai" +) + +// DiagnosticResponse represents the diagnostic phase response from AI +type DiagnosticResponse struct { + ResponseType string `json:"response_type"` + Reasoning string `json:"reasoning"` + Commands []Command `json:"commands"` +} + +// ResolutionResponse represents the resolution phase response from AI +type ResolutionResponse struct { + ResponseType string `json:"response_type"` + RootCause string `json:"root_cause"` + ResolutionPlan string `json:"resolution_plan"` + Confidence string `json:"confidence"` +} + +// Command represents a command to be executed +type Command struct { + ID string `json:"id"` + Command string `json:"command"` + Description string `json:"description"` +} + +// CommandResult represents the result of executing a command +type CommandResult struct { + ID string 
`json:"id"` + Command string `json:"command"` + Output string `json:"output"` + ExitCode int `json:"exit_code"` + Error string `json:"error,omitempty"` +} + +// LinuxDiagnosticAgent represents the main agent +type LinuxDiagnosticAgent struct { + client *openai.Client + model string + executor *CommandExecutor + episodeID string // TensorZero episode ID for conversation continuity +} + +// NewLinuxDiagnosticAgent creates a new diagnostic agent +func NewLinuxDiagnosticAgent() *LinuxDiagnosticAgent { + endpoint := os.Getenv("NANNYAPI_ENDPOINT") + if endpoint == "" { + // Default endpoint - OpenAI SDK will append /chat/completions automatically + endpoint = "http://nannyapi.local:3000/openai/v1" + } + + model := os.Getenv("NANNYAPI_MODEL") + if model == "" { + model = "nannyapi::function_name::diagnose_and_heal" + fmt.Printf("Warning: Using default model '%s'. Set NANNYAPI_MODEL environment variable for your specific function.\n", model) + } + + // Create OpenAI client with custom base URL + // Note: The OpenAI SDK automatically appends "/chat/completions" to the base URL + config := openai.DefaultConfig("") + config.BaseURL = endpoint + client := openai.NewClientWithConfig(config) + + return &LinuxDiagnosticAgent{ + client: client, + model: model, + executor: NewCommandExecutor(10 * time.Second), // 10 second timeout for commands + } +} + +// DiagnoseIssue starts the diagnostic process for a given issue +func (a *LinuxDiagnosticAgent) DiagnoseIssue(issue string) error { + fmt.Printf("Diagnosing issue: %s\n", issue) + fmt.Println("Gathering system information...") + + // Gather system information + systemInfo := GatherSystemInfo() + + // Format the initial prompt with system information + initialPrompt := FormatSystemInfoForPrompt(systemInfo) + "\n" + issue + + // Start conversation with initial issue including system info + messages := []openai.ChatCompletionMessage{ + { + Role: openai.ChatMessageRoleUser, + Content: initialPrompt, + }, + } + + for { + // Send request 
to TensorZero API via OpenAI SDK + response, err := a.sendRequest(messages) + if err != nil { + return fmt.Errorf("failed to send request: %w", err) + } + + if len(response.Choices) == 0 { + return fmt.Errorf("no choices in response") + } + + content := response.Choices[0].Message.Content + fmt.Printf("\nAI Response:\n%s\n", content) + + // Parse the response to determine next action + var diagnosticResp DiagnosticResponse + var resolutionResp ResolutionResponse + + // Try to parse as diagnostic response first + if err := json.Unmarshal([]byte(content), &diagnosticResp); err == nil && diagnosticResp.ResponseType == "diagnostic" { + // Handle diagnostic phase + fmt.Printf("\nReasoning: %s\n", diagnosticResp.Reasoning) + + if len(diagnosticResp.Commands) == 0 { + fmt.Println("No commands to execute in diagnostic phase") + break + } + + // Execute commands and collect results + commandResults := make([]CommandResult, 0, len(diagnosticResp.Commands)) + for _, cmd := range diagnosticResp.Commands { + fmt.Printf("\nExecuting command '%s': %s\n", cmd.ID, cmd.Command) + result := a.executor.Execute(cmd) + commandResults = append(commandResults, result) + + fmt.Printf("Output:\n%s\n", result.Output) + if result.Error != "" { + fmt.Printf("Error: %s\n", result.Error) + } + } + + // Prepare command results as user message + resultsJSON, err := json.MarshalIndent(commandResults, "", " ") + if err != nil { + return fmt.Errorf("failed to marshal command results: %w", err) + } + + // Add AI response and command results to conversation + messages = append(messages, openai.ChatCompletionMessage{ + Role: openai.ChatMessageRoleAssistant, + Content: content, + }) + messages = append(messages, openai.ChatCompletionMessage{ + Role: openai.ChatMessageRoleUser, + Content: string(resultsJSON), + }) + + continue + } + + // Try to parse as resolution response + if err := json.Unmarshal([]byte(content), &resolutionResp); err == nil && resolutionResp.ResponseType == "resolution" { + // Handle 
resolution phase + fmt.Printf("\n=== DIAGNOSIS COMPLETE ===\n") + fmt.Printf("Root Cause: %s\n", resolutionResp.RootCause) + fmt.Printf("Resolution Plan: %s\n", resolutionResp.ResolutionPlan) + fmt.Printf("Confidence: %s\n", resolutionResp.Confidence) + break + } + + // If we can't parse the response, treat it as an error or unexpected format + fmt.Printf("Unexpected response format or error from AI:\n%s\n", content) + break + } + + return nil +} + +// TensorZeroRequest represents a request structure compatible with TensorZero's episode_id +type TensorZeroRequest struct { + Model string `json:"model"` + Messages []openai.ChatCompletionMessage `json:"messages"` + EpisodeID string `json:"tensorzero::episode_id,omitempty"` +} + +// TensorZeroResponse represents TensorZero's response with episode_id +type TensorZeroResponse struct { + openai.ChatCompletionResponse + EpisodeID string `json:"episode_id"` +} + +// sendRequest sends a request to the TensorZero API with tensorzero::episode_id support +func (a *LinuxDiagnosticAgent) sendRequest(messages []openai.ChatCompletionMessage) (*openai.ChatCompletionResponse, error) { + ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + + // Create TensorZero-compatible request + tzRequest := TensorZeroRequest{ + Model: a.model, + Messages: messages, + } + + // Include tensorzero::episode_id for conversation continuity (if we have one) + if a.episodeID != "" { + tzRequest.EpisodeID = a.episodeID + } + + fmt.Printf("Debug: Sending request to model: %s", a.model) + if a.episodeID != "" { + fmt.Printf(" (episode: %s)", a.episodeID) + } + fmt.Println() + + // Marshal the request + requestBody, err := json.Marshal(tzRequest) + if err != nil { + return nil, fmt.Errorf("failed to marshal request: %w", err) + } + + // Create HTTP request + endpoint := os.Getenv("NANNYAPI_ENDPOINT") + if endpoint == "" { + endpoint = "http://nannyapi.local:3000/openai/v1" + } + + // Ensure the endpoint ends with 
/chat/completions + if endpoint[len(endpoint)-1] != '/' { + endpoint += "/" + } + endpoint += "chat/completions" + + req, err := http.NewRequestWithContext(ctx, "POST", endpoint, bytes.NewBuffer(requestBody)) + if err != nil { + return nil, fmt.Errorf("failed to create request: %w", err) + } + + req.Header.Set("Content-Type", "application/json") + + // Make the request + client := &http.Client{Timeout: 30 * time.Second} + resp, err := client.Do(req) + if err != nil { + return nil, fmt.Errorf("failed to send request: %w", err) + } + defer resp.Body.Close() + + // Read response body + body, err := io.ReadAll(resp.Body) + if err != nil { + return nil, fmt.Errorf("failed to read response: %w", err) + } + + if resp.StatusCode != http.StatusOK { + return nil, fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(body)) + } + + // Parse TensorZero response + var tzResponse TensorZeroResponse + if err := json.Unmarshal(body, &tzResponse); err != nil { + return nil, fmt.Errorf("failed to unmarshal response: %w", err) + } + + // Extract episode_id from first response + if a.episodeID == "" && tzResponse.EpisodeID != "" { + a.episodeID = tzResponse.EpisodeID + fmt.Printf("Debug: Extracted episode ID: %s\n", a.episodeID) + } + + return &tzResponse.ChatCompletionResponse, nil +} diff --git a/agent_test.go b/agent_test.go new file mode 100644 index 0000000..b06e14e --- /dev/null +++ b/agent_test.go @@ -0,0 +1,107 @@ +package main + +import ( + "testing" + "time" +) + +func TestCommandExecutor_ValidateCommand(t *testing.T) { + executor := NewCommandExecutor(5 * time.Second) + + tests := []struct { + name string + command string + wantErr bool + }{ + { + name: "safe command - ls", + command: "ls -la /var", + wantErr: false, + }, + { + name: "safe command - df", + command: "df -h", + wantErr: false, + }, + { + name: "safe command - ps", + command: "ps aux | grep nginx", + wantErr: false, + }, + { + name: "dangerous command - rm", + command: "rm -rf /tmp/*", + 
wantErr: true, + }, + { + name: "dangerous command - dd", + command: "dd if=/dev/zero of=/dev/sda", + wantErr: true, + }, + { + name: "dangerous command - sudo", + command: "sudo systemctl stop nginx", + wantErr: true, + }, + { + name: "dangerous command - redirection", + command: "echo 'test' > /etc/passwd", + wantErr: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + err := executor.validateCommand(tt.command) + if (err != nil) != tt.wantErr { + t.Errorf("validateCommand() error = %v, wantErr %v", err, tt.wantErr) + } + }) + } +} + +func TestCommandExecutor_Execute(t *testing.T) { + executor := NewCommandExecutor(5 * time.Second) + + // Test safe command execution + cmd := Command{ + ID: "test_echo", + Command: "echo 'Hello, World!'", + Description: "Test echo command", + } + + result := executor.Execute(cmd) + + if result.ExitCode != 0 { + t.Errorf("Expected exit code 0, got %d", result.ExitCode) + } + + if result.Output != "Hello, World!\n" { + t.Errorf("Expected 'Hello, World!\\n', got '%s'", result.Output) + } + + if result.Error != "" { + t.Errorf("Expected no error, got '%s'", result.Error) + } +} + +func TestCommandExecutor_ExecuteUnsafeCommand(t *testing.T) { + executor := NewCommandExecutor(5 * time.Second) + + // Test unsafe command rejection + cmd := Command{ + ID: "test_rm", + Command: "rm -rf /tmp/test", + Description: "Dangerous rm command", + } + + result := executor.Execute(cmd) + + if result.ExitCode != 1 { + t.Errorf("Expected exit code 1 for unsafe command, got %d", result.ExitCode) + } + + if result.Error == "" { + t.Error("Expected error for unsafe command, got none") + } +} diff --git a/discover-functions.sh b/discover-functions.sh new file mode 100755 index 0000000..0b9a0d2 --- /dev/null +++ b/discover-functions.sh @@ -0,0 +1,51 @@ +#!/bin/bash + +# NannyAPI Function Discovery Script +# This script helps you find the correct function name for your NannyAPI setup + +echo "🔍 NannyAPI Function Discovery" 
+echo "==============================" +echo "" + +ENDPOINT="${NANNYAPI_ENDPOINT:-http://nannyapi.local:3000/openai/v1}" + +echo "Testing endpoint: $ENDPOINT/chat/completions" +echo "" + +# Test common function name patterns +test_functions=( + "nannyapi::function_name::diagnose" + "nannyapi::function_name::diagnose_and_heal" + "nannyapi::function_name::linux_diagnostic" + "nannyapi::function_name::system_diagnostic" + "nannyapi::model_name::gpt-4" + "nannyapi::model_name::claude" +) + +for func in "${test_functions[@]}"; do + echo "Testing function: $func" + + response=$(curl -s -X POST "$ENDPOINT/chat/completions" \ + -H "Content-Type: application/json" \ + -d "{\"model\":\"$func\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}]}") + + if echo "$response" | grep -q "Unknown function"; then + echo " ❌ Function not found" + elif echo "$response" | grep -q "error"; then + echo " ⚠️ Error: $(echo "$response" | jq -r '.error' 2>/dev/null || echo "$response")" + else + echo " ✅ Function exists and is responding!" + echo " Use this in your environment: export NANNYAPI_MODEL=\"$func\"" + fi + echo "" +done + +echo "💡 If none of the above work, check your NannyAPI configuration file" +echo " for the correct function names and update NANNYAPI_MODEL accordingly." 
+echo "" +echo "Example NannyAPI config snippet:" +echo '```yaml' +echo "functions:" +echo " diagnose_and_heal: # This becomes 'nannyapi::function_name::diagnose_and_heal'" +echo " # function definition" +echo '```' diff --git a/executor.go b/executor.go new file mode 100644 index 0000000..199299f --- /dev/null +++ b/executor.go @@ -0,0 +1,108 @@ +package main + +import ( + "context" + "fmt" + "os/exec" + "strings" + "time" +) + +// CommandExecutor handles safe execution of diagnostic commands +type CommandExecutor struct { + timeout time.Duration +} + +// NewCommandExecutor creates a new command executor with the specified timeout +func NewCommandExecutor(timeout time.Duration) *CommandExecutor { + return &CommandExecutor{ + timeout: timeout, + } +} + +// Execute executes a command safely with timeout and validation +func (ce *CommandExecutor) Execute(cmd Command) CommandResult { + result := CommandResult{ + ID: cmd.ID, + Command: cmd.Command, + } + + // Validate command safety + if err := ce.validateCommand(cmd.Command); err != nil { + result.Error = fmt.Sprintf("unsafe command: %s", err.Error()) + result.ExitCode = 1 + return result + } + + // Create context with timeout + ctx, cancel := context.WithTimeout(context.Background(), ce.timeout) + defer cancel() + + // Execute command using shell for proper handling of pipes, redirects, etc. 
+ execCmd := exec.CommandContext(ctx, "/bin/bash", "-c", cmd.Command) + + output, err := execCmd.CombinedOutput() + result.Output = string(output) + + if err != nil { + result.Error = err.Error() + if exitError, ok := err.(*exec.ExitError); ok { + result.ExitCode = exitError.ExitCode() + } else { + result.ExitCode = 1 + } + } else { + result.ExitCode = 0 + } + + return result +} + +// validateCommand checks if a command is safe to execute +func (ce *CommandExecutor) validateCommand(command string) error { + // Convert to lowercase for case-insensitive checking + cmd := strings.ToLower(strings.TrimSpace(command)) + + // List of dangerous commands/patterns (all lowercase, since cmd is lowercased above) + dangerousPatterns := []string{ + "rm ", "rm\t", "rm\n", + "mv ", "mv\t", "mv\n", + "dd ", "dd\t", "dd\n", + "mkfs", "fdisk", "parted", + "shutdown", "reboot", "halt", "poweroff", + "passwd", "userdel", "usermod", + "chmod", "chown", "chgrp", + "systemctl stop", "systemctl disable", "systemctl mask", + "service stop", "service disable", + "kill ", "killall", "pkill", + "crontab -r", "crontab -e", + "iptables -f", "iptables -d", "iptables -i", // Lowercase so -F, -D, -I match after ToLower + "umount ", "unmount ", // Allow mount but not umount + "wget ", "curl ", // Prevent network operations + "| dd", "| rm", "| mv", // Prevent piping to dangerous commands + } + + // Check for dangerous patterns + for _, pattern := range dangerousPatterns { + if strings.Contains(cmd, pattern) { + return fmt.Errorf("command contains dangerous pattern: %s", pattern) + } + } + + // Additional checks for commands that start with dangerous operations + if strings.HasPrefix(cmd, "rm ") || strings.HasPrefix(cmd, "rm\t") { + return fmt.Errorf("rm command not allowed") + } + + // Check for sudo usage (we want to avoid automated sudo commands) + if strings.HasPrefix(cmd, "sudo ") { + return fmt.Errorf("sudo commands not allowed for automated execution") + } + + // Check for dangerous redirections (but allow safe ones like 2>/dev/null) + if strings.Contains(cmd, ">") && 
!strings.Contains(cmd, "2>/dev/null") && !strings.Contains(cmd, ">/dev/null") { + return fmt.Errorf("file redirection not allowed except to /dev/null") + } + + return nil +} diff --git a/go.mod b/go.mod new file mode 100644 index 0000000..c568009 --- /dev/null +++ b/go.mod @@ -0,0 +1,5 @@ +module nannyagentv2 + +go 1.23 + +require github.com/sashabaranov/go-openai v1.32.0 diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..aa58dba --- /dev/null +++ b/go.sum @@ -0,0 +1,2 @@ +github.com/sashabaranov/go-openai v1.32.0 h1:Yk3iE9moX3RBXxrof3OBtUBrE7qZR0zF9ebsoO4zVzI= +github.com/sashabaranov/go-openai v1.32.0/go.mod h1:lj5b/K+zjTSFxVLijLSTDZuP7adOgerWeFyZLUhAKRg= diff --git a/install.sh b/install.sh new file mode 100755 index 0000000..c51649b --- /dev/null +++ b/install.sh @@ -0,0 +1,85 @@ +#!/bin/bash + +# Linux Diagnostic Agent Installation Script +# This script installs the nanny-agent on a Linux system + +set -e + +echo "🔧 Linux Diagnostic Agent Installation Script" +echo "==============================================" + +# Check if Go is installed +if ! command -v go &> /dev/null; then + echo "❌ Go is not installed. Please install Go first:" + echo "" + echo "For Ubuntu/Debian:" + echo " sudo apt update && sudo apt install golang-go" + echo "" + echo "For RHEL/CentOS/Fedora:" + echo " sudo dnf install golang" + echo " # or" + echo " sudo yum install golang" + echo "" + exit 1 +fi + +echo "✅ Go is installed: $(go version)" + +# Build the application +echo "🔨 Building the application..." +go mod tidy +make build + +# Check if build was successful +if [ ! -f "./nanny-agent" ]; then + echo "❌ Build failed! nanny-agent binary not found." + exit 1 +fi + +echo "✅ Build successful!" + +# Ask for installation preference +echo "" +echo "Installation options:" +echo "1. Install system-wide (/usr/local/bin) - requires sudo" +echo "2. 
Keep in current directory" +echo "" +read -p "Choose option (1 or 2): " choice + +case $choice in + 1) + echo "📦 Installing system-wide..." + sudo cp nanny-agent /usr/local/bin/ + sudo chmod +x /usr/local/bin/nanny-agent + echo "✅ Agent installed to /usr/local/bin/nanny-agent" + echo "" + echo "You can now run the agent from anywhere with:" + echo " nanny-agent" + ;; + 2) + echo "✅ Agent ready in current directory" + echo "" + echo "Run the agent with:" + echo " ./nanny-agent" + ;; + *) + echo "❌ Invalid choice. Agent is available in current directory." + echo "Run with: ./nanny-agent" + ;; +esac + +# Configuration +echo "" +echo "📝 Configuration:" +echo "Set these environment variables to configure the agent:" +echo "" +echo "export NANNYAPI_ENDPOINT=\"http://your-nannyapi-host:3000/openai/v1\"" +echo "export NANNYAPI_MODEL=\"your-model-identifier\"" +echo "" +echo "Or create a .env file in the working directory." +echo "" +echo "🎉 Installation complete!" +echo "" +echo "Example usage:" +echo " ./nanny-agent" +echo " > On /var filesystem I cannot create any file but df -h shows 30% free space available." diff --git a/integration-tests.sh b/integration-tests.sh new file mode 100755 index 0000000..507d588 --- /dev/null +++ b/integration-tests.sh @@ -0,0 +1,116 @@ +#!/bin/bash + +# Linux Diagnostic Agent - Integration Tests +# This script creates realistic Linux problem scenarios for testing + +set -e + +AGENT_BINARY="./nanny-agent" +TEST_DIR="/tmp/nanny-agent-tests" +TEST_LOG="$TEST_DIR/integration_test.log" + +# Color codes for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Ensure test directory exists +mkdir -p "$TEST_DIR" + +echo -e "${BLUE}🧪 Linux Diagnostic Agent - Integration Tests${NC}" +echo "=================================================" +echo "" + +# Check if agent binary exists +if [[ ! 
-f "$AGENT_BINARY" ]]; then + echo -e "${RED}❌ Agent binary not found at $AGENT_BINARY${NC}" + echo "Please run: make build" + exit 1 +fi + +# Function to run a test scenario +run_test() { + local test_name="$1" + local scenario="$2" + local expected_keywords="$3" + + echo -e "${YELLOW}📋 Test: $test_name${NC}" + echo "Scenario: $scenario" + echo "" + + # Run the agent with the scenario + echo "$scenario" | timeout 120s "$AGENT_BINARY" > "$TEST_LOG" 2>&1 || true + + # Check if any expected keywords are found in the output + local found_keywords=0 + IFS=',' read -ra KEYWORDS <<< "$expected_keywords" + for keyword in "${KEYWORDS[@]}"; do + keyword=$(echo "$keyword" | xargs) # trim whitespace + if grep -qi "$keyword" "$TEST_LOG"; then + echo -e "${GREEN} ✅ Found expected keyword: $keyword${NC}" + found_keywords=$((found_keywords + 1)) # ((found_keywords++)) returns status 1 when the old value is 0, which aborts the script under set -e + else + echo -e "${RED} ❌ Missing keyword: $keyword${NC}" + fi + done + + # Show summary + if [[ $found_keywords -gt 0 ]]; then + echo -e "${GREEN} ✅ Test PASSED ($found_keywords keywords found)${NC}" + else + echo -e "${RED} ❌ Test FAILED (no expected keywords found)${NC}" + fi + + echo "" + echo "Full output saved to: $TEST_LOG" + echo "----------------------------------------" + echo "" +} + +# Test Scenario 1: Disk Space Issues (Inode Exhaustion) +run_test "Disk Space - Inode Exhaustion" \ + "I cannot create new files in /home directory even though df -h shows plenty of space available. Getting 'No space left on device' error when trying to touch new files." \ + "inode,df -i,filesystem,inodes,exhausted" + +# Test Scenario 2: Memory Issues +run_test "Memory Issues - OOM Killer" \ + "My applications keep getting killed randomly and I see 'killed' messages in logs. The system becomes unresponsive for a few seconds before recovering. This happens especially when running memory-intensive tasks." 
\ + "memory,oom,killed,dmesg,free,swap" + +# Test Scenario 3: Network Connectivity Issues +run_test "Network Connectivity - DNS Resolution" \ + "I can ping IP addresses directly (like 8.8.8.8) but cannot resolve domain names. Web browsing fails with DNS resolution errors, but ping 8.8.8.8 works fine." \ + "dns,resolv.conf,nslookup,nameserver,dig" + +# Test Scenario 4: Service/Process Issues +run_test "Service Issues - High Load" \ + "System load average is consistently above 10.0 even when CPU usage appears normal. Applications are responding slowly and I notice high wait times. The server feels sluggish overall." \ + "load,average,cpu,iostat,vmstat,processes" + +# Test Scenario 5: File System Issues +run_test "Filesystem Issues - Permission Problems" \ + "Web server returns 403 Forbidden errors for all pages. Files exist and seem readable, but nginx logs show permission denied errors. SELinux is disabled and file permissions look correct." \ + "permission,403,nginx,chmod,chown,selinux" + +# Test Scenario 6: Boot/System Issues +run_test "Boot Issues - Kernel Module" \ + "System boots but some hardware devices are not working. Network interface shows as down, USB devices are not recognized, and dmesg shows module loading failures." \ + "module,lsmod,dmesg,hardware,interface,usb" + +# Test Scenario 7: Performance Issues +run_test "Performance Issues - I/O Bottleneck" \ + "Database queries are extremely slow, taking 30+ seconds for simple SELECT statements. Disk activity LED is constantly on and system feels unresponsive during database operations." 
\ + "iostat,iotop,disk,database,slow,performance" + +echo -e "${BLUE}🏁 Integration Tests Complete${NC}" +echo "" +echo "Check individual test logs in: $TEST_DIR" +echo "" +echo -e "${YELLOW}💡 Tips:${NC}" +echo "- Tests use realistic scenarios that could occur on production systems" +echo "- Each test expects the AI to suggest relevant diagnostic commands" +echo "- Review the full logs to see the complete diagnostic conversation" +echo "- Tests time out after 120 seconds to prevent hanging" +echo "- Make sure NANNYAPI_ENDPOINT and NANNYAPI_MODEL are set correctly" diff --git a/main.go b/main.go new file mode 100644 index 0000000..26b0715 --- /dev/null +++ b/main.go @@ -0,0 +1,46 @@ +package main + +import ( + "bufio" + "fmt" + "log" + "os" + "strings" +) + +func main() { + // Initialize the agent + agent := NewLinuxDiagnosticAgent() + + // Start the interactive session + fmt.Println("Linux Diagnostic Agent Started") + fmt.Println("Enter a system issue description (or 'quit' to exit):") + + scanner := bufio.NewScanner(os.Stdin) + for { + fmt.Print("> ") + if !scanner.Scan() { + break + } + + input := strings.TrimSpace(scanner.Text()) + if input == "quit" || input == "exit" { + break + } + + if input == "" { + continue + } + + // Process the issue + if err := agent.DiagnoseIssue(input); err != nil { + fmt.Printf("Error: %v\n", err) + } + } + + if err := scanner.Err(); err != nil { + log.Fatal(err) + } + + fmt.Println("Goodbye!") +} diff --git a/system_info.go b/system_info.go new file mode 100644 index 0000000..9328a26 --- /dev/null +++ b/system_info.go @@ -0,0 +1,154 @@ +package main + +import ( + "fmt" + "net" + "runtime" + "strings" + "time" +) + +// SystemInfo represents basic system information +type SystemInfo struct { + Hostname string `json:"hostname"` + OS string `json:"os"` + Kernel string `json:"kernel"` + Architecture string `json:"architecture"` + CPUCores string `json:"cpu_cores"` + Memory string `json:"memory"` + Uptime string `json:"uptime"` + 
PrivateIPs string `json:"private_ips"` + LoadAverage string `json:"load_average"` + DiskUsage string `json:"disk_usage"` +} + +// GatherSystemInfo collects basic system information +func GatherSystemInfo() *SystemInfo { + info := &SystemInfo{} + executor := NewCommandExecutor(5 * time.Second) + + // Basic system info + if result := executor.Execute(Command{ID: "hostname", Command: "hostname"}); result.ExitCode == 0 { + info.Hostname = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "os", Command: "lsb_release -ds 2>/dev/null || grep '^PRETTY_NAME=' /etc/os-release | cut -d'=' -f2 | tr -d '\"'"}); result.ExitCode == 0 { + info.OS = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "kernel", Command: "uname -r"}); result.ExitCode == 0 { + info.Kernel = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "arch", Command: "uname -m"}); result.ExitCode == 0 { + info.Architecture = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "cores", Command: "nproc"}); result.ExitCode == 0 { + info.CPUCores = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "memory", Command: "free -h | grep Mem | awk '{print $2}'"}); result.ExitCode == 0 { + info.Memory = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "uptime", Command: "uptime -p"}); result.ExitCode == 0 { + info.Uptime = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "load", Command: "uptime | awk -F'load average:' '{print $2}' | xargs"}); result.ExitCode == 0 { + info.LoadAverage = strings.TrimSpace(result.Output) + } + + if result := executor.Execute(Command{ID: "disk", Command: "df -h / | tail -1 | awk '{print \"Root: \" $3 \"/\" $2 \" (\" $5 \" used)\"}'"}); result.ExitCode == 0 { + info.DiskUsage = strings.TrimSpace(result.Output) + } + + // Get private IP addresses +
info.PrivateIPs = getPrivateIPs() + + return info +} + +// getPrivateIPs returns private IP addresses +func getPrivateIPs() string { + var privateIPs []string + + interfaces, err := net.Interfaces() + if err != nil { + return "Unable to determine" + } + + for _, iface := range interfaces { + if iface.Flags&net.FlagUp == 0 || iface.Flags&net.FlagLoopback != 0 { + continue // Skip down or loopback interfaces + } + + addrs, err := iface.Addrs() + if err != nil { + continue + } + + for _, addr := range addrs { + if ipnet, ok := addr.(*net.IPNet); ok && !ipnet.IP.IsLoopback() { + if isPrivateIP(ipnet.IP) { + privateIPs = append(privateIPs, fmt.Sprintf("%s (%s)", ipnet.IP.String(), iface.Name)) + } + } + } + } + + if len(privateIPs) == 0 { + return "No private IPs found" + } + + return strings.Join(privateIPs, ", ") +} + +// isPrivateIP checks if an IP address is private +func isPrivateIP(ip net.IP) bool { + // RFC 1918 private address ranges + private := []string{ + "10.0.0.0/8", + "172.16.0.0/12", + "192.168.0.0/16", + } + + for _, cidr := range private { + _, subnet, _ := net.ParseCIDR(cidr) + if subnet.Contains(ip) { + return true + } + } + + return false +} + +// FormatSystemInfoForPrompt formats system information for inclusion in diagnostic prompts +func FormatSystemInfoForPrompt(info *SystemInfo) string { + return fmt.Sprintf(`SYSTEM INFORMATION: +- Hostname: %s +- Operating System: %s +- Kernel Version: %s +- Architecture: %s +- CPU Cores: %s +- Total Memory: %s +- System Uptime: %s +- Current Load Average: %s +- Root Disk Usage: %s +- Private IP Addresses: %s +- Go Runtime: %s + +ISSUE DESCRIPTION:`, + info.Hostname, + info.OS, + info.Kernel, + info.Architecture, + info.CPUCores, + info.Memory, + info.Uptime, + info.LoadAverage, + info.DiskUsage, + info.PrivateIPs, + runtime.Version()) +} diff --git a/test-examples.sh b/test-examples.sh new file mode 100755 index 0000000..6a04f7a --- /dev/null +++ b/test-examples.sh @@ -0,0 +1,82 @@ +#!/bin/bash + +# Linux 
Diagnostic Agent - Test Scenarios +# Realistic Linux problems for testing the diagnostic agent + +echo "🔧 Linux Diagnostic Agent - Test Scenarios" +echo "===========================================" +echo "" + +echo "📚 Available test scenarios (copy-paste into the agent):" +echo "" + +echo "1. 💾 DISK SPACE ISSUES (Inode Exhaustion):" +echo "──────────────────────────────────────────" +echo "I cannot create new files in /home directory even though df -h shows plenty of space available. Getting 'No space left on device' error when trying to touch new files." +echo "" + +echo "2. 🧠 MEMORY ISSUES (OOM Killer):" +echo "────────────────────────────────" +echo "My applications keep getting killed randomly and I see 'killed' messages in logs. The system becomes unresponsive for a few seconds before recovering. This happens especially when running memory-intensive tasks." +echo "" + +echo "3. 🌐 NETWORK CONNECTIVITY (DNS Resolution):" +echo "───────────────────────────────────────────" +echo "I can ping IP addresses directly (like 8.8.8.8) but cannot resolve domain names. Web browsing fails with DNS resolution errors, but ping 8.8.8.8 works fine." +echo "" + +echo "4. ⚡ PERFORMANCE ISSUES (High Load):" +echo "────────────────────────────────────" +echo "System load average is consistently above 10.0 even when CPU usage appears normal. Applications are responding slowly and I notice high wait times. The server feels sluggish overall." +echo "" + +echo "5.
🚫 WEB SERVER ISSUES (Permission Problems):" +echo "─────────────────────────────────────────────" +echo "Web server returns 403 Forbidden errors for all pages. Files exist and seem readable, but nginx logs show permission denied errors. SELinux is disabled and file permissions look correct." +echo "" + +echo "6. 🖥️ HARDWARE/BOOT ISSUES (Kernel Module):" +echo "───────────────────────────────────────────" +echo "System boots but some hardware devices are not working. Network interface shows as down, USB devices are not recognized, and dmesg shows module loading failures." +echo "" + +echo "7. 🐌 DATABASE PERFORMANCE (I/O Bottleneck):" +echo "───────────────────────────────────────────" +echo "Database queries are extremely slow, taking 30+ seconds for simple SELECT statements. Disk activity LED is constantly on and system feels unresponsive during database operations." +echo "" + +echo "8. 🔥 HIGH CPU USAGE (Process Analysis):" +echo "───────────────────────────────────────" +echo "System is running slow and CPU usage is constantly at 100%. Top shows high CPU usage but I can't identify which specific process or thread is causing the issue." +echo "" + +echo "9. 📁 FILE SYSTEM CORRUPTION:" +echo "────────────────────────────" +echo "Getting 'Input/output error' when accessing certain files and directories. Some files appear corrupted and applications crash when trying to read specific data files." +echo "" + +echo "10.
🔌 SERVICE STARTUP FAILURES:" +echo "───────────────────────────────" +echo "Critical services fail to start after system reboot. Systemctl shows services in failed state but error messages are unclear. System appears to boot normally otherwise." +echo "" + +echo "🚀 Quick Start:" +echo "──────────────" +echo "1. Run: ./nanny-agent" +echo "2. Copy-paste any scenario above when prompted" +echo "3. Watch the AI diagnose the problem step by step" +echo "" + +echo "🧪 Automated Testing:" +echo "────────────────────" +echo "Run integration tests: ./integration-tests.sh" +echo "This will test all scenarios automatically" +echo "" + +echo "💡 Pro Tips:" +echo "───────────" +echo "- Each scenario is based on real-world Linux issues" +echo "- The AI will gather system info automatically" +echo "- Diagnostic commands are executed safely (read-only)" +echo "- You'll get a detailed resolution plan at the end" +echo "- Set NANNYAPI_ENDPOINT and NANNYAPI_MODEL before running"