Initial Commit

Harshavardhan Musanalli
2025-09-27 17:35:24 +02:00
parent 83fa088ed2
commit 1f01c38881
14 changed files with 1277 additions and 2 deletions

0
Dockerfile Normal file

53
Makefile Normal file

@@ -0,0 +1,53 @@
.PHONY: build run clean test install build-prod install-system fmt lint help

# Build the application
build:
	go build -o nanny-agent .

# Run the application
run: build
	./nanny-agent

# Clean build artifacts
clean:
	rm -f nanny-agent

# Run tests
test:
	go test ./...

# Install dependencies
install:
	go mod tidy
	go mod download

# Build for production with optimizations
build-prod:
	CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags '-w -s' -o nanny-agent .

# Install system-wide (requires sudo)
install-system: build-prod
	sudo cp nanny-agent /usr/local/bin/
	sudo chmod +x /usr/local/bin/nanny-agent

# Format code
fmt:
	go fmt ./...

# Run linter (if golangci-lint is installed)
lint:
	golangci-lint run

# Show help
help:
	@echo "Available commands:"
	@echo "  build          - Build the application"
	@echo "  run            - Build and run the application"
	@echo "  clean          - Clean build artifacts"
	@echo "  test           - Run tests"
	@echo "  install        - Install dependencies"
	@echo "  build-prod     - Build for production"
	@echo "  install-system - Install system-wide (requires sudo)"
	@echo "  fmt            - Format code"
	@echo "  lint           - Run linter"
	@echo "  help           - Show this help"

200
README.md

@@ -1,3 +1,199 @@
# Linux Diagnostic Agent
A Go-based AI agent that diagnoses Linux system issues using the NannyAPI gateway with an OpenAI-compatible SDK.
## Features
- Interactive command-line interface for submitting system issues
- **Automatic system information gathering** - Includes OS, kernel, CPU, memory, network info
- Integrates with NannyAPI using OpenAI-compatible Go SDK
- Executes diagnostic commands safely and collects output
- Provides step-by-step resolution plans
- **Comprehensive integration tests** with realistic Linux problem scenarios
## Setup
1. Clone this repository
2. Copy `.env.example` to `.env` and configure your NannyAPI endpoint:
```bash
cp .env.example .env
```
3. Install dependencies:
```bash
go mod tidy
```
4. Build and run:
```bash
make build
./nanny-agent
```
## Configuration
The agent can be configured using environment variables:
- `NANNYAPI_ENDPOINT`: The NannyAPI endpoint (default: `http://nannyapi.local:3000/openai/v1`)
- `NANNYAPI_MODEL`: The model identifier (default: `nannyapi::function_name::diagnose_and_heal`)
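The lookup-with-default behaviour described above can be sketched in a few lines of Go. This is a minimal illustration, not the agent's actual code: the helper name `envOrDefault` is hypothetical, and only the variable names and default values are taken from the list above.

```go
package main

import (
	"fmt"
	"os"
)

// envOrDefault returns the value of the environment variable key,
// or fallback when the variable is unset or empty.
// (Hypothetical helper name, used here for illustration only.)
func envOrDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	// Defaults copied from the configuration list above.
	endpoint := envOrDefault("NANNYAPI_ENDPOINT", "http://nannyapi.local:3000/openai/v1")
	model := envOrDefault("NANNYAPI_MODEL", "nannyapi::function_name::diagnose_and_heal")

	fmt.Println("endpoint:", endpoint)
	fmt.Println("model:", model)
}
```

Treating an empty value the same as an unset one means `export NANNYAPI_MODEL=""` falls back to the default rather than sending an empty model name.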
## Installation on Linux VM
### Direct Installation
1. **Install Go** (if not already installed):
```bash
# For Ubuntu/Debian
sudo apt update
sudo apt install golang-go
# For RHEL/CentOS/Fedora
sudo dnf install golang
# or
sudo yum install golang
```
2. **Clone and build the agent**:
```bash
git clone <your-repo-url>
cd nannyagentv2
go mod tidy
make build
```
3. **Install system-wide** (optional; this copies the binary to `/usr/local/bin` and does not create a service unit):
```bash
sudo cp nanny-agent /usr/local/bin/
sudo chmod +x /usr/local/bin/nanny-agent
```
4. **Set environment variables**:
```bash
export NANNYAPI_ENDPOINT="http://your-nannyapi-endpoint:3000/openai/v1"
export NANNYAPI_MODEL="your-model-identifier"
```
## Usage
1. Start the agent:
```bash
./nanny-agent
```
2. Enter a system issue description when prompted:
```
> On /var filesystem I cannot create any file but df -h shows 30% free space available.
```
3. The agent will:
- Send the issue to the AI via NannyAPI using OpenAI SDK
- Execute diagnostic commands as suggested by the AI
- Provide command outputs back to the AI
- Display the final diagnosis and resolution plan
4. Type `quit` or `exit` to stop the agent
## How It Works
1. **System Information Gathering**: Agent automatically collects system details (OS, kernel, CPU, memory, network, etc.)
2. **Initial Issue**: User describes a Linux system problem
3. **Enhanced Prompt**: AI receives both the issue description and comprehensive system information
4. **Diagnostic Phase**: AI responds with diagnostic commands to run
5. **Command Execution**: Agent safely executes read-only commands
6. **Iterative Analysis**: AI analyzes command outputs and may request more commands
7. **Resolution Phase**: AI provides root cause analysis and step-by-step resolution plan
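The switch between the diagnostic and resolution phases depends on the `response_type` field of the AI's JSON reply. The sketch below shows one way to classify a reply by that field. It is an illustration under assumptions: the `envelope` type and the `classifyResponse` function are hypothetical names, and only the field name and the two values `diagnostic` and `resolution` come from this project.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope captures only the discriminator field shared by both reply shapes.
// (Hypothetical type, for illustration.)
type envelope struct {
	ResponseType string `json:"response_type"`
}

// classifyResponse reports which phase a reply belongs to:
// "diagnostic", "resolution", or "unknown" for anything else,
// including replies that are not valid JSON.
func classifyResponse(content string) string {
	var env envelope
	if err := json.Unmarshal([]byte(content), &env); err != nil {
		return "unknown"
	}
	switch env.ResponseType {
	case "diagnostic", "resolution":
		return env.ResponseType
	default:
		return "unknown"
	}
}

func main() {
	replies := []string{
		`{"response_type":"diagnostic","reasoning":"check inodes","commands":[]}`,
		`{"response_type":"resolution","root_cause":"inode exhaustion"}`,
		`not json at all`,
	}
	for _, r := range replies {
		fmt.Println(classifyResponse(r))
	}
}
```

A diagnostic reply leads to another round of command execution, a resolution reply ends the session, and anything else is best reported to the operator rather than retried blindly.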
## Testing & Integration Tests
The agent includes comprehensive integration tests that simulate realistic Linux problems:
### Available Test Scenarios:
1. **Disk Space Issues** - Inode exhaustion scenarios
2. **Memory Problems** - OOM killer and memory pressure
3. **Network Issues** - DNS resolution problems
4. **Performance Issues** - High load averages and I/O bottlenecks
5. **Web Server Problems** - Permission and configuration issues
6. **Hardware/Boot Issues** - Kernel module and device problems
7. **Database Performance** - Slow queries and I/O contention
8. **Service Failures** - Startup and configuration problems
### Run Integration Tests:
```bash
# Interactive test scenarios
./test-examples.sh
# Automated integration tests
./integration-tests.sh
# Function discovery (find valid NannyAPI functions)
./discover-functions.sh
```
## Safety
- Only read-only commands are executed automatically
- Commands that modify the system (rm, mv, dd, redirection) are blocked by validation
- The resolution plan is provided for manual execution by the operator
- All commands have execution timeouts to prevent hanging
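As an illustration of the redirection rule, the sketch below rejects any output redirection except the two `/dev/null` forms. It is a simplified stand-in, not the agent's full validation: `hasUnsafeRedirect` is a hypothetical name, and the check is deliberately conservative (for example, `2> /dev/null` with a space, or a `>` inside an awk expression, is also rejected).

```go
package main

import (
	"fmt"
	"strings"
)

// hasUnsafeRedirect reports whether command contains an output redirection
// other than "2>/dev/null" or ">/dev/null".
// The allowed forms are removed first, so a command cannot hide a write
// to a real file behind an additional /dev/null redirection.
func hasUnsafeRedirect(command string) bool {
	stripped := strings.ReplaceAll(command, "2>/dev/null", "")
	stripped = strings.ReplaceAll(stripped, ">/dev/null", "")
	return strings.Contains(stripped, ">")
}

func main() {
	commands := []string{
		"df -h",
		"ls /nonexistent 2>/dev/null",
		"echo test > /etc/passwd",
		"echo test > /etc/passwd 2>/dev/null",
	}
	for _, c := range commands {
		fmt.Printf("%-40q unsafe=%v\n", c, hasUnsafeRedirect(c))
	}
}
```

Substring checks like this are a coarse filter. They reduce the chance of an accidental write but are not a substitute for running the agent as an unprivileged user.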
## API Integration
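The agent uses the message and response types from the `github.com/sashabaranov/go-openai` SDK to communicate with NannyAPI's OpenAI-compatible API endpoint. Requests are posted to `<endpoint>/chat/completions` with a 30-second timeout, and the TensorZero episode ID returned by the first response is sent back as `tensorzero::episode_id` on later requests for conversation continuity. This provides:
- Structured request/response handling
- Automatic JSON marshaling/unmarshaling
- Request timeouts and HTTP status checking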
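To make the request shape concrete, here is a minimal sketch of how a chat-completions body with an optional `tensorzero::episode_id` field could be built. The local types `chatMessage` and `chatRequest` and the function `buildRequestBody` are hypothetical stand-ins so the example compiles without the SDK; the JSON field names follow this project's request structure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage is a local stand-in for the SDK's chat message type.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest mirrors an OpenAI-style chat request plus the optional
// TensorZero episode ID, which is omitted from the JSON when empty.
type chatRequest struct {
	Model     string        `json:"model"`
	Messages  []chatMessage `json:"messages"`
	EpisodeID string        `json:"tensorzero::episode_id,omitempty"`
}

// buildRequestBody marshals the request to JSON.
func buildRequestBody(model, episodeID string, messages []chatMessage) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:     model,
		Messages:  messages,
		EpisodeID: episodeID,
	})
}

func main() {
	msgs := []chatMessage{{Role: "user", Content: "Cannot create files in /var"}}

	// First request: no episode ID yet, so the field is left out.
	first, err := buildRequestBody("nannyapi::function_name::diagnose_and_heal", "", msgs)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(first))

	// Follow-up request: reuse the episode ID from the first response.
	// "example-episode-id" is a placeholder value.
	next, err := buildRequestBody("nannyapi::function_name::diagnose_and_heal", "example-episode-id", msgs)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(next))
}
```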
## Example Session
```
Linux Diagnostic Agent Started
Enter a system issue description (or 'quit' to exit):
> Cannot create files in /var but df shows space available
Diagnosing issue: Cannot create files in /var but df shows space available
Gathering system information...
AI Response:
{
"response_type": "diagnostic",
"reasoning": "The 'No space left on device' error despite available disk space suggests inode exhaustion...",
"commands": [
{"id": "check_inodes", "command": "df -i /var", "description": "Check inode usage..."}
]
}
Executing command 'check_inodes': df -i /var
Output:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 1000000 999999 1 100% /var
=== DIAGNOSIS COMPLETE ===
Root Cause: The /var filesystem has exhausted all available inodes
Resolution Plan: 1. Find and remove unnecessary files...
Confidence: High
```
Note: The AI receives comprehensive system information including:
- Hostname, OS version, kernel version
- CPU cores, memory, system uptime
- Network interfaces and private IPs
- Current load average and disk usage
## Available Make Commands
- `make build` - Build the application
- `make run` - Build and run the application
- `make clean` - Clean build artifacts
- `make test` - Run unit tests
- `make install` - Install dependencies
- `make build-prod` - Build for production
- `make install-system` - Install system-wide (requires sudo)
- `make fmt` - Format code
- `make lint` - Run linter (requires golangci-lint)
- `make help` - Show available commands
## Testing Commands
- `./test-examples.sh` - Show interactive test scenarios
- `./integration-tests.sh` - Run automated integration tests
- `./discover-functions.sh` - Find available NannyAPI functions
- `./install.sh` - Installation script for Linux VMs

270
agent.go Normal file

@@ -0,0 +1,270 @@
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"time"
"github.com/sashabaranov/go-openai"
)
// DiagnosticResponse represents the diagnostic phase response from AI
type DiagnosticResponse struct {
ResponseType string `json:"response_type"`
Reasoning string `json:"reasoning"`
Commands []Command `json:"commands"`
}
// ResolutionResponse represents the resolution phase response from AI
type ResolutionResponse struct {
ResponseType string `json:"response_type"`
RootCause string `json:"root_cause"`
ResolutionPlan string `json:"resolution_plan"`
Confidence string `json:"confidence"`
}
// Command represents a command to be executed
type Command struct {
ID string `json:"id"`
Command string `json:"command"`
Description string `json:"description"`
}
// CommandResult represents the result of executing a command
type CommandResult struct {
ID string `json:"id"`
Command string `json:"command"`
Output string `json:"output"`
ExitCode int `json:"exit_code"`
Error string `json:"error,omitempty"`
}
// LinuxDiagnosticAgent represents the main agent
type LinuxDiagnosticAgent struct {
client *openai.Client
model string
executor *CommandExecutor
episodeID string // TensorZero episode ID for conversation continuity
}
// NewLinuxDiagnosticAgent creates a new diagnostic agent
func NewLinuxDiagnosticAgent() *LinuxDiagnosticAgent {
endpoint := os.Getenv("NANNYAPI_ENDPOINT")
if endpoint == "" {
// Default endpoint - OpenAI SDK will append /chat/completions automatically
endpoint = "http://nannyapi.local:3000/openai/v1"
}
model := os.Getenv("NANNYAPI_MODEL")
if model == "" {
model = "nannyapi::function_name::diagnose_and_heal"
fmt.Printf("Warning: Using default model '%s'. Set NANNYAPI_MODEL environment variable for your specific function.\n", model)
}
// Create OpenAI client with custom base URL
// Note: The OpenAI SDK automatically appends "/chat/completions" to the base URL
config := openai.DefaultConfig("")
config.BaseURL = endpoint
client := openai.NewClientWithConfig(config)
return &LinuxDiagnosticAgent{
client: client,
model: model,
executor: NewCommandExecutor(10 * time.Second), // 10 second timeout for commands
}
}
// DiagnoseIssue starts the diagnostic process for a given issue
func (a *LinuxDiagnosticAgent) DiagnoseIssue(issue string) error {
fmt.Printf("Diagnosing issue: %s\n", issue)
fmt.Println("Gathering system information...")
// Gather system information
systemInfo := GatherSystemInfo()
// Format the initial prompt with system information
initialPrompt := FormatSystemInfoForPrompt(systemInfo) + "\n" + issue
// Start conversation with initial issue including system info
messages := []openai.ChatCompletionMessage{
{
Role: openai.ChatMessageRoleUser,
Content: initialPrompt,
},
}
for {
// Send request to TensorZero API via OpenAI SDK
response, err := a.sendRequest(messages)
if err != nil {
return fmt.Errorf("failed to send request: %w", err)
}
if len(response.Choices) == 0 {
return fmt.Errorf("no choices in response")
}
content := response.Choices[0].Message.Content
fmt.Printf("\nAI Response:\n%s\n", content)
// Parse the response to determine next action
var diagnosticResp DiagnosticResponse
var resolutionResp ResolutionResponse
// Try to parse as diagnostic response first
if err := json.Unmarshal([]byte(content), &diagnosticResp); err == nil && diagnosticResp.ResponseType == "diagnostic" {
// Handle diagnostic phase
fmt.Printf("\nReasoning: %s\n", diagnosticResp.Reasoning)
if len(diagnosticResp.Commands) == 0 {
fmt.Println("No commands to execute in diagnostic phase")
break
}
// Execute commands and collect results
commandResults := make([]CommandResult, 0, len(diagnosticResp.Commands))
for _, cmd := range diagnosticResp.Commands {
fmt.Printf("\nExecuting command '%s': %s\n", cmd.ID, cmd.Command)
result := a.executor.Execute(cmd)
commandResults = append(commandResults, result)
fmt.Printf("Output:\n%s\n", result.Output)
if result.Error != "" {
fmt.Printf("Error: %s\n", result.Error)
}
}
// Prepare command results as user message
resultsJSON, err := json.MarshalIndent(commandResults, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal command results: %w", err)
}
// Add AI response and command results to conversation
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleAssistant,
Content: content,
})
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleUser,
Content: string(resultsJSON),
})
continue
}
// Try to parse as resolution response
if err := json.Unmarshal([]byte(content), &resolutionResp); err == nil && resolutionResp.ResponseType == "resolution" {
// Handle resolution phase
fmt.Printf("\n=== DIAGNOSIS COMPLETE ===\n")
fmt.Printf("Root Cause: %s\n", resolutionResp.RootCause)
fmt.Printf("Resolution Plan: %s\n", resolutionResp.ResolutionPlan)
fmt.Printf("Confidence: %s\n", resolutionResp.Confidence)
break
}
// If we can't parse the response, treat it as an error or unexpected format
fmt.Printf("Unexpected response format or error from AI:\n%s\n", content)
break
}
return nil
}
// TensorZeroRequest represents a request structure compatible with TensorZero's episode_id
type TensorZeroRequest struct {
Model string `json:"model"`
Messages []openai.ChatCompletionMessage `json:"messages"`
EpisodeID string `json:"tensorzero::episode_id,omitempty"`
}
// TensorZeroResponse represents TensorZero's response with episode_id
type TensorZeroResponse struct {
openai.ChatCompletionResponse
EpisodeID string `json:"episode_id"`
}
// sendRequest sends a request to the TensorZero API with tensorzero::episode_id support
func (a *LinuxDiagnosticAgent) sendRequest(messages []openai.ChatCompletionMessage) (*openai.ChatCompletionResponse, error) {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Create TensorZero-compatible request
tzRequest := TensorZeroRequest{
Model: a.model,
Messages: messages,
}
// Include tensorzero::episode_id for conversation continuity (if we have one)
if a.episodeID != "" {
tzRequest.EpisodeID = a.episodeID
}
fmt.Printf("Debug: Sending request to model: %s", a.model)
if a.episodeID != "" {
fmt.Printf(" (episode: %s)", a.episodeID)
}
fmt.Println()
// Marshal the request
requestBody, err := json.Marshal(tzRequest)
if err != nil {
return nil, fmt.Errorf("failed to marshal request: %w", err)
}
// Create HTTP request
endpoint := os.Getenv("NANNYAPI_ENDPOINT")
if endpoint == "" {
endpoint = "http://nannyapi.local:3000/openai/v1"
}
// Ensure the endpoint ends with /chat/completions
if endpoint[len(endpoint)-1] != '/' {
endpoint += "/"
}
endpoint += "chat/completions"
req, err := http.NewRequestWithContext(ctx, "POST", endpoint, bytes.NewBuffer(requestBody))
if err != nil {
return nil, fmt.Errorf("failed to create request: %w", err)
}
req.Header.Set("Content-Type", "application/json")
// Make the request
client := &http.Client{Timeout: 30 * time.Second}
resp, err := client.Do(req)
if err != nil {
return nil, fmt.Errorf("failed to send request: %w", err)
}
defer resp.Body.Close()
// Read response body
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("failed to read response: %w", err)
}
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(body))
}
// Parse TensorZero response
var tzResponse TensorZeroResponse
if err := json.Unmarshal(body, &tzResponse); err != nil {
return nil, fmt.Errorf("failed to unmarshal response: %w", err)
}
// Extract episode_id from first response
if a.episodeID == "" && tzResponse.EpisodeID != "" {
a.episodeID = tzResponse.EpisodeID
fmt.Printf("Debug: Extracted episode ID: %s\n", a.episodeID)
}
return &tzResponse.ChatCompletionResponse, nil
}

107
agent_test.go Normal file

@@ -0,0 +1,107 @@
package main
import (
"testing"
"time"
)
func TestCommandExecutor_ValidateCommand(t *testing.T) {
executor := NewCommandExecutor(5 * time.Second)
tests := []struct {
name string
command string
wantErr bool
}{
{
name: "safe command - ls",
command: "ls -la /var",
wantErr: false,
},
{
name: "safe command - df",
command: "df -h",
wantErr: false,
},
{
name: "safe command - ps",
command: "ps aux | grep nginx",
wantErr: false,
},
{
name: "dangerous command - rm",
command: "rm -rf /tmp/*",
wantErr: true,
},
{
name: "dangerous command - dd",
command: "dd if=/dev/zero of=/dev/sda",
wantErr: true,
},
{
name: "dangerous command - sudo",
command: "sudo systemctl stop nginx",
wantErr: true,
},
{
name: "dangerous command - redirection",
command: "echo 'test' > /etc/passwd",
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := executor.validateCommand(tt.command)
if (err != nil) != tt.wantErr {
t.Errorf("validateCommand() error = %v, wantErr %v", err, tt.wantErr)
}
})
}
}
func TestCommandExecutor_Execute(t *testing.T) {
executor := NewCommandExecutor(5 * time.Second)
// Test safe command execution
cmd := Command{
ID: "test_echo",
Command: "echo 'Hello, World!'",
Description: "Test echo command",
}
result := executor.Execute(cmd)
if result.ExitCode != 0 {
t.Errorf("Expected exit code 0, got %d", result.ExitCode)
}
if result.Output != "Hello, World!\n" {
t.Errorf("Expected 'Hello, World!\\n', got '%s'", result.Output)
}
if result.Error != "" {
t.Errorf("Expected no error, got '%s'", result.Error)
}
}
func TestCommandExecutor_ExecuteUnsafeCommand(t *testing.T) {
executor := NewCommandExecutor(5 * time.Second)
// Test unsafe command rejection
cmd := Command{
ID: "test_rm",
Command: "rm -rf /tmp/test",
Description: "Dangerous rm command",
}
result := executor.Execute(cmd)
if result.ExitCode != 1 {
t.Errorf("Expected exit code 1 for unsafe command, got %d", result.ExitCode)
}
if result.Error == "" {
t.Error("Expected error for unsafe command, got none")
}
}

51
discover-functions.sh Executable file

@@ -0,0 +1,51 @@
#!/bin/bash
# NannyAPI Function Discovery Script
# This script helps you find the correct function name for your NannyAPI setup
echo "🔍 NannyAPI Function Discovery"
echo "=============================="
echo ""
ENDPOINT="${NANNYAPI_ENDPOINT:-http://nannyapi.local:3000/openai/v1}"
echo "Testing endpoint: $ENDPOINT/chat/completions"
echo ""
# Test common function name patterns
test_functions=(
"nannyapi::function_name::diagnose"
"nannyapi::function_name::diagnose_and_heal"
"nannyapi::function_name::linux_diagnostic"
"nannyapi::function_name::system_diagnostic"
"nannyapi::model_name::gpt-4"
"nannyapi::model_name::claude"
)
for func in "${test_functions[@]}"; do
echo "Testing function: $func"
response=$(curl -s -X POST "$ENDPOINT/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\":\"$func\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}]}")
if echo "$response" | grep -q "Unknown function"; then
echo " ❌ Function not found"
elif echo "$response" | grep -q "error"; then
echo " ⚠️ Error: $(echo "$response" | jq -r '.error' 2>/dev/null || echo "$response")"
else
echo " ✅ Function exists and responding!"
echo " Use this in your environment: export NANNYAPI_MODEL=\"$func\""
fi
echo ""
done
echo "💡 If none of the above work, check your NannyAPI configuration file"
echo " for the correct function names and update NANNYAPI_MODEL accordingly."
echo ""
echo "Example NannyAPI config snippet:"
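# Single quotes keep the backticks literal; inside double quotes they would start a command substitution
echo '```yaml'
echo "functions:"
echo " diagnose_and_heal: # This becomes 'nannyapi::function_name::diagnose_and_heal'"
echo " # function definition"
echo '```'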

108
executor.go Normal file

@@ -0,0 +1,108 @@
package main
import (
"context"
"fmt"
"os/exec"
"strings"
"time"
)
// CommandExecutor handles safe execution of diagnostic commands
type CommandExecutor struct {
timeout time.Duration
}
// NewCommandExecutor creates a new command executor with specified timeout
func NewCommandExecutor(timeout time.Duration) *CommandExecutor {
return &CommandExecutor{
timeout: timeout,
}
}
// Execute executes a command safely with timeout and validation
func (ce *CommandExecutor) Execute(cmd Command) CommandResult {
result := CommandResult{
ID: cmd.ID,
Command: cmd.Command,
}
// Validate command safety
if err := ce.validateCommand(cmd.Command); err != nil {
result.Error = fmt.Sprintf("unsafe command: %s", err.Error())
result.ExitCode = 1
return result
}
// Create context with timeout
ctx, cancel := context.WithTimeout(context.Background(), ce.timeout)
defer cancel()
// Execute command using shell for proper handling of pipes, redirects, etc.
execCmd := exec.CommandContext(ctx, "/bin/bash", "-c", cmd.Command)
output, err := execCmd.CombinedOutput()
result.Output = string(output)
if err != nil {
result.Error = err.Error()
if exitError, ok := err.(*exec.ExitError); ok {
result.ExitCode = exitError.ExitCode()
} else {
result.ExitCode = 1
}
} else {
result.ExitCode = 0
}
return result
}
// validateCommand checks if a command is safe to execute
func (ce *CommandExecutor) validateCommand(command string) error {
// Convert to lowercase for case-insensitive checking
cmd := strings.ToLower(strings.TrimSpace(command))
// List of dangerous commands/patterns.
// Patterns must be lowercase because the command is lowercased before matching.
dangerousPatterns := []string{
"rm ", "rm\t", "rm\n",
"mv ", "mv\t", "mv\n",
"dd ", "dd\t", "dd\n",
"mkfs", "fdisk", "parted",
"shutdown", "reboot", "halt", "poweroff",
"passwd", "userdel", "usermod",
"chmod", "chown", "chgrp",
"systemctl stop", "systemctl disable", "systemctl mask",
"service stop", "service disable",
"kill ", "killall", "pkill",
"crontab -r", "crontab -e",
"iptables -f", "iptables -d", "iptables -i",
"umount ", "unmount ", // Allow mount but not umount
"wget ", "curl ", // Prevent network operations
"| dd", "| rm", "| mv", // Prevent piping to dangerous commands
}
// Check for dangerous patterns
for _, pattern := range dangerousPatterns {
if strings.Contains(cmd, pattern) {
return fmt.Errorf("command contains dangerous pattern: %s", pattern)
}
}
// Additional checks for commands that start with dangerous operations
if strings.HasPrefix(cmd, "rm ") || strings.HasPrefix(cmd, "rm\t") {
return fmt.Errorf("rm command not allowed")
}
// Check for sudo usage (we want to avoid automated sudo commands)
if strings.HasPrefix(cmd, "sudo ") {
return fmt.Errorf("sudo commands not allowed for automated execution")
}
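// Check for dangerous redirections (but allow safe ones like 2>/dev/null).
// The allowed forms are stripped first so that a write to a real file cannot
// slip through just because the command also redirects to /dev/null.
stripped := strings.ReplaceAll(cmd, "2>/dev/null", "")
stripped = strings.ReplaceAll(stripped, ">/dev/null", "")
if strings.Contains(stripped, ">") {
return fmt.Errorf("file redirection not allowed except to /dev/null")
}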
return nil
}

5
go.mod Normal file

@@ -0,0 +1,5 @@
module nannyagentv2
go 1.23
require github.com/sashabaranov/go-openai v1.32.0

2
go.sum Normal file

@@ -0,0 +1,2 @@
github.com/sashabaranov/go-openai v1.32.0 h1:Yk3iE9moX3RBXxrof3OBtUBrE7qZR0zF9ebsoO4zVzI=
github.com/sashabaranov/go-openai v1.32.0/go.mod h1:lj5b/K+zjTSFxVLijLSTDZuP7adOgerWeFyZLUhAKRg=

85
install.sh Executable file

@@ -0,0 +1,85 @@
#!/bin/bash
# Linux Diagnostic Agent Installation Script
# This script installs the nanny-agent on a Linux system
set -e
echo "🔧 Linux Diagnostic Agent Installation Script"
echo "=============================================="
# Check if Go is installed
if ! command -v go &> /dev/null; then
echo "❌ Go is not installed. Please install Go first:"
echo ""
echo "For Ubuntu/Debian:"
echo " sudo apt update && sudo apt install golang-go"
echo ""
echo "For RHEL/CentOS/Fedora:"
echo " sudo dnf install golang"
echo " # or"
echo " sudo yum install golang"
echo ""
exit 1
fi
echo "✅ Go is installed: $(go version)"
# Build the application
echo "🔨 Building the application..."
go mod tidy
make build
# Check if build was successful
if [ ! -f "./nanny-agent" ]; then
echo "❌ Build failed! nanny-agent binary not found."
exit 1
fi
echo "✅ Build successful!"
# Ask for installation preference
echo ""
echo "Installation options:"
echo "1. Install system-wide (/usr/local/bin) - requires sudo"
echo "2. Keep in current directory"
echo ""
read -r -p "Choose option (1 or 2): " choice
case $choice in
1)
echo "📦 Installing system-wide..."
sudo cp nanny-agent /usr/local/bin/
sudo chmod +x /usr/local/bin/nanny-agent
echo "✅ Agent installed to /usr/local/bin/nanny-agent"
echo ""
echo "You can now run the agent from anywhere with:"
echo " nanny-agent"
;;
2)
echo "✅ Agent ready in current directory"
echo ""
echo "Run the agent with:"
echo " ./nanny-agent"
;;
*)
echo "❌ Invalid choice. Agent is available in current directory."
echo "Run with: ./nanny-agent"
;;
esac
# Configuration
echo ""
echo "📝 Configuration:"
echo "Set these environment variables to configure the agent:"
echo ""
echo "export NANNYAPI_ENDPOINT=\"http://your-nannyapi-host:3000/openai/v1\""
echo "export NANNYAPI_MODEL=\"your-model-identifier\""
echo ""
echo "Or create a .env file in the working directory."
echo ""
echo "🎉 Installation complete!"
echo ""
echo "Example usage:"
echo " ./nanny-agent"
echo " > On /var filesystem I cannot create any file but df -h shows 30% free space available."

116
integration-tests.sh Executable file

@@ -0,0 +1,116 @@
#!/bin/bash
# Linux Diagnostic Agent - Integration Tests
# This script creates realistic Linux problem scenarios for testing
set -e
AGENT_BINARY="./nanny-agent"
TEST_DIR="/tmp/nanny-agent-tests"
TEST_LOG="$TEST_DIR/integration_test.log"
# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Ensure test directory exists
mkdir -p "$TEST_DIR"
echo -e "${BLUE}🧪 Linux Diagnostic Agent - Integration Tests${NC}"
echo "================================================="
echo ""
# Check if agent binary exists
if [[ ! -f "$AGENT_BINARY" ]]; then
echo -e "${RED}❌ Agent binary not found at $AGENT_BINARY${NC}"
echo "Please run: make build"
exit 1
fi
# Function to run a test scenario
run_test() {
local test_name="$1"
local scenario="$2"
local expected_keywords="$3"
echo -e "${YELLOW}📋 Test: $test_name${NC}"
echo "Scenario: $scenario"
echo ""
# Run the agent with the scenario
echo "$scenario" | timeout 120s "$AGENT_BINARY" > "$TEST_LOG" 2>&1 || true
# Check if any expected keywords are found in the output
local found_keywords=0
IFS=',' read -ra KEYWORDS <<< "$expected_keywords"
for keyword in "${KEYWORDS[@]}"; do
keyword=$(echo "$keyword" | xargs) # trim whitespace
if grep -qi "$keyword" "$TEST_LOG"; then
echo -e "${GREEN} ✅ Found expected keyword: $keyword${NC}"
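# Use an assignment: ((found_keywords++)) returns status 1 when the old value is 0, which aborts the script under set -e
found_keywords=$((found_keywords + 1))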
else
echo -e "${RED} ❌ Missing keyword: $keyword${NC}"
fi
done
# Show summary
if [[ $found_keywords -gt 0 ]]; then
echo -e "${GREEN} ✅ Test PASSED ($found_keywords keywords found)${NC}"
else
echo -e "${RED} ❌ Test FAILED (no expected keywords found)${NC}"
fi
echo ""
echo "Full output saved to: $TEST_LOG"
echo "----------------------------------------"
echo ""
}
# Test Scenario 1: Disk Space Issues (Inode Exhaustion)
run_test "Disk Space - Inode Exhaustion" \
"I cannot create new files in /home directory even though df -h shows plenty of space available. Getting 'No space left on device' error when trying to touch new files." \
"inode,df -i,filesystem,inodes,exhausted"
# Test Scenario 2: Memory Issues
run_test "Memory Issues - OOM Killer" \
"My applications keep getting killed randomly and I see 'killed' messages in logs. The system becomes unresponsive for a few seconds before recovering. This happens especially when running memory-intensive tasks." \
"memory,oom,killed,dmesg,free,swap"
# Test Scenario 3: Network Connectivity Issues
run_test "Network Connectivity - DNS Resolution" \
"I can ping IP addresses directly (like 8.8.8.8) but cannot resolve domain names. Web browsing fails with DNS resolution errors, but ping 8.8.8.8 works fine." \
"dns,resolv.conf,nslookup,nameserver,dig"
# Test Scenario 4: Service/Process Issues
run_test "Service Issues - High Load" \
"System load average is consistently above 10.0 even when CPU usage appears normal. Applications are responding slowly and I notice high wait times. The server feels sluggish overall." \
"load,average,cpu,iostat,vmstat,processes"
# Test Scenario 5: File System Issues
run_test "Filesystem Issues - Permission Problems" \
"Web server returns 403 Forbidden errors for all pages. Files exist and seem readable, but nginx logs show permission denied errors. SELinux is disabled and file permissions look correct." \
"permission,403,nginx,chmod,chown,selinux"
# Test Scenario 6: Boot/System Issues
run_test "Boot Issues - Kernel Module" \
"System boots but some hardware devices are not working. Network interface shows as down, USB devices are not recognized, and dmesg shows module loading failures." \
"module,lsmod,dmesg,hardware,interface,usb"
# Test Scenario 7: Performance Issues
run_test "Performance Issues - I/O Bottleneck" \
"Database queries are extremely slow, taking 30+ seconds for simple SELECT statements. Disk activity LED is constantly on and system feels unresponsive during database operations." \
"iostat,iotop,disk,database,slow,performance"
echo -e "${BLUE}🏁 Integration Tests Complete${NC}"
echo ""
echo "Output of the last test is in: $TEST_LOG (each test overwrites the previous log)"
echo ""
echo -e "${YELLOW}💡 Tips:${NC}"
echo "- Tests use realistic scenarios that could occur on production systems"
echo "- Each test expects the AI to suggest relevant diagnostic commands"
echo "- Review the full logs to see the complete diagnostic conversation"
echo "- Tests timeout after 120 seconds to prevent hanging"
echo "- Make sure NANNYAPI_ENDPOINT and NANNYAPI_MODEL are set correctly"

46
main.go Normal file

@@ -0,0 +1,46 @@
package main
import (
"bufio"
"fmt"
"log"
"os"
"strings"
)
func main() {
// Initialize the agent
agent := NewLinuxDiagnosticAgent()
// Start the interactive session
fmt.Println("Linux Diagnostic Agent Started")
fmt.Println("Enter a system issue description (or 'quit' to exit):")
scanner := bufio.NewScanner(os.Stdin)
for {
fmt.Print("> ")
if !scanner.Scan() {
break
}
input := strings.TrimSpace(scanner.Text())
if input == "quit" || input == "exit" {
break
}
if input == "" {
continue
}
// Process the issue
if err := agent.DiagnoseIssue(input); err != nil {
fmt.Printf("Error: %v\n", err)
}
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
fmt.Println("Goodbye!")
}

154
system_info.go Normal file

@@ -0,0 +1,154 @@
package main
import (
"fmt"
"net"
"runtime"
"strings"
"time"
)
// SystemInfo represents basic system information
type SystemInfo struct {
Hostname string `json:"hostname"`
OS string `json:"os"`
Kernel string `json:"kernel"`
Architecture string `json:"architecture"`
CPUCores string `json:"cpu_cores"`
Memory string `json:"memory"`
Uptime string `json:"uptime"`
PrivateIPs string `json:"private_ips"`
LoadAverage string `json:"load_average"`
DiskUsage string `json:"disk_usage"`
}
// GatherSystemInfo collects basic system information
func GatherSystemInfo() *SystemInfo {
info := &SystemInfo{}
executor := NewCommandExecutor(5 * time.Second)
// Basic system info
if result := executor.Execute(Command{ID: "hostname", Command: "hostname"}); result.ExitCode == 0 {
info.Hostname = strings.TrimSpace(result.Output)
}
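// pipefail lets the /etc/os-release fallback run when lsb_release is missing;
// without it the pipeline reports cut's exit status (0) and the fallback never runs.
if result := executor.Execute(Command{ID: "os", Command: "set -o pipefail; lsb_release -d 2>/dev/null | cut -f2 || cat /etc/os-release | grep PRETTY_NAME | cut -d'=' -f2 | tr -d '\"'"}); result.ExitCode == 0 {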
info.OS = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "kernel", Command: "uname -r"}); result.ExitCode == 0 {
info.Kernel = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "arch", Command: "uname -m"}); result.ExitCode == 0 {
info.Architecture = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "cores", Command: "nproc"}); result.ExitCode == 0 {
info.CPUCores = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "memory", Command: "free -h | grep Mem | awk '{print $2}'"}); result.ExitCode == 0 {
info.Memory = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "uptime", Command: "uptime -p"}); result.ExitCode == 0 {
info.Uptime = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "load", Command: "uptime | awk -F'load average:' '{print $2}' | xargs"}); result.ExitCode == 0 {
info.LoadAverage = strings.TrimSpace(result.Output)
}
if result := executor.Execute(Command{ID: "disk", Command: "df -h / | tail -1 | awk '{print \"Root: \" $3 \"/\" $2 \" (\" $5 \" used)\"}'"}); result.ExitCode == 0 {
info.DiskUsage = strings.TrimSpace(result.Output)
}
// Get private IP addresses
info.PrivateIPs = getPrivateIPs()
return info
}
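// getPrivateIPs returns a comma-separated list of private IP addresses found on
// interfaces that are up and not loopback, each annotated with its interface name.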
func getPrivateIPs() string {
var privateIPs []string
interfaces, err := net.Interfaces()
if err != nil {
return "Unable to determine"
}
for _, iface := range interfaces {
if iface.Flags&net.FlagUp == 0 || iface.Flags&net.FlagLoopback != 0 {
continue // Skip down or loopback interfaces
}
addrs, err := iface.Addrs()
if err != nil {
continue
}
for _, addr := range addrs {
if ipnet, ok := addr.(*net.IPNet); ok && !ipnet.IP.IsLoopback() {
if isPrivateIP(ipnet.IP) {
privateIPs = append(privateIPs, fmt.Sprintf("%s (%s)", ipnet.IP.String(), iface.Name))
}
}
}
}
if len(privateIPs) == 0 {
return "No private IPs found"
}
return strings.Join(privateIPs, ", ")
}
// isPrivateIP reports whether ip falls inside one of the IPv4 private ranges
// listed below. IPv6 addresses never match.
func isPrivateIP(ip net.IP) bool {
// RFC 1918 private address ranges
private := []string{
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
}
for _, cidr := range private {
_, subnet, _ := net.ParseCIDR(cidr)
if subnet.Contains(ip) {
return true
}
}
return false
}
// FormatSystemInfoForPrompt formats system information for inclusion in diagnostic prompts
func FormatSystemInfoForPrompt(info *SystemInfo) string {
return fmt.Sprintf(`SYSTEM INFORMATION:
- Hostname: %s
- Operating System: %s
- Kernel Version: %s
- Architecture: %s
- CPU Cores: %s
- Total Memory: %s
- System Uptime: %s
- Current Load Average: %s
- Root Disk Usage: %s
- Private IP Addresses: %s
- Go Runtime: %s
ISSUE DESCRIPTION:`,
info.Hostname,
info.OS,
info.Kernel,
info.Architecture,
info.CPUCores,
info.Memory,
info.Uptime,
info.LoadAverage,
info.DiskUsage,
info.PrivateIPs,
runtime.Version())
}

82
test-examples.sh Executable file

@@ -0,0 +1,82 @@
#!/bin/bash
# Linux Diagnostic Agent - Test Scenarios
# Realistic Linux problems for testing the diagnostic agent
echo "🔧 Linux Diagnostic Agent - Test Scenarios"
echo "==========================================="
echo ""
echo "📚 Available test scenarios (copy-paste into the agent):"
echo ""
echo "1. 💾 DISK SPACE ISSUES (Inode Exhaustion):"
echo "────────────────────────────────────────────"
echo "I cannot create new files in /home directory even though df -h shows plenty of space available. Getting 'No space left on device' error when trying to touch new files."
echo ""
echo "2. 🧠 MEMORY ISSUES (OOM Killer):"
echo "─────────────────────────────────"
echo "My applications keep getting killed randomly and I see 'killed' messages in logs. The system becomes unresponsive for a few seconds before recovering. This happens especially when running memory-intensive tasks."
echo ""
echo "3. 🌐 NETWORK CONNECTIVITY (DNS Resolution):"
echo "─────────────────────────────────────────────"
echo "I can ping IP addresses directly (like 8.8.8.8) but cannot resolve domain names. Web browsing fails with DNS resolution errors, but ping 8.8.8.8 works fine."
echo ""
echo "4. ⚡ PERFORMANCE ISSUES (High Load):"
echo "───────────────────────────────────"
echo "System load average is consistently above 10.0 even when CPU usage appears normal. Applications are responding slowly and I notice high wait times. The server feels sluggish overall."
echo ""
echo "5. 🚫 WEB SERVER ISSUES (Permission Problems):"
echo "──────────────────────────────────────────────"
echo "Web server returns 403 Forbidden errors for all pages. Files exist and seem readable, but nginx logs show permission denied errors. SELinux is disabled and file permissions look correct."
echo ""
echo "6. 🖥️ HARDWARE/BOOT ISSUES (Kernel Module):"
echo "─────────────────────────────────────────────"
echo "System boots but some hardware devices are not working. Network interface shows as down, USB devices are not recognized, and dmesg shows module loading failures."
echo ""
echo "7. 🐌 DATABASE PERFORMANCE (I/O Bottleneck):"
echo "─────────────────────────────────────────────"
echo "Database queries are extremely slow, taking 30+ seconds for simple SELECT statements. Disk activity LED is constantly on and system feels unresponsive during database operations."
echo ""
echo "8. 🔥 HIGH CPU USAGE (Process Analysis):"
echo "────────────────────────────────────────"
echo "System is running slow and CPU usage is constantly at 100%. Top shows high CPU usage but I can't identify which specific process or thread is causing the issue."
echo ""
echo "9. 📁 FILE SYSTEM CORRUPTION:"
echo "────────────────────────────"
echo "Getting 'Input/output error' when accessing certain files and directories. Some files appear corrupted and applications crash when trying to read specific data files."
echo ""
echo "10. 🔌 SERVICE STARTUP FAILURES:"
echo "───────────────────────────────"
echo "Critical services fail to start after system reboot. Systemctl shows services in failed state but error messages are unclear. System appears to boot normally otherwise."
echo ""
echo "🚀 Quick Start:"
echo "──────────────"
echo "1. Run: ./nanny-agent"
echo "2. Copy-paste any scenario above when prompted"
echo "3. Watch the AI diagnose the problem step by step"
echo ""
echo "🧪 Automated Testing:"
echo "────────────────────"
echo "Run integration tests: ./integration-tests.sh"
echo "This will test all scenarios automatically"
echo ""
echo "💡 Pro Tips:"
echo "───────────"
echo "- Each scenario is based on real-world Linux issues"
echo "- The AI will gather system info automatically"
echo "- Diagnostic commands are executed safely (read-only)"
echo "- You'll get a detailed resolution plan at the end"
echo "- Set NANNYAPI_ENDPOINT and NANNYAPI_MODEL before running"