Building an AI Code Interpreter Agent

Build a production-ready AI code interpreter that safely executes AI-generated code in isolated sandboxes. This cookbook shows you how to create an agent that can execute code, capture rich outputs like plots and dataframes, handle errors gracefully, and maintain security.

Overview

An AI code interpreter agent allows AI models to generate and execute code safely. This pattern is used by platforms like OpenAI’s Code Interpreter and other AI agent systems. HopX provides the secure execution environment needed for this use case.

Prerequisites

HopX API key
Python 3.8+ or Node.js 16+
Basic understanding of async programming
Familiarity with AI/LLM integration patterns

Architecture

The AI code interpreter follows this architecture:

┌─────────────┐
│   AI Model  │ Generates code
└──────┬──────┘
       │
       ▼
┌─────────────────┐
│  Interpreter    │ Validates & executes
│     Agent       │
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  HopX Sandbox   │ Secure execution
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  Rich Outputs   │ Plots, DataFrames, etc.
└─────────────────┘
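The flow in the diagram can be expressed as a small loop. This is a minimal, self-contained sketch: `fake_model`, `always_valid`, and `fake_execute` are hypothetical stand-ins for your LLM client, a validator such as the one in Step 4, and the HopX sandbox call. They let the control flow run locally without an API key.

```python
from typing import Any, Callable, Dict

def run_pipeline(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], Dict[str, Any]],
    execute: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """AI model -> interpreter agent (validate, execute) -> outputs."""
    code = generate(prompt)            # 1. the model writes code
    check = validate(code)             # 2. the agent validates it
    if not check["valid"]:
        return {"success": False, "stage": "validation", "error": check["error"]}
    result = execute(code)             # 3. the sandbox runs it
    result["stage"] = "execution"
    return result                      # 4. outputs go back to the model or user

# Hypothetical stand-ins so the sketch runs without an LLM or a sandbox
def fake_model(prompt: str) -> str:
    return "print(2 + 2)"

def always_valid(code: str) -> Dict[str, Any]:
    return {"valid": True}

def fake_execute(code: str) -> Dict[str, Any]:
    return {"success": True, "stdout": "4\n", "stderr": ""}

print(run_pipeline("Add two and two", fake_model, always_valid, fake_execute))
```

Keeping the three stages as injected callables makes each one replaceable: swap `fake_execute` for a function that calls the interpreter built in the steps below.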

Implementation

Step 1: Basic Code Execution

Start with a simple code execution function that safely runs AI-generated code:

from hopx_ai import Sandbox
from hopx_ai.errors import HopxError, CodeExecutionError
import os

class AICodeInterpreter:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.sandbox = None  # created lazily on the first execution

    def execute_code(self, code: str, language: str = "python", timeout: int = 30):
        """Execute AI-generated code in a sandbox and return a structured result"""
        try:
            # Create the sandbox on first use, then reuse it
            if not self.sandbox:
                self.sandbox = Sandbox.create(
                    template="code-interpreter",
                    api_key=self.api_key,
                    timeout_seconds=600
                )

            # Execute code with a per-run timeout (seconds)
            result = self.sandbox.run_code(
                code,
                language=language,
                timeout=timeout
            )

            return {
                "success": result.success,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "exit_code": result.exit_code,
                "execution_time": result.execution_time
            }

        except CodeExecutionError as e:
            # Fall back to str(e) in case the exception has no .message attribute
            message = getattr(e, "message", None) or str(e)
            return {
                "success": False,
                "error": f"Execution failed: {message}",
                "stderr": str(e)
            }
        except HopxError as e:
            # Any other HopX SDK error
            return {
                "success": False,
                "error": f"Sandbox error: {e}"
            }
        except Exception as e:
            return {
                "success": False,
                "error": f"Unexpected error: {e}"
            }

    def cleanup(self):
        """Kill the sandbox and release its resources"""
        if self.sandbox:
            self.sandbox.kill()
            self.sandbox = None

# Usage
interpreter = AICodeInterpreter(api_key=os.getenv("HOPX_API_KEY"))
try:
    result = interpreter.execute_code("print('Hello from AI!')")
    print(result)
finally:
    interpreter.cleanup()  # always release the sandbox, even if execution raises
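To guarantee the sandbox is killed even when an exception escapes, without repeating cleanup logic at every call site, the interpreter can be wrapped in a context manager. This is a minimal sketch built on the standard library's `contextlib.contextmanager`; `managed` is a hypothetical helper name, and `FakeInterpreter` is a hypothetical stand-in with the same `execute_code`/`cleanup` shape as `AICodeInterpreter`, so the sketch runs without HopX.

```python
from contextlib import contextmanager
from typing import Any, Iterator

@contextmanager
def managed(interpreter: Any) -> Iterator[Any]:
    """Yield the interpreter and always call its cleanup() on exit."""
    try:
        yield interpreter
    finally:
        interpreter.cleanup()

class FakeInterpreter:
    """Hypothetical stand-in exposing execute_code() and cleanup()."""
    def __init__(self) -> None:
        self.cleaned_up = False

    def execute_code(self, code: str) -> dict:
        if "raise" in code:
            raise RuntimeError("simulated failure")
        return {"success": True, "stdout": "ok\n"}

    def cleanup(self) -> None:
        self.cleaned_up = True

fake = FakeInterpreter()
try:
    with managed(fake) as interp:
        interp.execute_code("raise")  # fails inside the with-block
except RuntimeError:
    pass
print(fake.cleaned_up)  # True: cleanup ran despite the exception
```

With the real class this becomes `with managed(AICodeInterpreter(api_key=...)) as interpreter: ...`.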

Step 2: Rich Output Capture

Extend the Step 1 interpreter to capture plots, dataframes, and other rich outputs produced by AI-generated code:

class RichOutputInterpreter(AICodeInterpreter):
    """AICodeInterpreter from Step 1, extended with rich output capture"""

    def execute_with_rich_outputs(self, code: str, language: str = "python"):
        """Execute code and capture rich outputs (plots, dataframes)"""
        try:
            if not self.sandbox:
                self.sandbox = Sandbox.create(
                    template="code-interpreter",
                    api_key=self.api_key
                )

            # Execute with rich output capture
            result = self.sandbox.run_code(code, language=language)

            # Convert SDK output objects into plain dicts
            rich_outputs = [
                {"type": output.type, "data": output.data}
                for output in (result.rich_outputs or [])
            ]

            return {
                "success": result.success,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "rich_outputs": rich_outputs,
                "rich_count": len(rich_outputs)
            }

        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

# Example: Generate plot
code = """
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.savefig('/tmp/plot.png')
plt.show()
"""

interpreter = RichOutputInterpreter(api_key=os.getenv("HOPX_API_KEY"))
try:
    result = interpreter.execute_with_rich_outputs(code)
    # Error results have no "rich_count" key, so default to 0
    print(f"Captured {result.get('rich_count', 0)} rich outputs")
finally:
    interpreter.cleanup()
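What each output's `data` field contains depends on its type. The sketch below persists image outputs to disk, assuming (this guide does not confirm it) that images arrive as base64-encoded strings with a MIME-style type such as `image/png`; check the HopX SDK reference for the actual format. `save_image_outputs` is a hypothetical helper that operates on the plain `{"type", "data"}` dicts returned by `execute_with_rich_outputs`, so it runs without a sandbox.

```python
import base64
import os
import tempfile
from typing import Dict, List

# Assumed mapping from MIME-style output types to file extensions
IMAGE_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}

def save_image_outputs(rich_outputs: List[Dict[str, str]], out_dir: str) -> List[str]:
    """Decode base64 image outputs, write them to out_dir, and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, output in enumerate(rich_outputs):
        ext = IMAGE_EXTENSIONS.get(output["type"])
        if ext is None:
            continue  # leave non-image outputs (e.g. HTML tables) to other handlers
        path = os.path.join(out_dir, f"output_{i}{ext}")
        with open(path, "wb") as f:
            f.write(base64.b64decode(output["data"]))
        paths.append(path)
    return paths

# Fake outputs shaped like the "rich_outputs" list from execute_with_rich_outputs
sample = [
    {"type": "image/png", "data": base64.b64encode(b"not a real PNG").decode()},
    {"type": "text/html", "data": "<table></table>"},
]
with tempfile.TemporaryDirectory() as out_dir:
    print(save_image_outputs(sample, out_dir))  # one path ending in output_0.png
```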

Step 3: Multi-Turn Conversation

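Handle multi-turn conversations in which the AI builds on earlier execution results. The same sandbox is reused for every turn, so files written in one turn are still on disk in the next; the context dictionary additionally carries stdout, file contents, and environment variables forward: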

class MultiTurnInterpreter:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.sandbox = None
        self.conversation_history = []
    
    def execute_turn(self, code: str, context: dict = None):
        """Execute code in conversation context"""
        try:
            if not self.sandbox:
                self.sandbox = Sandbox.create(
                    template="code-interpreter",
                    api_key=self.api_key
                )
            
            # Add context from previous turns
            if context:
                # Set environment variables from context
                env_vars = context.get('env_vars', {})
                if env_vars:
                    self.sandbox.env.set_all(env_vars)
                
                # Upload files from context
                files = context.get('files', {})
                for path, content in files.items():
                    self.sandbox.files.write(path, content)
            
            # Execute code
            result = self.sandbox.run_code(code)
            
            # Capture outputs for next turn
            outputs = {
                "stdout": result.stdout,
                "stderr": result.stderr,
                "files": {}
            }
            
            # Snapshot every file in /workspace (not only newly created ones) for the next turn's context
            if result.success:
                files_list = self.sandbox.files.list("/workspace")
                for file in files_list:
                    if not file.is_dir:
                        outputs["files"][file.path] = self.sandbox.files.read(file.path)
            
            # Store in history
            self.conversation_history.append({
                "code": code,
                "result": outputs,
                "success": result.success
            })
            
            return {
                "success": result.success,
                "outputs": outputs,
                "history_length": len(self.conversation_history)
            }
            
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }
    
    def get_context_for_next_turn(self):
        """Get context to pass to next turn"""
        if not self.conversation_history:
            return {}
        
        last_result = self.conversation_history[-1]["result"]
        return {
            "stdout": last_result["stdout"],
            "files": last_result["files"],
            "env_vars": self.sandbox.env.get_all() if self.sandbox else {}
        }

    def cleanup(self):
        """Kill the sandbox when the conversation ends"""
        if self.sandbox:
            self.sandbox.kill()
            self.sandbox = None

# Usage: Multi-turn conversation
interpreter = MultiTurnInterpreter(api_key=os.getenv("HOPX_API_KEY"))

try:
    # Turn 1: Create data
    result1 = interpreter.execute_turn("""
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
df.to_csv('/workspace/data.csv', index=False)
print("Data created")
""")

    # Turn 2: Use the file written in turn 1 (same sandbox, so it is still there)
    context = interpreter.get_context_for_next_turn()
    result2 = interpreter.execute_turn("""
import pandas as pd
df = pd.read_csv('/workspace/data.csv')
print(f"Data shape: {df.shape}")
print(df.describe())
""", context)
    print(result2.get("outputs", {}).get("stdout"))
finally:
    interpreter.cleanup()
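For the model to actually build on earlier turns, the stored `conversation_history` usually has to be rendered back into text for the next prompt. A minimal sketch, assuming history entries shaped like the ones `execute_turn` appends (`code`, `result` with `stdout`/`stderr`, `success`); `format_history`, `truncate`, and the 2,000-character limit are hypothetical choices, not part of HopX.

```python
from typing import Any, Dict, List

def truncate(text: str, limit: int) -> str:
    """Keep the first `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n... [{len(text) - limit} characters truncated]"

def format_history(history: List[Dict[str, Any]], max_chars: int = 2000) -> str:
    """Render past turns as plain text the model can read in its next prompt."""
    parts = []
    for n, turn in enumerate(history, start=1):
        status = "succeeded" if turn["success"] else "failed"
        parts.append(f"Turn {n} ({status})")
        parts.append("Code:\n" + turn["code"].strip())
        stdout = turn["result"].get("stdout") or ""
        stderr = turn["result"].get("stderr") or ""
        if stdout:
            parts.append("Output:\n" + truncate(stdout, max_chars))
        if stderr:
            parts.append("Errors:\n" + truncate(stderr, max_chars))
    return "\n\n".join(parts)

history = [
    {"code": "print('Data created')", "result": {"stdout": "Data created\n", "stderr": ""}, "success": True},
]
print(format_history(history))
```

Truncating long stdout keeps large dataframe dumps from consuming the model's context window.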

Step 4: Error Handling and Validation

Implement robust error handling and code validation:

import os
import re
from typing import Dict, Any

from hopx_ai import Sandbox

class SecureAICodeInterpreter:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.sandbox = None
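        self.max_code_length = 100_000  # characters (about 100 KB of ASCII source)
        # Regex blocklist: a coarse first filter that is easy to evade,
        # so the sandbox remains the real isolation boundary
        self.blocked_patterns = [
            r'\bos\.system\b',       # os.system is a function, so "import os.system" never occurs
            r'\bsubprocess\b',       # also catches "from subprocess import ..."
            r'\beval\s*\(',
            r'\bexec\s*\(',
            r'__import__',
            r'open\s*\([^)]*[\'"]/etc',
        ]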
    
    def validate_code(self, code: str) -> Dict[str, Any]:
        """Validate code before execution"""
        # Check length
        if len(code) > self.max_code_length:
            return {
                "valid": False,
                "error": f"Code exceeds maximum length of {self.max_code_length} characters"
            }
        
        # Check for blocked patterns
        for pattern in self.blocked_patterns:
            if re.search(pattern, code, re.IGNORECASE):
                return {
                    "valid": False,
                    "error": f"Code contains blocked pattern: {pattern}"
                }
        
        return {"valid": True}
    
    def execute_safely(self, code: str, timeout: int = 30):
        """Execute code with validation and error handling"""
        # Validate code
        validation = self.validate_code(code)
        if not validation["valid"]:
            return {
                "success": False,
                "error": validation["error"],
                "validation_failed": True
            }
        
        try:
            if not self.sandbox:
                self.sandbox = Sandbox.create(
                    template="code-interpreter",
                    api_key=self.api_key,
                    timeout_seconds=600
                )
            
            # Execute with timeout
            result = self.sandbox.run_code(code, timeout=timeout)
            
            # Heuristic: treat permission errors in stderr as a possible attempt to escape the sandbox
            if result.stderr and any(keyword in result.stderr.lower() 
                                    for keyword in ['permission denied', 'access denied', 'unauthorized']):
                return {
                    "success": False,
                    "error": "Security violation detected in execution",
                    "stderr": result.stderr
                }
            
            return {
                "success": result.success,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "exit_code": result.exit_code
            }
            
        except Exception as e:
            return {
                "success": False,
                "error": f"Execution error: {str(e)}"
            }
    
    def cleanup(self):
        if self.sandbox:
            self.sandbox.kill()
            self.sandbox = None

# Usage
interpreter = SecureAICodeInterpreter(api_key=os.getenv("HOPX_API_KEY"))

try:
    # Blocked by validation before any sandbox is created
    result = interpreter.execute_safely("import subprocess; subprocess.call(['rm', '-rf', '/'])")
    print(result)  # {'success': False, 'error': 'Code contains blocked pattern: ...', 'validation_failed': True}

    # Passes validation and runs in the sandbox
    result = interpreter.execute_safely("print('Hello, safe code!')")
    print(result)
finally:
    interpreter.cleanup()
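Regex patterns are easy to sidestep: a rule such as `import\s+subprocess` does not match `from subprocess import call`, and a rule on `os.system` does not match `import os as o; o.system("ls")`. The sketch below uses Python's standard `ast` module to inspect imports and calls structurally instead. The blocked sets and the `validate_ast` name are illustrative assumptions, and static checks like this still cannot stop every evasion (for example `getattr` with computed strings), so treat it as a complement to sandbox isolation rather than a replacement.

```python
import ast
from typing import Any, Dict

# Illustrative policy; adjust to your own security requirements
BLOCKED_MODULES = {"subprocess", "ctypes", "socket"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}
BLOCKED_ATTRIBUTES = {"system", "popen"}

def validate_ast(code: str) -> Dict[str, Any]:
    """Parse code and reject blocked imports, builtin calls, and attribute calls."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return {"valid": False, "error": f"Syntax error on line {e.lineno}: {e.msg}"}

    for node in ast.walk(tree):
        # Collect module names from "import x" and "from x import y"
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in BLOCKED_MODULES:
                return {"valid": False, "error": f"Import of '{name}' is not allowed"}

        if isinstance(node, ast.Call):
            func = node.func
            # Bare calls like eval(...) and attribute calls like o.system(...)
            if isinstance(func, ast.Name) and func.id in BLOCKED_CALLS:
                return {"valid": False, "error": f"Call to '{func.id}' is not allowed"}
            if isinstance(func, ast.Attribute) and func.attr in BLOCKED_ATTRIBUTES:
                return {"valid": False, "error": f"Call to '.{func.attr}()' is not allowed"}
    return {"valid": True}

print(validate_ast("from subprocess import call"))
# {'valid': False, 'error': "Import of 'subprocess' is not allowed"}
print(validate_ast("print('hi')"))  # {'valid': True}
```

Because it returns the same `{"valid": ..., "error": ...}` shape, `validate_ast` could be called from `validate_code` alongside or instead of the regex loop.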

Best Practices

Security

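Always validate AI-generated code before execution, and never trust user input or AI output without validation. Treat pattern-based checks as a first filter only: blocklists are easy to evade (for example with getattr and string concatenation), so the sandbox itself must remain the security boundary.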

Code Validation: Check for dangerous patterns before execution
Resource Limits: Set appropriate timeouts and memory limits
Sandbox Isolation: Each execution should be in a fresh or properly isolated sandbox
Output Sanitization: Validate outputs before returning to users
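Output sanitization can be as simple as redacting values that must never reach the user (or be fed back to the model) and capping the size of what is returned. A minimal sketch; the `sanitize_output` name, the secret list, and the 10,000-character cap are illustrative assumptions.

```python
import os
from typing import Iterable, Optional

def sanitize_output(text: str, secrets: Iterable[Optional[str]], max_chars: int = 10_000) -> str:
    """Redact known secret values, then cap the output length."""
    for secret in secrets:
        if secret:  # skip unset values such as a missing environment variable
            text = text.replace(secret, "[REDACTED]")
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[output truncated]"
    return text

raw = "token=sk-test-123\nresult: 42"
print(sanitize_output(raw, secrets=["sk-test-123", os.getenv("HOPX_API_KEY")]))
```

Redacting before truncating matters: cutting first could leave a partial secret at the boundary that no longer matches the full value.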

Performance

Reuse sandboxes for multi-turn conversations to maintain state, but create fresh sandboxes for unrelated executions to ensure isolation.

Sandbox Reuse: Reuse sandboxes within a conversation session
Timeout Management: Set appropriate timeouts based on expected execution time
Parallel Execution: Use background execution for long-running tasks
Caching: Cache environment variables and common setup code
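Reusing sandboxes within a session while keeping sessions isolated from each other usually comes down to keying one sandbox per conversation. A minimal sketch: `SandboxPool` is a hypothetical helper, and `factory` is any zero-argument callable that creates a sandbox (for example a lambda around `Sandbox.create(...)`). It assumes each sandbox exposes `kill()`, as the HopX sandbox in the steps above does; `FakeSandbox` is a stand-in so the sketch runs without HopX.

```python
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")

class SandboxPool(Generic[S]):
    """One sandbox per conversation: reused within a session, never shared across sessions."""

    def __init__(self, factory: Callable[[], S]) -> None:
        self._factory = factory
        self._sandboxes: Dict[str, S] = {}

    def get(self, session_id: str) -> S:
        # Create lazily on the first request for this session, then reuse
        if session_id not in self._sandboxes:
            self._sandboxes[session_id] = self._factory()
        return self._sandboxes[session_id]

    def release(self, session_id: str) -> None:
        # Kill and forget the sandbox when the conversation ends
        sandbox = self._sandboxes.pop(session_id, None)
        if sandbox is not None:
            sandbox.kill()

    def release_all(self) -> None:
        for session_id in list(self._sandboxes):
            self.release(session_id)

class FakeSandbox:
    """Hypothetical stand-in exposing kill()."""
    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        self.killed = True

pool = SandboxPool(FakeSandbox)
a1 = pool.get("alice")
a2 = pool.get("alice")
b = pool.get("bob")
print(a1 is a2, a1 is b)  # True False
pool.release_all()
print(a1.killed, b.killed)  # True True
```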

Error Handling

Graceful Degradation: Always return structured error responses
Error Logging: Log errors with context for debugging
User-Friendly Messages: Transform technical errors into user-friendly messages
Retry Logic: Implement retry for transient failures
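Retry logic should wrap only failures that are plausibly transient, such as network errors while creating a sandbox, not errors raised by the generated code itself. A minimal sketch with exponential backoff; the `retry` name is hypothetical, and treating `ConnectionError` and `TimeoutError` as the transient set is an assumption you would adapt, for example to the HopX SDK's own error classes.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying transient errors with delays of base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")

calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network issue")
    return "sandbox ready"

print(retry(flaky, base_delay=0.01))  # sandbox ready
print(calls["n"])  # 3
```

Non-transient exceptions such as a `ValueError` propagate immediately, so bugs in generated code are reported rather than retried. With HopX this could look like `retry(lambda: Sandbox.create(template="code-interpreter", api_key=api_key))`.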

Real-World Examples

This pattern is used by:

OpenAI Code Interpreter: Executes Python code in isolated environments
AI Agent Platforms: Various platforms that execute code generated by AI models
LangChain Code Execution: Agent frameworks that execute code

Related Cookbooks

Multi-Agent Execution - Multi-agent workflows
Rich Output Capture - Handling plots and dataframes

Next Steps

Implement code validation based on your security requirements
Add rich output handling for your specific use case
Integrate with your AI model API
Set up monitoring and logging
Test with various code scenarios
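When integrating with a model API, the reply usually contains prose with the code inside Markdown fences, so the agent has to pull the code out before validating it. A minimal sketch that assumes the model was prompted to answer with fenced code blocks; `extract_code` is a hypothetical helper, and falling back to the whole reply when no fence is found is a design choice you may want to change.

```python
import re
from typing import Tuple

FENCE = "`" * 3  # built programmatically so this snippet can itself sit inside a fenced block

# Captures (language tag, body) for every fenced block, tagged or untagged
CODE_BLOCK = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code(reply: str, languages: Tuple[str, ...] = ("python", "py", "")) -> str:
    """Join the reply's Python (or untagged) code blocks; fall back to the raw reply."""
    blocks = [
        body.strip()
        for lang, body in CODE_BLOCK.findall(reply)
        if lang.lower() in languages
    ]
    if blocks:
        return "\n\n".join(blocks)
    return reply.strip()

reply = "Here is the code:\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nDone."
print(extract_code(reply))  # print('hi')
```

Matching every fence (not only Python ones) and filtering afterwards keeps the regex from mistaking the closing fence of a non-Python block for the opening of a new one.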
