Build a production-ready cloud Jupyter notebook service that executes notebooks, renders rich outputs, and supports data science workflows. This cookbook demonstrates how to create a service similar to Kaggle or Google Colab using HopX.

Overview

Cloud Jupyter notebook services allow data scientists to run notebooks in the cloud without local setup. The service executes notebook cells, captures rich outputs (plots, DataFrames), handles large datasets, and supports model-training workflows. HopX provides the secure execution environment needed for this use case.

Prerequisites

  • HopX API key
  • Python 3.8+ or Node.js 16+
  • Understanding of Jupyter notebook format
  • Basic knowledge of data science workflows

Architecture

┌──────────────┐
│   Notebook   │ Cell execution requests
│     UI       │
└──────┬───────┘
       │
       ▼
┌─────────────────┐
│  Notebook       │ Parse, execute, capture
│    Service      │
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  HopX Sandbox   │ Secure execution
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  Rich Outputs   │ Plots, DataFrames, HTML
└─────────────────┘

Implementation

Step 1: Notebook Cell Execution

Execute individual notebook cells and capture outputs:

from hopx_ai import Sandbox
import json
import os
from typing import Any, Dict, List, Optional

class JupyterNotebookService:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.sandbox = None
        self.cell_state = {}  # Per-cell results; variables persist in the kernel itself
    
    def initialize_notebook(self, notebook_id: str) -> Dict[str, Any]:
        """Initialize a new notebook session"""
        try:
            self.sandbox = Sandbox.create(
                template="code-interpreter",
                api_key=self.api_key,
                timeout_seconds=3600  # 1 hour session
            )
            
            # Set up data science environment
            self.sandbox.env.set_all({
                "JUPYTER_MODE": "true",
                "PYTHONPATH": "/workspace"
            })
            
            return {
                "success": True,
                "notebook_id": notebook_id,
                "sandbox_id": self.sandbox.sandbox_id
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }
    
    def execute_cell(self, cell_code: str, cell_id: Optional[str] = None) -> Dict[str, Any]:
        """Execute a notebook cell"""
        try:
            # Use IPython execution for notebook-like behavior
            result = self.sandbox.run_ipython(cell_code)
            
            # Capture outputs
            outputs = []
            if result.rich_outputs:
                for output in result.rich_outputs:
                    outputs.append({
                        "type": output.type,
                        "data": output.data
                    })
            
            # Store cell state
            if cell_id:
                self.cell_state[cell_id] = {
                    "code": cell_code,
                    "outputs": outputs,
                    "stdout": result.stdout,
                    "stderr": result.stderr
                }
            
            return {
                "success": result.success,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "outputs": outputs,
                "output_count": len(outputs),
                "execution_time": result.execution_time
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "stderr": str(e)
            }
    
    def execute_notebook(self, notebook_json: Dict) -> Dict[str, Any]:
        """Execute entire notebook"""
        cells = notebook_json.get("cells", [])
        results = []
        
        for i, cell in enumerate(cells):
            if cell.get("cell_type") != "code":
                continue
            
            source = "".join(cell.get("source", []))
            cell_result = self.execute_cell(source, cell_id=f"cell_{i}")
            
            results.append({
                "cell_index": i,
                "result": cell_result
            })
        
        return {
            "success": all(r["result"].get("success", False) for r in results),
            "cells_executed": len(results),
            "results": results
        }
    
    def cleanup(self):
        """Clean up notebook session"""
        if self.sandbox:
            self.sandbox.kill()
            self.sandbox = None

# Usage
service = JupyterNotebookService(api_key=os.getenv("HOPX_API_KEY"))
service.initialize_notebook("my-notebook")

# Execute a cell
result = service.execute_cell("""
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'x': np.random.rand(10),
    'y': np.random.rand(10)
})

df  # Display dataframe
""")

print(f"Captured {result['output_count']} outputs")
print(result)

service.cleanup()
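
Because run_ipython executes cells in a shared kernel session, variables defined in one cell stay visible to later cells. A quick check of that behavior, as a minimal sketch reusing the service above:

service = JupyterNotebookService(api_key=os.getenv("HOPX_API_KEY"))
service.initialize_notebook("state-demo")

# Cell 1 defines a variable in the kernel
service.execute_cell("total = 40 + 2")

# Cell 2 reads it back; stdout should contain "42" if kernel state persists
result = service.execute_cell("print(total)")
print(result.get("stdout"))

service.cleanup()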

Step 2: Rich Output Rendering

Handle plots, DataFrames, and other rich outputs:

class RichOutputRenderer:
    def __init__(self, sandbox: Sandbox):
        self.sandbox = sandbox
    
    def render_outputs(self, outputs: List[Dict]) -> List[Dict[str, Any]]:
        """Render rich outputs for display"""
        rendered = []
        
        for output in outputs:
            output_type = output.get("type", "")
            data = output.get("data", {})
            
            if "image/png" in output_type or "image/jpeg" in output_type:
                # Image output
                rendered.append({
                    "type": "image",
                    "format": "png" if "png" in output_type else "jpeg",
                    "data": data.get("image/png") or data.get("image/jpeg"),
                    "encoding": "base64"
                })
            
            elif "text/html" in output_type:
                # HTML output (DataFrames, etc.)
                rendered.append({
                    "type": "html",
                    "data": data.get("text/html", ""),
                    "mime_type": "text/html"
                })
            
            elif "application/json" in output_type:
                # JSON output
                rendered.append({
                    "type": "json",
                    "data": data.get("application/json", {}),
                    "mime_type": "application/json"
                })
            
            else:
                # Plain text fallback
                text = data.get("text/plain", str(data)) if isinstance(data, dict) else str(data)
                rendered.append({
                    "type": "text",
                    "data": text,
                    "mime_type": "text/plain"
                })
        
        return rendered

# Usage
service = JupyterNotebookService(api_key=os.getenv("HOPX_API_KEY"))
service.initialize_notebook("notebook-1")

# Execute cell with plot
result = service.execute_cell("""
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.show()
""")

# Render outputs
renderer = RichOutputRenderer(service.sandbox)
rendered = renderer.render_outputs(result["outputs"])

for output in rendered:
    print(f"Output type: {output['type']}")
    if output['type'] == 'image':
        print(f"  Image data length: {len(output['data'])} bytes")

service.cleanup()
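
Image outputs arrive base64-encoded, so persisting one to disk is just a decode away. A minimal sketch, assuming the rendered dict shape produced by RichOutputRenderer above:

import base64

for output in rendered:
    if output["type"] == "image":
        # Decode the base64 payload back to raw image bytes
        with open(f"plot.{output['format']}", "wb") as f:
            f.write(base64.b64decode(output["data"]))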

Step 3: Large Dataset Handling

Handle large datasets efficiently:

import re

class LargeDatasetHandler:
    def __init__(self, sandbox: Sandbox):
        self.sandbox = sandbox
    
    def upload_dataset(self, file_path: str, data: bytes) -> Dict[str, Any]:
        """Upload large dataset to sandbox"""
        try:
            # Write the dataset into the sandbox filesystem
            self.sandbox.files.write(f"/workspace/data/{file_path}", data)
            
            return {
                "success": True,
                "path": f"/workspace/data/{file_path}",
                "size": len(data)
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }
    
    def process_large_dataset(self, dataset_path: str, chunk_size: int = 10000) -> Dict[str, Any]:
        """Process large dataset in chunks"""
        try:
            # Process in chunks to avoid memory issues
            code = f"""
import pandas as pd
import os

# Read dataset in chunks
chunk_size = {chunk_size}
dataset_path = '{dataset_path}'

chunks = []
for chunk in pd.read_csv(dataset_path, chunksize=chunk_size):
    # Process chunk
    processed = chunk.describe()
    chunks.append(processed)

# Combine results
result = pd.concat(chunks)
print(f"Processed {{len(chunks)}} chunks")
print(result)
"""
            
            result = self.sandbox.run_code(code, timeout=300)  # 5 minute timeout
            
            # Parse the chunk count from the script's "Processed N chunks" line
            match = re.search(r"Processed (\d+) chunks", result.stdout or "")
            
            return {
                "success": result.success,
                "stdout": result.stdout,
                "chunks_processed": int(match.group(1)) if match else 0
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

# Usage
sandbox = Sandbox.create(template="code-interpreter", api_key=os.getenv("HOPX_API_KEY"))
handler = LargeDatasetHandler(sandbox)

# Upload dataset
with open("large_dataset.csv", "rb") as f:
    data = f.read()
    result = handler.upload_dataset("large_dataset.csv", data)
    print(result)

# Process
result = handler.process_large_dataset("/workspace/data/large_dataset.csv")
print(result)

sandbox.kill()
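
File transfer also works in the other direction: code running in the sandbox can persist results to a file, and files.read pulls them back to the host before teardown. A minimal sketch using only the calls shown above (the path and contents are illustrative):

sandbox = Sandbox.create(template="code-interpreter", api_key=os.getenv("HOPX_API_KEY"))

# Produce a small result file inside the sandbox
sandbox.run_code("""
with open('/workspace/summary.txt', 'w') as f:
    f.write('rows=10000')
""")

# Read it back to the host
summary = sandbox.files.read("/workspace/summary.txt")
print(summary)

sandbox.kill()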

Step 4: Model Training Workflows

Support ML model training:

class ModelTrainingService:
    def __init__(self, sandbox: Sandbox):
        self.sandbox = sandbox
    
    def train_model(self, training_code: str, timeout: int = 1800) -> Dict[str, Any]:
        """Train an ML model in the background, polling until it completes or times out"""
        try:
            # Use background execution for long training
            execution_id = self.sandbox.run_code_background(training_code)
            
            # Monitor progress
            import time
            max_wait = timeout
            waited = 0
            
            while waited < max_wait:
                time.sleep(5)  # Check every 5 seconds
                waited += 5
                
                # Heuristic: assume training has finished once no python/train
                # process is still listed in the sandbox
                processes = self.sandbox.list_processes()
                training_running = any(
                    'python' in p.get('name', '').lower() or
                    'train' in p.get('name', '').lower()
                    for p in processes
                )
                
                if not training_running:
                    # Training completed
                    break
            
            # Check for model file
            if self.sandbox.files.exists("/workspace/model.pkl"):
                model_data = self.sandbox.files.read("/workspace/model.pkl")
                return {
                    "success": True,
                    "model_saved": True,
                    "model_size": len(model_data),
                    "execution_id": execution_id
                }
            else:
                return {
                    "success": True,
                    "model_saved": False,
                    "execution_id": execution_id
                }
                
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

# Usage
sandbox = Sandbox.create(template="code-interpreter", api_key=os.getenv("HOPX_API_KEY"))
trainer = ModelTrainingService(sandbox)

training_code = """
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pickle

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Save model
with open('/workspace/model.pkl', 'wb') as f:
    pickle.dump(model, f)

print("Model trained and saved!")
"""

result = trainer.train_model(training_code)
print(result)
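
# Optional: pull the pickled model back and load it on the host
# (a sketch; assumes files.read returns raw bytes and a compatible
# scikit-learn version is installed locally)
import pickle
model_bytes = sandbox.files.read("/workspace/model.pkl")
model = pickle.loads(model_bytes)
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))  # sanity check with one iris-style sample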

sandbox.kill()

Best Practices

Performance

Use IPython execution mode for notebook cells to automatically capture rich outputs like DataFrames and plots.

  1. Cell State Management: Maintain kernel state between cells for interactive workflows
  2. Output Caching: Cache rendered outputs to avoid re-rendering unchanged cells (see the sketch after this list)
  3. Chunked Processing: Process large datasets in chunks to bound memory use
  4. Background Execution: Use background execution for long-running training jobs
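
For output caching, one option is to key rendered outputs on a hash of the cell source. A minimal sketch (the class and its shape are illustrative, not part of the HopX SDK):

import hashlib

class OutputCache:
    def __init__(self):
        self._cache = {}

    def _key(self, cell_code: str) -> str:
        return hashlib.sha256(cell_code.encode()).hexdigest()

    def get(self, cell_code: str):
        # Note: in a stateful kernel, identical source can still produce
        # different output, so a real cache should also key on kernel state
        return self._cache.get(self._key(cell_code))

    def put(self, cell_code: str, rendered_outputs):
        self._cache[self._key(cell_code)] = rendered_outputs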

Resource Management

  1. Session Timeouts: Set session timeouts appropriate to the workload
  2. Memory Monitoring: Monitor memory usage when working with large datasets
  3. Cleanup: Always clean up sandboxes, temporary files, and models (see the sketch after this list)
  4. Resource Limits: Set CPU and memory limits based on user tier
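
Cleanup is easiest to guarantee with a context manager around the service from Step 1, so the sandbox is killed even when a cell raises. A sketch:

import os
from contextlib import contextmanager

@contextmanager
def notebook_session(api_key: str):
    service = JupyterNotebookService(api_key=api_key)
    service.initialize_notebook("session")
    try:
        yield service
    finally:
        # Always release the sandbox, even on errors
        service.cleanup()

# Usage
with notebook_session(os.getenv("HOPX_API_KEY")) as service:
    service.execute_cell("print('hello')")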

User Experience

  1. Progress Indicators: Show execution progress for long operations (see the sketch after this list)
  2. Error Messages: Provide clear, actionable error messages
  3. Output Formatting: Format outputs for easy reading
  4. Auto-Save: Auto-save notebook state so work survives disconnects
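
For progress indicators, a simple approach is to run code cells one at a time and invoke a caller-supplied callback after each. A sketch (on_progress is hypothetical, e.g. a websocket push in a real UI):

def execute_notebook_with_progress(service, notebook_json, on_progress):
    cells = [c for c in notebook_json.get("cells", [])
             if c.get("cell_type") == "code"]
    for i, cell in enumerate(cells):
        source = cell.get("source", [])
        if isinstance(source, list):
            source = "".join(source)
        result = service.execute_cell(source, cell_id=f"cell_{i}")
        # Report progress after each cell
        on_progress(i + 1, len(cells), result["success"])

# Usage
execute_notebook_with_progress(service, notebook_json,
    lambda done, total, ok: print(f"[{done}/{total}] {'ok' if ok else 'error'}"))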

Real-World Examples

This pattern is used by:
  • Kaggle Notebooks: Data science competition platform
  • Google Colab: Free Jupyter notebook environment
  • Azure Notebooks: Cloud-based Jupyter service
  • Binder: Turn GitHub repos into interactive notebooks

Next Steps

  1. Implement notebook format parsing (Jupyter .ipynb format; see the sketch below)
  2. Add support for markdown and code cells
  3. Create a web UI for notebook editing
  4. Implement cell execution queue
  5. Add collaboration features
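
As a starting point for the first item, the .ipynb format is plain JSON, so a saved notebook can be loaded with the standard library and fed straight to execute_notebook from Step 1 (the file name is illustrative):

import json
import os

with open("analysis.ipynb") as f:
    notebook_json = json.load(f)

service = JupyterNotebookService(api_key=os.getenv("HOPX_API_KEY"))
service.initialize_notebook("ipynb-demo")
summary = service.execute_notebook(notebook_json)
print(f"Executed {summary['cells_executed']} code cells")
service.cleanup()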