# Cloud Jupyter Notebook Service

> Build a Kaggle/Colab-style notebook hosting service with rich output rendering, large dataset handling, and model training workflows

Build a production-ready cloud Jupyter notebook service that provides notebook execution, rich output rendering, and data science workflows. This cookbook demonstrates how to create a service similar to Kaggle or Google Colab using HopX.

## Overview

Cloud Jupyter notebook services allow data scientists to run notebooks in the cloud without local setup. The service executes notebook cells, captures rich outputs (plots, dataframes), handles large datasets, and supports model training workflows. HopX provides the secure execution environment needed for this use case.

## Prerequisites

* HopX API key ([Get one here](https://console.hopx.dev/api-keys))
* Python 3.8+ or Node.js 16+
* Understanding of Jupyter notebook format
* Basic knowledge of data science workflows
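
For reference, a notebook file (`.ipynb`) is a JSON document whose `cells` list holds code and markdown cells. The sketch below builds a minimal nbformat 4 notebook as a Python dict and extracts the code cells. The `code_cell_sources` helper is a hypothetical name used only for illustration.

```python
import json

# A minimal nbformat 4 notebook. "source" may be a single string or a list of lines.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "print(x + 1)"],
        },
    ],
}

def code_cell_sources(nb: dict) -> list:
    """Return the source text of every code cell, in document order."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Normalize the list-of-lines form to one string
        sources.append(source if isinstance(source, str) else "".join(source))
    return sources

# Round-trip through JSON, as if the notebook had been loaded from disk
loaded = json.loads(json.dumps(notebook))
print(code_cell_sources(loaded))
```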

## Architecture

```
┌──────────────┐
│   Notebook   │ Cell execution requests
│     UI       │
└──────┬───────┘
       │
       ▼
┌─────────────────┐
│  Notebook       │ Parse, execute, capture
│    Service      │
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  HopX Sandbox   │ Secure execution
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  Rich Outputs   │ Plots, DataFrames, HTML
└─────────────────┘
```
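
One way to read the diagram is as a request/response contract between the UI and the notebook service. The sketch below is illustrative only: `CellRequest`, `CellResponse`, `handle_request`, and the stub executor are hypothetical names, and the stub stands in for the sandbox call used in Step 1.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class CellRequest:
    """What the notebook UI sends for one cell execution."""
    notebook_id: str
    cell_id: str
    code: str

@dataclass
class CellResponse:
    """What the notebook service returns to the UI."""
    cell_id: str
    success: bool
    stdout: str = ""
    outputs: List[Dict[str, Any]] = field(default_factory=list)

def handle_request(request: CellRequest,
                   executor: Callable[[str], Dict[str, Any]]) -> CellResponse:
    """Service layer: pass the code to an executor and normalize the reply."""
    raw = executor(request.code)
    return CellResponse(
        cell_id=request.cell_id,
        success=bool(raw.get("success")),
        stdout=raw.get("stdout", ""),
        outputs=raw.get("outputs", []),
    )

def stub_executor(code: str) -> Dict[str, Any]:
    """Stand-in for the sandbox: reports the code length instead of running it."""
    return {"success": True, "stdout": f"ran {len(code)} chars", "outputs": []}

response = handle_request(CellRequest("nb-1", "cell_0", "print('hi')"), stub_executor)
print(response)
```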

## Implementation

### Step 1: Notebook Cell Execution

Execute individual notebook cells and capture outputs:

<CodeGroup>
  ```python Python theme={null}
  from hopx_ai import Sandbox
  import json
  import os
  from typing import Dict, List, Any

  class JupyterNotebookService:
      def __init__(self, api_key: str):
          self.api_key = api_key
          self.sandbox = None
          self.cell_state = {}  # Per-cell record of code and captured outputs, keyed by cell ID
      
      def initialize_notebook(self, notebook_id: str) -> Dict[str, Any]:
          """Initialize a new notebook session"""
          try:
              self.sandbox = Sandbox.create(
                  template="code-interpreter",
                  api_key=self.api_key,
                  timeout_seconds=3600  # 1 hour session
              )
              
              # Set up data science environment
              self.sandbox.env.set_all({
                  "JUPYTER_MODE": "true",
                  "PYTHONPATH": "/workspace"
              })
              
              return {
                  "success": True,
                  "notebook_id": notebook_id,
                  "sandbox_id": self.sandbox.sandbox_id
              }
          except Exception as e:
              return {
                  "success": False,
                  "error": str(e)
              }
      
      def execute_cell(self, cell_code: str, cell_id: str = None) -> Dict[str, Any]:
          """Execute a notebook cell"""
          try:
              # Use IPython execution for notebook-like behavior
              result = self.sandbox.run_ipython(cell_code)
              
              # Capture outputs
              outputs = []
              if result.rich_outputs:
                  for output in result.rich_outputs:
                      outputs.append({
                          "type": output.type,
                          "data": output.data
                      })
              
              # Store cell state
              if cell_id:
                  self.cell_state[cell_id] = {
                      "code": cell_code,
                      "outputs": outputs,
                      "stdout": result.stdout,
                      "stderr": result.stderr
                  }
              
              return {
                  "success": result.success,
                  "stdout": result.stdout,
                  "stderr": result.stderr,
                  "outputs": outputs,
                  "output_count": len(outputs),
                  "execution_time": result.execution_time
              }
          except Exception as e:
              return {
                  "success": False,
                  "error": str(e),
                  "stderr": str(e)
              }
      
      def execute_notebook(self, notebook_json: Dict) -> Dict[str, Any]:
          """Execute entire notebook"""
          cells = notebook_json.get("cells", [])
          results = []
          
          for i, cell in enumerate(cells):
              if cell.get("cell_type") != "code":
                  continue
              
              source = "".join(cell.get("source", []))
              cell_result = self.execute_cell(source, cell_id=f"cell_{i}")
              
              results.append({
                  "cell_index": i,
                  "result": cell_result
              })
          
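          return {
              # The notebook run succeeds only if every executed cell succeeded
              "success": all(r["result"]["success"] for r in results),
              "cells_executed": len(results),
              "results": results
          }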
      
      def cleanup(self):
          """Clean up notebook session"""
          if self.sandbox:
              self.sandbox.kill()
              self.sandbox = None

  # Usage
  service = JupyterNotebookService(api_key=os.getenv("HOPX_API_KEY"))
  service.initialize_notebook("my-notebook")

  # Execute a cell
  result = service.execute_cell("""
  import pandas as pd
  import numpy as np

  df = pd.DataFrame({
      'x': np.random.rand(10),
      'y': np.random.rand(10)
  })

  df  # Display dataframe
  """)

  print(f"Captured {result['output_count']} outputs")
  print(result)

  service.cleanup()
  ```

  ```javascript JavaScript theme={null}
  import { Sandbox } from '@hopx-ai/sdk';

  class JupyterNotebookService {
      constructor(apiKey) {
          this.apiKey = apiKey;
          this.sandbox = null;
          this.cellState = {};  // Per-cell record of code and captured outputs, keyed by cell ID
      }
      
      async initializeNotebook(notebookId) {
          try {
              this.sandbox = await Sandbox.create({
                  template: 'code-interpreter',
                  apiKey: this.apiKey,
                  timeoutSeconds: 3600  // 1 hour session
              });
              
              // Set up data science environment
              await this.sandbox.env.setAll({
                  JUPYTER_MODE: 'true',
                  PYTHONPATH: '/workspace'
              });
              
              return {
                  success: true,
                  notebookId,
                  sandboxId: this.sandbox.sandboxId
              };
          } catch (error) {
              return {
                  success: false,
                  error: error.message
              };
          }
      }
      
      async executeCell(cellCode, cellId = null) {
          try {
              // Use IPython execution for notebook-like behavior
              const result = await this.sandbox.runIpython(cellCode);
              
              // Capture outputs
              const outputs = [];
              if (result.richOutputs) {
                  for (const output of result.richOutputs) {
                      outputs.push({
                          type: output.type,
                          data: output.data
                      });
                  }
              }
              
              // Store cell state
              if (cellId) {
                  this.cellState[cellId] = {
                      code: cellCode,
                      outputs,
                      stdout: result.stdout,
                      stderr: result.stderr
                  };
              }
              
              return {
                  success: result.success,
                  stdout: result.stdout,
                  stderr: result.stderr,
                  outputs,
                  outputCount: outputs.length,
                  executionTime: result.execution_time
              };
          } catch (error) {
              return {
                  success: false,
                  error: error.message,
                  stderr: error.toString()
              };
          }
      }
      
      async executeNotebook(notebookJson) {
          const cells = notebookJson.cells || [];
          const results = [];
          
          for (let i = 0; i < cells.length; i++) {
              const cell = cells[i];
              if (cell.cell_type !== 'code') {
                  continue;
              }
              
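              // The notebook format allows "source" to be a string or a list of lines
              const source = Array.isArray(cell.source) ? cell.source.join('') : (cell.source || '');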
              const cellResult = await this.executeCell(source, `cell_${i}`);
              
              results.push({
                  cellIndex: i,
                  result: cellResult
              });
          }
          
          return {
              // The notebook run succeeds only if every executed cell succeeded
              success: results.every(r => r.result.success),
              cellsExecuted: results.length,
              results
          };
      }
      
      async cleanup() {
          if (this.sandbox) {
              await this.sandbox.kill();
              this.sandbox = null;
          }
      }
  }

  // Usage
  const service = new JupyterNotebookService(process.env.HOPX_API_KEY);
  await service.initializeNotebook('my-notebook');

  // Execute a cell
  const result = await service.executeCell(`
  import pandas as pd
  import numpy as np

  df = pd.DataFrame({
      'x': np.random.rand(10),
      'y': np.random.rand(10)
  })

  df  # Display dataframe
  `);

  console.log(`Captured ${result.outputCount} outputs`);
  console.log(result);

  await service.cleanup();
  ```
</CodeGroup>
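
Because each session holds a live sandbox, it helps to guarantee that `cleanup()` runs even when a cell raises. The following is a minimal sketch using `contextlib`. `notebook_session` is a hypothetical helper, and `FakeService` only stands in for `JupyterNotebookService` so the example runs without an API key.

```python
from contextlib import contextmanager

@contextmanager
def notebook_session(service, notebook_id: str):
    """Initialize a notebook session and always clean it up afterwards."""
    init = service.initialize_notebook(notebook_id)
    try:
        if not init.get("success"):
            raise RuntimeError(init.get("error", "failed to initialize notebook"))
        yield service
    finally:
        # Runs on normal exit, on errors inside the with-block, and on failed init
        service.cleanup()

class FakeService:
    """Stand-in exposing the same method names as JupyterNotebookService."""
    def __init__(self):
        self.cleaned_up = False

    def initialize_notebook(self, notebook_id):
        return {"success": True, "notebook_id": notebook_id}

    def execute_cell(self, code):
        return {"success": True, "stdout": "", "outputs": []}

    def cleanup(self):
        self.cleaned_up = True

fake = FakeService()
try:
    with notebook_session(fake, "my-notebook") as svc:
        svc.execute_cell("1 + 1")
        raise ValueError("simulated failure inside the session")
except ValueError:
    pass

print(fake.cleaned_up)
```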

### Step 2: Rich Output Rendering

Handle plots, dataframes, and other rich outputs:

<CodeGroup>
  ```python Python theme={null}
  class RichOutputRenderer:
      def __init__(self, sandbox: Sandbox):
          self.sandbox = sandbox
      
      def render_outputs(self, outputs: List[Dict]) -> List[Dict[str, Any]]:
          """Render rich outputs for display"""
          rendered = []
          
          for output in outputs:
              output_type = output.get("type", "")
              data = output.get("data", {})
              
              if "image/png" in output_type or "image/jpeg" in output_type:
                  # Image output
                  rendered.append({
                      "type": "image",
                      "format": "png" if "png" in output_type else "jpeg",
                      "data": data.get("image/png") or data.get("image/jpeg"),
                      "encoding": "base64"
                  })
              
              elif "text/html" in output_type:
                  # HTML output (DataFrames, etc.)
                  rendered.append({
                      "type": "html",
                      "data": data.get("text/html", ""),
                      "mime_type": "text/html"
                  })
              
              elif "application/json" in output_type:
                  # JSON output
                  rendered.append({
                      "type": "json",
                      "data": data.get("application/json", {}),
                      "mime_type": "application/json"
                  })
              
              else:
                  # Plain text
                  rendered.append({
                      "type": "text",
                      "data": str(data),
                      "mime_type": "text/plain"
                  })
          
          return rendered

  # Usage
  service = JupyterNotebookService(api_key=os.getenv("HOPX_API_KEY"))
  service.initialize_notebook("notebook-1")

  # Execute cell with plot
  result = service.execute_cell("""
  import matplotlib.pyplot as plt
  import numpy as np

  x = np.linspace(0, 10, 100)
  y = np.sin(x)
  plt.plot(x, y)
  plt.title('Sine Wave')
  plt.show()
  """)

  # Render outputs
  renderer = RichOutputRenderer(service.sandbox)
  rendered = renderer.render_outputs(result["outputs"])

  for output in rendered:
      print(f"Output type: {output['type']}")
      if output['type'] == 'image':
          print(f"  Image data length: {len(output['data'])} base64 characters")

  service.cleanup()
  ```

  ```javascript JavaScript theme={null}
  class RichOutputRenderer {
      constructor(sandbox) {
          this.sandbox = sandbox;
      }
      
      renderOutputs(outputs) {
          const rendered = [];
          
          for (const output of outputs) {
              const outputType = output.type || '';
              const data = output.data || {};
              
              if (outputType.includes('image/png') || outputType.includes('image/jpeg')) {
                  // Image output
                  rendered.push({
                      type: 'image',
                      format: outputType.includes('png') ? 'png' : 'jpeg',
                      data: data['image/png'] || data['image/jpeg'],
                      encoding: 'base64'
                  });
              } else if (outputType.includes('text/html')) {
                  // HTML output (DataFrames, etc.)
                  rendered.push({
                      type: 'html',
                      data: data['text/html'] || '',
                      mimeType: 'text/html'
                  });
              } else if (outputType.includes('application/json')) {
                  // JSON output
                  rendered.push({
                      type: 'json',
                      data: data['application/json'] || {},
                      mimeType: 'application/json'
                  });
              } else {
                  // Plain text
                  rendered.push({
                      type: 'text',
                      data: String(data),
                      mimeType: 'text/plain'
                  });
              }
          }
          
          return rendered;
      }
  }

  // Usage
  const service = new JupyterNotebookService(process.env.HOPX_API_KEY);
  await service.initializeNotebook('notebook-1');

  // Execute cell with plot
  const result = await service.executeCell(`
  import matplotlib.pyplot as plt
  import numpy as np

  x = np.linspace(0, 10, 100)
  y = np.sin(x)
  plt.plot(x, y)
  plt.title('Sine Wave')
  plt.show()
  `);

  // Render outputs
  const renderer = new RichOutputRenderer(service.sandbox);
  const rendered = renderer.renderOutputs(result.outputs);

  for (const output of rendered) {
      console.log(`Output type: ${output.type}`);
      if (output.type === 'image') {
          console.log(`  Image data length: ${output.data.length} base64 characters`);
      }
  }

  await service.cleanup();
  ```
</CodeGroup>
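
On the UI side, each rendered output eventually becomes markup. The sketch below turns dictionaries shaped like those returned by `render_outputs` into HTML fragments. `to_html_fragment` is a hypothetical helper, and it assumes image data is already base64-encoded, as the renderer above reports. HTML produced by user code is untrusted, so a real UI would typically sanitize it or isolate it in a sandboxed iframe.

```python
import html
import json

def to_html_fragment(output: dict) -> str:
    """Convert one rendered output dict into an HTML fragment."""
    kind = output.get("type")
    data = output.get("data")
    if kind == "image":
        fmt = output.get("format", "png")
        # Embed the base64 payload directly as a data URI
        return f'<img src="data:image/{fmt};base64,{data}" alt="cell output">'
    if kind == "html":
        # Passed through unchanged; sanitize or isolate before showing to users
        return f'<div class="output-html">{data}</div>'
    if kind == "json":
        return f"<pre>{html.escape(json.dumps(data, indent=2))}</pre>"
    # Plain text: escape so characters like < and & display literally
    return f"<pre>{html.escape(str(data))}</pre>"

rendered = [
    {"type": "image", "format": "png", "data": "iVBORw0KGgo=", "encoding": "base64"},
    {"type": "text", "data": "1 < 2", "mime_type": "text/plain"},
]
fragments = [to_html_fragment(o) for o in rendered]
print("\n".join(fragments))
```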

### Step 3: Large Dataset Handling

Handle large datasets efficiently:

<CodeGroup>
  ```python Python theme={null}
  class LargeDatasetHandler:
      def __init__(self, sandbox: Sandbox):
          self.sandbox = sandbox
      
      def upload_dataset(self, file_path: str, data: bytes) -> Dict[str, Any]:
          """Upload large dataset to sandbox"""
          try:
              # Write the raw bytes into the sandbox filesystem
              self.sandbox.files.write(f"/workspace/data/{file_path}", data)
              
              return {
                  "success": True,
                  "path": f"/workspace/data/{file_path}",
                  "size": len(data)
              }
          except Exception as e:
              return {
                  "success": False,
                  "error": str(e)
              }
      
      def process_large_dataset(self, dataset_path: str, chunk_size: int = 10000) -> Dict[str, Any]:
          """Process large dataset in chunks"""
          try:
              # Process in chunks to avoid memory issues
              code = f"""
  import pandas as pd
  import os

  # Read dataset in chunks
  chunk_size = {chunk_size}
  dataset_path = {dataset_path!r}

  chunks = []
  for chunk in pd.read_csv(dataset_path, chunksize=chunk_size):
      # Process chunk
      processed = chunk.describe()
      chunks.append(processed)

  # Combine results
  result = pd.concat(chunks)
  print(f"Processed {{len(chunks)}} chunks")
  print(result)
  """
              
              result = self.sandbox.run_code(code, timeout=300)  # 5 minute timeout
              
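              # Parse the chunk count from the "Processed N chunks" line printed by the script
              import re
              match = re.search(r"Processed (\d+) chunks", result.stdout or "")

              return {
                  "success": result.success,
                  "stdout": result.stdout,
                  "chunks_processed": int(match.group(1)) if match else 0
              }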
          except Exception as e:
              return {
                  "success": False,
                  "error": str(e)
              }

  # Usage
  sandbox = Sandbox.create(template="code-interpreter", api_key=os.getenv("HOPX_API_KEY"))
  handler = LargeDatasetHandler(sandbox)

  # Upload dataset
  with open("large_dataset.csv", "rb") as f:
      data = f.read()
      result = handler.upload_dataset("large_dataset.csv", data)
      print(result)

  # Process
  result = handler.process_large_dataset("/workspace/data/large_dataset.csv")
  print(result)

  sandbox.kill()
  ```

  ```javascript JavaScript theme={null}
  class LargeDatasetHandler {
      constructor(sandbox) {
          this.sandbox = sandbox;
      }
      
      async uploadDataset(filePath, data) {
          try {
              // Write the raw bytes into the sandbox filesystem
              await this.sandbox.files.write(`/workspace/data/${filePath}`, data);
              
              return {
                  success: true,
                  path: `/workspace/data/${filePath}`,
                  size: data.length
              };
          } catch (error) {
              return {
                  success: false,
                  error: error.message
              };
          }
      }
      
      async processLargeDataset(datasetPath, chunkSize = 10000) {
          try {
              // Process in chunks to avoid memory issues
              const code = `
  import pandas as pd
  import os

  # Read dataset in chunks
  chunk_size = ${chunkSize}
  dataset_path = ${JSON.stringify(datasetPath)}

  chunks = []
  for chunk in pd.read_csv(dataset_path, chunksize=chunk_size):
      # Process chunk
      processed = chunk.describe()
      chunks.append(processed)

  # Combine results
  result = pd.concat(chunks)
  print(f"Processed {len(chunks)} chunks")
  print(result)
  `;
              
              const result = await this.sandbox.runCode(code, { timeout: 300 });  // 5 minute timeout
              
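              // Parse the chunk count from the "Processed N chunks" line printed by the script
              const match = (result.stdout || '').match(/Processed (\d+) chunks/);

              return {
                  success: result.success,
                  stdout: result.stdout,
                  chunksProcessed: match ? Number(match[1]) : 0
              };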
          } catch (error) {
              return {
                  success: false,
                  error: error.message
              };
          }
      }
  }

  // Usage
  const sandbox = await Sandbox.create({
      template: 'code-interpreter',
      apiKey: process.env.HOPX_API_KEY
  });
  const handler = new LargeDatasetHandler(sandbox);

  // Upload dataset (ES module syntax, matching the top-level await used in this file)
  import fs from 'fs';
  const data = fs.readFileSync('large_dataset.csv');
  const uploadResult = await handler.uploadDataset('large_dataset.csv', data);
  console.log(uploadResult);

  // Process
  const processResult = await handler.processLargeDataset('/workspace/data/large_dataset.csv');
  console.log(processResult);

  await sandbox.kill();
  ```
</CodeGroup>
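
Concatenating per-chunk `describe()` tables yields one summary per chunk rather than statistics for the whole file. When whole-dataset numbers are needed, one option is to accumulate running totals across chunks instead. The following is a minimal standard-library sketch of that idea. The `iter_chunks` and `chunked_summary` names and the in-memory CSV are illustrative assumptions; with pandas, the same accumulation would run inside the `read_csv(..., chunksize=...)` loop.

```python
import csv
import io
from itertools import islice

def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from an iterator."""
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk

def chunked_summary(csv_file, column, chunk_size=10000):
    """Compute count, mean, min, and max of one numeric column, chunk by chunk."""
    reader = csv.DictReader(csv_file)
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    chunks = 0
    for chunk in iter_chunks(reader, chunk_size):
        values = [float(row[column]) for row in chunk]
        # Only running totals are kept, so memory use stays bounded by chunk_size
        count += len(values)
        total += sum(values)
        lo, hi = min(lo, min(values)), max(hi, max(values))
        chunks += 1
    mean = total / count if count else None
    return {"chunks": chunks, "count": count, "mean": mean, "min": lo, "max": hi}

sample = io.StringIO("x\n1\n2\n3\n4\n5\n")
summary = chunked_summary(sample, "x", chunk_size=2)
print(summary)
```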

### Step 4: Model Training Workflows

Support ML model training:

<CodeGroup>
  ```python Python theme={null}
  class ModelTrainingService:
      def __init__(self, sandbox: Sandbox):
          self.sandbox = sandbox
      
      def train_model(self, training_code: str, timeout: int = 1800) -> Dict[str, Any]:
          """Train ML model with progress monitoring"""
          try:
              # Use background execution for long training
              execution_id = self.sandbox.run_code_background(training_code)
              
              # Monitor progress
              import time
              max_wait = timeout
              waited = 0
              
              while waited < max_wait:
                  time.sleep(5)  # Check every 5 seconds
                  waited += 5
                  
                  # Check if process is still running
                  processes = self.sandbox.list_processes()
                  training_process = any(
                      'python' in p.get('name', '').lower() or 
                      'train' in p.get('name', '').lower()
                      for p in processes
                  )
                  
                  if not training_process:
                      # Training completed
                      break
              
              # Check for model file
              if self.sandbox.files.exists("/workspace/model.pkl"):
                  model_data = self.sandbox.files.read("/workspace/model.pkl")
                  return {
                      "success": True,
                      "model_saved": True,
                      "model_size": len(model_data),
                      "execution_id": execution_id
                  }
              else:
                  return {
                      "success": True,
                      "model_saved": False,
                      "execution_id": execution_id
                  }
                  
          except Exception as e:
              return {
                  "success": False,
                  "error": str(e)
              }

  # Usage
  sandbox = Sandbox.create(template="code-interpreter", api_key=os.getenv("HOPX_API_KEY"))
  trainer = ModelTrainingService(sandbox)

  training_code = """
  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier
  import pickle

  # Load data
  iris = load_iris()
  X, y = iris.data, iris.target

  # Train model
  model = RandomForestClassifier(n_estimators=100)
  model.fit(X, y)

  # Save model
  with open('/workspace/model.pkl', 'wb') as f:
      pickle.dump(model, f)

  print("Model trained and saved!")
  """

  result = trainer.train_model(training_code)
  print(result)

  sandbox.kill()
  ```

  ```javascript JavaScript theme={null}
  class ModelTrainingService {
      constructor(sandbox) {
          this.sandbox = sandbox;
      }
      
      async trainModel(trainingCode, timeout = 1800) {
          try {
              // Use background execution for long training
              const executionId = await this.sandbox.runCodeBackground(trainingCode);
              
              // Monitor progress
              const maxWait = timeout;
              let waited = 0;
              
              while (waited < maxWait) {
                  await new Promise(resolve => setTimeout(resolve, 5000));  // Wait 5 seconds
                  waited += 5;
                  
                  // Check if process is still running
                  const processes = await this.sandbox.listProcesses();
                  const trainingProcess = processes.some(p => {
                      const name = (p.name || '').toLowerCase();
                      return name.includes('python') || name.includes('train');
                  });
                  
                  if (!trainingProcess) {
                      // Training completed
                      break;
                  }
              }
              
              // Check for model file
              const modelExists = await this.sandbox.files.exists('/workspace/model.pkl');
              if (modelExists) {
                  const modelData = await this.sandbox.files.read('/workspace/model.pkl');
                  return {
                      success: true,
                      modelSaved: true,
                      modelSize: modelData.length,
                      executionId
                  };
              } else {
                  return {
                      success: true,
                      modelSaved: false,
                      executionId
                  };
              }
              
          } catch (error) {
              return {
                  success: false,
                  error: error.message
              };
          }
      }
  }

  // Usage
  const sandbox = await Sandbox.create({
      template: 'code-interpreter',
      apiKey: process.env.HOPX_API_KEY
  });
  const trainer = new ModelTrainingService(sandbox);

  const trainingCode = `
  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier
  import pickle

  # Load data
  iris = load_iris()
  X, y = iris.data, iris.target

  # Train model
  model = RandomForestClassifier(n_estimators=100)
  model.fit(X, y)

  # Save model
  with open('/workspace/model.pkl', 'wb') as f:
      pickle.dump(model, f)

  print("Model trained and saved!")
  `;

  const result = await trainer.trainModel(trainingCode);
  console.log(result);

  await sandbox.kill();
  ```
</CodeGroup>
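
Matching on process names can be brittle, since other Python processes in the sandbox may match as well. An alternative is to poll for a completion signal, such as the model file, against a monotonic deadline. The following is a minimal sketch. `wait_until` is a hypothetical helper and the predicate here is a stub, though it could wrap a check like the `files.exists` call in the example above.

```python
import time
from typing import Callable

def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 5.0) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))

# Stub predicate: reports completion on the third check
checks = {"n": 0}

def model_file_ready() -> bool:
    checks["n"] += 1
    return checks["n"] >= 3

finished = wait_until(model_file_ready, timeout=2.0, interval=0.01)
timed_out = not wait_until(lambda: False, timeout=0.05, interval=0.01)
print(finished, timed_out)
```

Returning a boolean lets the caller distinguish "training finished" from "gave up waiting", which the result dictionary can then report explicitly.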

## Best Practices

### Performance

<Tip>
  Use IPython execution mode for notebook cells to automatically capture rich outputs like DataFrames and plots.
</Tip>

1. **Cell State Management**: Maintain state between cells for interactive workflows
2. **Output Caching**: Cache rendered outputs to avoid re-rendering
3. **Chunked Processing**: Process large datasets in chunks
4. **Background Execution**: Use background execution for long-running training
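
Output caching, for instance, can key rendered results on a hash of the raw output so that identical outputs are rendered once. The following is a minimal sketch. `OutputCache` and the stand-in `render` callable are hypothetical names.

```python
import hashlib
import json
from collections import OrderedDict

class OutputCache:
    """Small LRU cache keyed by a hash of the raw output payload."""

    def __init__(self, max_entries: int = 256):
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(output: dict) -> str:
        # sort_keys makes the hash independent of dict insertion order
        payload = json.dumps(output, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_render(self, output: dict, render):
        key = self._key(output)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        rendered = render(output)
        self._entries[key] = rendered
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return rendered

cache = OutputCache(max_entries=2)
render = lambda o: {"type": "text", "data": str(o["data"]).upper()}
first = cache.get_or_render({"type": "text/plain", "data": "hello"}, render)
second = cache.get_or_render({"type": "text/plain", "data": "hello"}, render)
print(first, cache.hits, cache.misses)
```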

### Resource Management

1. **Session Timeouts**: Set appropriate session timeouts
2. **Memory Monitoring**: Monitor memory usage for large datasets
3. **Cleanup**: Clean up temporary files and models
4. **Resource Limits**: Set limits based on user tier
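
Per-tier limits can be kept in one table and looked up when a session is created. The tier names and numbers below are invented for illustration, and `TierLimits`, `limits_for`, and `check_upload` are hypothetical helpers. The session timeout would then be passed on, for example as the `timeout_seconds` argument used in Step 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierLimits:
    session_timeout_seconds: int
    max_upload_bytes: int
    max_training_seconds: int

# Illustrative values only; tune these to your own plans
TIER_LIMITS = {
    "free": TierLimits(1800, 100 * 1024**2, 600),
    "pro": TierLimits(3600, 2 * 1024**3, 1800),
}

def limits_for(tier: str) -> TierLimits:
    """Look up limits for a tier, falling back to the most restrictive one."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])

def check_upload(tier: str, size_bytes: int) -> bool:
    """Return True if an upload of size_bytes is allowed for this tier."""
    return size_bytes <= limits_for(tier).max_upload_bytes

print(limits_for("pro").session_timeout_seconds)
print(check_upload("free", 500 * 1024**2))
```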

### User Experience

1. **Progress Indicators**: Show execution progress for long operations
2. **Error Messages**: Provide clear, actionable error messages
3. **Output Formatting**: Format outputs for easy reading
4. **Auto-Save**: Auto-save notebook state
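
Auto-save can be as simple as writing the per-cell state to disk after each execution. The following minimal sketch writes atomically, so a crash mid-write does not corrupt the previous save. `save_state` and `load_state` are hypothetical helpers, the dictionary mirrors the `cell_state` structure from Step 1, and the outputs are assumed to be JSON-serializable.

```python
import json
import os
import tempfile

def save_state(path: str, cell_state: dict) -> None:
    """Write cell state to a temp file, then atomically move it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(cell_state, f)
        os.replace(tmp_path, path)  # atomic within the same filesystem
    except BaseException:
        # Remove the partial temp file; the previous save stays untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_state(path: str) -> dict:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

state = {"cell_0": {"code": "1 + 1", "outputs": [], "stdout": "", "stderr": ""}}
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "notebook_state.json")
    save_state(target, state)
    restored = load_state(target)

print(restored == state)
```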

## Real-World Examples

Well-known services built around this hosted-notebook pattern include:

* **Kaggle Notebooks**: Data science competition platform
* **Google Colab**: Free Jupyter notebook environment
* **Azure Notebooks**: Cloud-based Jupyter service
* **Binder**: Turn GitHub repos into interactive notebooks

## Related Cookbooks

* [Data Analysis Pipeline](/cookbooks/data-science/analysis-pipeline) - Analysis workflows
* [ML Model Training Service](/cookbooks/data-science/ml-training-service) - Machine learning workflows

## Next Steps

1. Implement notebook format parsing (Jupyter .ipynb format)
2. Render markdown cells alongside code cells
3. Create a web UI for notebook editing
4. Implement cell execution queue
5. Add collaboration features
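
A cell execution queue (item 4) can start as a single worker that runs submitted cells strictly in order, which matters because later cells depend on state from earlier ones. The following is a minimal sketch with the standard library. `CellQueue` is a hypothetical name, and the `execute` callable stands in for `execute_cell`.

```python
import queue
import threading

class CellQueue:
    """Run submitted cells one at a time, in submission order."""

    def __init__(self, execute):
        self._execute = execute
        self._queue = queue.Queue()
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, cell_id: str, code: str) -> None:
        self._queue.put((cell_id, code))

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            try:
                if item is None:  # sentinel: stop the worker
                    return
                cell_id, code = item
                try:
                    result = self._execute(code)
                except Exception as exc:
                    result = {"success": False, "error": str(exc)}
                self.results.append((cell_id, result))
            finally:
                self._queue.task_done()

    def close(self) -> None:
        """Finish all queued cells, then stop the worker."""
        self._queue.put(None)
        self._worker.join()

cells = CellQueue(execute=lambda code: {"success": True, "stdout": code.upper()})
cells.submit("cell_0", "a = 1")
cells.submit("cell_1", "b = a + 1")
cells.close()
print([cell_id for cell_id, _ in cells.results])
```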
