Advanced X11 desktop automation features including OCR, element finding, and advanced interactions.

Prerequisites

Before you begin, make sure you have:
  • VNC server running - A VNC server must be started (see VNC Server)
  • Desktop template - A sandbox with desktop support enabled
  • Active sandbox - A running sandbox with desktop capabilities
  • Basic desktop automation - Familiarity with basic desktop operations is helpful

Overview

X11 advanced features enable:
  • OCR (Optical Character Recognition) text extraction
  • Finding UI elements by text
  • Waiting for elements to appear
  • Advanced drag and drop
  • Window capture
  • Hotkey execution
Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.

OCR (Optical Character Recognition)

Extract text from screen regions using OCR:
from hopx_ai import Sandbox

sandbox = Sandbox.create(template="desktop")

# Extract text from a region (x, y, width, height)
text = sandbox.desktop.ocr(100, 100, 400, 200)
print(f"Extracted text: {text}")

# OCR with custom language
text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
print(f"Text: {text}")
Expected Output:
Extracted text: Hello World
Text: Hello World

Finding Elements

Find UI elements by text:
# Find element by text
element = sandbox.desktop.find_element("Submit")

if element:
    print(f"Found at: ({element['x']}, {element['y']})")
    print(f"Size: {element['width']}x{element['height']}")
    
    # Click the element
    sandbox.desktop.click(element['x'], element['y'])
else:
    print("Element not found")
Expected Output:
Found at: (150, 200)
Size: 100x30

Waiting for Elements

Wait for an element to appear:
# Wait for element to appear (default timeout: 30 seconds; extended to 60 here)
element = sandbox.desktop.wait_for("Loading complete", timeout=60)

print(f"Element found at: ({element['x']}, {element['y']})")

# Click when found
sandbox.desktop.click(element['x'], element['y'])
Expected Output:
Element found at: (200, 300)
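Under the hood, this kind of wait is typically a poll-until-found loop. Here is a minimal, SDK-independent sketch of the pattern; `find` is a stand-in for a lookup such as `sandbox.desktop.find_element`, not part of the SDK itself:

```python
import time

def wait_for_element(find, text, timeout=30.0, interval=0.5):
    """Poll find(text) until it returns an element dict or the timeout expires.

    find should return an element dict, or None when the element
    is not yet on screen.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        element = find(text)
        if element is not None:
            return element
        time.sleep(interval)
    raise TimeoutError(f"Element {text!r} not found within {timeout}s")

# Stub finder that "finds" the element on the third poll
calls = {"n": 0}
def stub_find(text):
    calls["n"] += 1
    return {"x": 200, "y": 300} if calls["n"] >= 3 else None

print(wait_for_element(stub_find, "Loading complete", timeout=5, interval=0.01))
# {'x': 200, 'y': 300}
```

Raising on timeout (rather than returning None) keeps failures loud, which is usually what you want in an automation script.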

Getting Element Bounds

Get the bounding box of an element:
# Get element bounds
bounds = sandbox.desktop.get_bounds("OK Button")

print(f"Button at: {bounds['x']}, {bounds['y']}")
print(f"Size: {bounds['width']}x{bounds['height']}")

# Click center of button
center_x = bounds['x'] + bounds['width'] // 2
center_y = bounds['y'] + bounds['height'] // 2
sandbox.desktop.click(center_x, center_y)
Expected Output:
Button at: 300, 400
Size: 80x25
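Computing the click point from a bounds dict comes up often enough to be worth a helper. This utility is a convenience of this guide, not an SDK method:

```python
def center_of(bounds):
    """Return the integer center point of a bounds dict
    with 'x', 'y', 'width', and 'height' keys."""
    return (bounds["x"] + bounds["width"] // 2,
            bounds["y"] + bounds["height"] // 2)

print(center_of({"x": 300, "y": 400, "width": 80, "height": 25}))
# (340, 412)
```

You can then write `sandbox.desktop.click(*center_of(bounds))` instead of repeating the arithmetic at every call site.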

Advanced Drag and Drop

Perform drag-and-drop operations:
# Drag from (100, 200) and drop at (500, 300)
sandbox.desktop.drag_drop(100, 200, 500, 300)

# Drag file to folder
sandbox.desktop.drag_drop(50, 50, 400, 300)
Expected Output:
(Drag and drop operation completed)
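Some applications ignore an instantaneous drop and only register a drag when they see intermediate mouse movement. If you hit that, one approach is to interpolate waypoints between the start and end points and issue a move for each; the helper below is a sketch of that idea, not an SDK call:

```python
def drag_path(x1, y1, x2, y2, steps=10):
    """Linearly interpolate steps + 1 waypoints from (x1, y1) to (x2, y2).

    Feed each point to a mouse-move call to simulate a gradual drag.
    """
    return [
        (round(x1 + (x2 - x1) * i / steps), round(y1 + (y2 - y1) * i / steps))
        for i in range(steps + 1)
    ]

print(drag_path(100, 200, 500, 300, steps=4))
# [(100, 200), (200, 225), (300, 250), (400, 275), (500, 300)]
```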

Window Capture

Capture a specific window:
# Capture active window
img_bytes = sandbox.desktop.capture_window()

# Capture specific window
windows = sandbox.desktop.get_windows()
if windows:
    window_id = windows[0].id
    img_bytes = sandbox.desktop.capture_window(window_id)

# Save to file
with open('window.png', 'wb') as f:
    f.write(img_bytes)
Expected Output:
(Window captured and saved to window.png)

Hotkeys

Execute hotkey combinations:
# Copy: Ctrl+C
sandbox.desktop.hotkey(['ctrl'], 'c')

# Paste: Ctrl+V
sandbox.desktop.hotkey(['ctrl'], 'v')

# Switch window: Alt+Tab
sandbox.desktop.hotkey(['alt'], 'tab')

# Multi-modifier combination: Ctrl+Shift+P
sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
Expected Output:
(Hotkey combinations executed)
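The `(modifiers, key)` call shape maps directly onto the familiar `"ctrl+shift+p"` string notation. If you prefer to keep hotkeys as strings in configuration, a small parser bridges the two; this helper is an illustration, not part of the SDK:

```python
def parse_hotkey(combo):
    """Split a 'mod+mod+key' string into (modifiers, key),
    matching the hotkey(modifiers, key) call shape."""
    parts = [p.strip().lower() for p in combo.split("+")]
    if not parts or not parts[-1]:
        raise ValueError(f"Invalid hotkey: {combo!r}")
    return parts[:-1], parts[-1]

print(parse_hotkey("Ctrl+Shift+P"))
# (['ctrl', 'shift'], 'p')
```

With it, `sandbox.desktop.hotkey(*parse_hotkey("ctrl+c"))` is equivalent to the explicit form above.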

Complete Example

Complete workflow using advanced features:
from hopx_ai import Sandbox

sandbox = Sandbox.create(template="desktop")

try:
    # Wait for application to load
    element = sandbox.desktop.wait_for("Application Ready", timeout=30)
    print("Application loaded")
    
    # Find and click button
    button = sandbox.desktop.find_element("Start")
    if button:
        sandbox.desktop.click(button['x'], button['y'])
    
    # Wait for dialog
    dialog = sandbox.desktop.wait_for("Confirm", timeout=10)
    
    # Extract text from dialog using OCR
    dialog_text = sandbox.desktop.ocr(
        dialog['x'], dialog['y'],
        dialog['width'], dialog['height']
    )
    print(f"Dialog text: {dialog_text}")
    
    # Find OK button
    ok_bounds = sandbox.desktop.get_bounds("OK")
    center_x = ok_bounds['x'] + ok_bounds['width'] // 2
    center_y = ok_bounds['y'] + ok_bounds['height'] // 2
    sandbox.desktop.click(center_x, center_y)
    
finally:
    sandbox.kill()
Expected Output:
Application loaded
Dialog text: Please confirm this action

OCR Languages

Supported OCR languages:
  • eng: English (default)
  • spa: Spanish
  • fra: French
  • deu: German
  • And more (check Tesseract language support)
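Passing an uninstalled language code typically fails only when the OCR call runs. If you want to fail fast, you can validate against the set of language packs you know your image ships with; the set below mirrors the list above and is an assumption to adjust for your own template:

```python
# Language packs assumed installed in the sandbox image; extend as needed.
SUPPORTED_OCR_LANGUAGES = {"eng", "spa", "fra", "deu"}

def check_ocr_language(language):
    """Raise early with a clear error instead of a mid-run OCR failure."""
    if language not in SUPPORTED_OCR_LANGUAGES:
        raise ValueError(
            f"Unsupported OCR language {language!r}; "
            f"expected one of {sorted(SUPPORTED_OCR_LANGUAGES)}"
        )
    return language

print(check_ocr_language("spa"))
# spa
```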

Next Steps