Advanced X11 desktop automation features including OCR, element finding, and advanced interactions.

Prerequisites

Before you begin, make sure you have:
  • VNC server running - A VNC server must be started (see VNC Server)
  • Desktop-enabled sandbox - A running sandbox created from a template with desktop support
  • Basic desktop automation - Familiarity with basic desktop operations is helpful

Overview

X11 advanced features enable:
  • OCR (Optical Character Recognition) text extraction
  • Finding UI elements by text
  • Waiting for elements to appear
  • Advanced drag and drop
  • Window capture
  • Hotkey execution

Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.

OCR (Optical Character Recognition)

Extract text from screen regions using OCR:
from hopx_ai import Sandbox

sandbox = Sandbox.create(template="desktop")

# Extract text from a region (x, y, width, height)
text = sandbox.desktop.ocr(100, 100, 400, 200)
print(f"Extracted text: {text}")

# OCR with an explicit language (see OCR Languages below)
text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
print(f"Text: {text}")
Expected Output:
Extracted text: Hello World
Text: Hello World

Finding Elements

Find UI elements by text:
# Find element by text
element = sandbox.desktop.find_element("Submit")

if element:
    print(f"Found at: ({element['x']}, {element['y']})")
    print(f"Size: {element['width']}x{element['height']}")
    
    # Click the element
    sandbox.desktop.click(element['x'], element['y'])
else:
    print("Element not found")
Expected Output:
Found at: (150, 200)
Size: 100x30

Waiting for Elements

Wait for an element to appear:
# Wait for an element to appear (timeout in seconds; default is 30)
element = sandbox.desktop.wait_for("Loading complete", timeout=60)

print(f"Element found at: ({element['x']}, {element['y']})")

# Click when found
sandbox.desktop.click(element['x'], element['y'])
Expected Output:
Element found at: (200, 300)
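
Waiting for an element amounts to polling until a match appears or a timeout elapses. The generic polling loop below sketches that pattern in plain Python; the `wait_until` helper and its parameters are illustrative, not part of the SDK:

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` (seconds) elapses.

    Returns the truthy value, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    return None

# Usage with the SDK (illustrative):
# element = wait_until(lambda: sandbox.desktop.find_element("Confirm"), timeout=10)
```

This shape lets you wait on any condition, not just text matches, while keeping the timeout semantics of wait_for.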

Getting Element Bounds

Get bounding box of an element:
# Get element bounds
bounds = sandbox.desktop.get_bounds("OK Button")

print(f"Button at: {bounds['x']}, {bounds['y']}")
print(f"Size: {bounds['width']}x{bounds['height']}")

# Click center of button
center_x = bounds['x'] + bounds['width'] // 2
center_y = bounds['y'] + bounds['height'] // 2
sandbox.desktop.click(center_x, center_y)
Expected Output:
Button at: 300, 400
Size: 80x25
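
The center arithmetic above recurs in every click-on-bounds workflow, so it can be factored into a small helper. This is plain Python (not an SDK method); the bounds dict shape matches the one returned above:

```python
def center_of(bounds):
    """Return the integer center point of a bounds dict with x/y/width/height keys."""
    return (
        bounds['x'] + bounds['width'] // 2,
        bounds['y'] + bounds['height'] // 2,
    )

# Usage (illustrative):
# cx, cy = center_of(sandbox.desktop.get_bounds("OK Button"))
# sandbox.desktop.click(cx, cy)
```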

Advanced Drag and Drop

Drag and drop operations:
# Drag from (100, 200) to (500, 300): drag_drop(start_x, start_y, end_x, end_y)
sandbox.desktop.drag_drop(100, 200, 500, 300)

# Drag a file onto a folder
sandbox.desktop.drag_drop(50, 50, 400, 300)
Expected Output:
(Drag and drop operation completed)
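
drag_drop takes only start and end coordinates. If you need intermediate waypoints (for example, dragging through several drop zones with repeated calls), linear interpolation gives the points along the path. The helper below is purely illustrative and not part of the SDK:

```python
def drag_path(x1, y1, x2, y2, steps=5):
    """Return `steps + 1` evenly spaced integer points from (x1, y1) to (x2, y2).

    `steps` must be at least 1.
    """
    return [
        (round(x1 + (x2 - x1) * i / steps), round(y1 + (y2 - y1) * i / steps))
        for i in range(steps + 1)
    ]

# Usage (illustrative): chain short drags through each waypoint
# points = drag_path(100, 200, 500, 300, steps=4)
# for (sx, sy), (ex, ey) in zip(points, points[1:]):
#     sandbox.desktop.drag_drop(sx, sy, ex, ey)
```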

Window Capture

Capture the active window, or a specific window by ID:
# Capture active window
img_bytes = sandbox.desktop.capture_window()

# Capture specific window
windows = sandbox.desktop.get_windows()
if windows:
    window_id = windows[0].id
    img_bytes = sandbox.desktop.capture_window(window_id)

# Save to file
with open('window.png', 'wb') as f:
    f.write(img_bytes)
Expected Output:
(Window captured and saved to window.png)
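
The save step above assumes capture_window returns PNG-encoded bytes. A defensive check of the PNG signature before writing catches encoding surprises early; this helper is stdlib-only and illustrative, not part of the SDK:

```python
# The fixed 8-byte signature that begins every PNG file
PNG_MAGIC = b'\x89PNG\r\n\x1a\n'

def looks_like_png(data: bytes) -> bool:
    """Return True if `data` starts with the 8-byte PNG signature."""
    return data[:8] == PNG_MAGIC

# Usage (illustrative):
# if looks_like_png(img_bytes):
#     with open('window.png', 'wb') as f:
#         f.write(img_bytes)
```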

Hotkeys

Execute hotkey combinations:
# Copy: Ctrl+C
sandbox.desktop.hotkey(['ctrl'], 'c')

# Paste: Ctrl+V
sandbox.desktop.hotkey(['ctrl'], 'v')

# Switch window: Alt+Tab
sandbox.desktop.hotkey(['alt'], 'tab')

# Screenshot: Ctrl+Shift+P
sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
Expected Output:
(Hotkey combinations executed)
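
When a script uses many shortcuts, collecting them in one table keeps the modifier/key pairs in a single place. The mapping and `press` helper below are illustrative (not SDK features); the hotkey signature (modifier list, then key) is the one used above:

```python
# Named shortcuts -> (modifier list, key), matching the hotkey calls above
SHORTCUTS = {
    "copy": (['ctrl'], 'c'),
    "paste": (['ctrl'], 'v'),
    "switch_window": (['alt'], 'tab'),
    "screenshot": (['ctrl', 'shift'], 'p'),
}

def press(desktop, name):
    """Look up a named shortcut and send it via desktop.hotkey."""
    modifiers, key = SHORTCUTS[name]
    desktop.hotkey(modifiers, key)

# Usage (illustrative):
# press(sandbox.desktop, "copy")
```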

Complete Example

Complete workflow using advanced features:
from hopx_ai import Sandbox

sandbox = Sandbox.create(template="desktop")

try:
    # Wait for application to load
    element = sandbox.desktop.wait_for("Application Ready", timeout=30)
    print("Application loaded")
    
    # Find and click button
    button = sandbox.desktop.find_element("Start")
    if button:
        sandbox.desktop.click(button['x'], button['y'])
    
    # Wait for dialog
    dialog = sandbox.desktop.wait_for("Confirm", timeout=10)
    
    # Extract text from dialog using OCR
    dialog_text = sandbox.desktop.ocr(
        dialog['x'], dialog['y'],
        dialog['width'], dialog['height']
    )
    print(f"Dialog text: {dialog_text}")
    
    # Find OK button
    ok_bounds = sandbox.desktop.get_bounds("OK")
    center_x = ok_bounds['x'] + ok_bounds['width'] // 2
    center_y = ok_bounds['y'] + ok_bounds['height'] // 2
    sandbox.desktop.click(center_x, center_y)
    
finally:
    sandbox.kill()
Expected Output:
Application loaded
Dialog text: Please confirm this action

OCR Languages

Supported OCR languages:
  • eng: English (default)
  • spa: Spanish
  • fra: French
  • deu: German
  • And more (check Tesseract language support)
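
If language codes come from configuration or user input, validating them before the OCR call gives a clearer error than a silent failed recognition. The set below covers only the codes listed here (Tesseract supports many more), and the helper is illustrative rather than part of the SDK:

```python
# Only the codes documented above; extend with any installed Tesseract language packs
KNOWN_OCR_LANGUAGES = {"eng", "spa", "fra", "deu"}

def check_ocr_language(language: str) -> str:
    """Return `language` unchanged if it is a known code, else raise ValueError."""
    if language not in KNOWN_OCR_LANGUAGES:
        raise ValueError(f"Unsupported OCR language code: {language!r}")
    return language

# Usage (illustrative):
# text = sandbox.desktop.ocr(100, 100, 400, 200,
#                            language=check_ocr_language("spa"))
```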

Next Steps