Advanced X11 desktop automation features including OCR, element finding, and advanced interactions.
Prerequisites
Before you begin, make sure you have:
- VNC server running - A VNC server must be started (see VNC Server)
- Desktop template - A sandbox with desktop support enabled
- Active sandbox - A running sandbox with desktop capabilities
- Basic desktop automation - Familiarity with basic desktop operations is helpful
Overview
X11 advanced features enable:
- OCR (Optical Character Recognition) text extraction
- Finding UI elements by text
- Waiting for elements to appear
- Advanced drag and drop
- Window capture
- Hotkey execution
Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.
OCR (Optical Character Recognition)
Extract text from screen regions using OCR:
from hopx_ai import Sandbox
sandbox = Sandbox.create(template="desktop")
# Extract text from region
text = sandbox.desktop.ocr(100, 100, 400, 200)
print(f"Extracted text: {text}")
# OCR with custom language
text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
print(f"Text: {text}")
Expected Output:
Extracted text: Hello World
Text: Hello World
import { Sandbox } from '@hopx-ai/sdk';
const sandbox = await Sandbox.create({ template: 'desktop' });
// Extract text from region
const text = await sandbox.desktop.ocr(100, 100, 400, 200);
console.log(`Extracted text: ${text}`);
// OCR with custom language
const text2 = await sandbox.desktop.ocr(100, 100, 400, 200, { language: 'eng' });
console.log(`Text: ${text2}`);
Expected Output:
Extracted text: Hello World
Text: Hello World
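Raw OCR output often contains stray whitespace and blank lines. A small post-processing helper (illustrative, not part of the SDK) can normalize the text before you match against it:

```python
def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace and drop blank lines from raw OCR output."""
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)

# normalize_ocr_text("  Hello   World \n\n  ") -> "Hello World"
```

This is useful before comparing OCR results to expected strings, since Tesseract output frequently varies in spacing.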
Finding Elements
Find UI elements by text:
# Find element by text
element = sandbox.desktop.find_element("Submit")
if element:
    print(f"Found at: ({element['x']}, {element['y']})")
    print(f"Size: {element['width']}x{element['height']}")

    # Click the element
    sandbox.desktop.click(element['x'], element['y'])
else:
    print("Element not found")
Expected Output:
Found at: (150, 200)
Size: 100x30
// Find element by text
const element = await sandbox.desktop.findElement('Submit');
if (element) {
  console.log(`Found at: (${element.x}, ${element.y})`);
  console.log(`Size: ${element.width}x${element.height}`);

  // Click the element
  await sandbox.desktop.mouseClick(element.x, element.y);
} else {
  console.log('Element not found');
}
Expected Output:
Found at: (150, 200)
Size: 100x30
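Button labels vary between applications ("Submit" vs "OK" vs "Apply"). A hypothetical helper like the one below tries several candidate labels against any finder callable and returns the first match; the `find_first` name and signature are illustrative, not part of the SDK:

```python
from typing import Callable, Optional

def find_first(finder: Callable[[str], Optional[dict]],
               labels: list[str]) -> Optional[dict]:
    """Return the first element found among several candidate labels."""
    for label in labels:
        element = finder(label)
        if element is not None:
            return element
    return None

# Usage with the SDK:
# element = find_first(sandbox.desktop.find_element, ["Submit", "OK", "Apply"])
```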
Waiting for Elements
Wait for an element to appear:
# Wait for element to appear (default: 30 seconds)
element = sandbox.desktop.wait_for("Loading complete", timeout=60)
print(f"Element found at: ({element['x']}, {element['y']})")
# Click when found
sandbox.desktop.click(element['x'], element['y'])
Expected Output:
Element found at: (200, 300)
// Wait for element to appear (default: 30 seconds)
const element = await sandbox.desktop.waitFor('Loading complete', 60);
console.log(`Element found at: (${element.x}, ${element.y})`);
// Click when found
await sandbox.desktop.mouseClick(element.x, element.y);
Expected Output:
Element found at: (200, 300)
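Under the hood, waiting for an element is a polling pattern: repeatedly look for the element until it appears or a deadline passes. The sketch below shows that pattern with a generic finder callable; prefer the built-in `wait_for` in practice, since this is an assumption about the behavior rather than the SDK's actual implementation:

```python
import time

def wait_for_element(find, text, timeout=30.0, interval=0.5):
    """Poll a finder callable until the element appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        element = find(text)
        if element is not None:
            return element
        time.sleep(interval)
    raise TimeoutError(f"Element {text!r} not found within {timeout}s")

# Usage with the SDK:
# element = wait_for_element(sandbox.desktop.find_element, "Loading complete", timeout=60)
```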
Getting Element Bounds
Get bounding box of an element:
# Get element bounds
bounds = sandbox.desktop.get_bounds("OK Button")
print(f"Button at: {bounds['x']}, {bounds['y']}")
print(f"Size: {bounds['width']}x{bounds['height']}")
# Click center of button
center_x = bounds['x'] + bounds['width'] // 2
center_y = bounds['y'] + bounds['height'] // 2
sandbox.desktop.click(center_x, center_y)
Expected Output:
Button at: 300, 400
Size: 80x25
// Get element bounds
const bounds = await sandbox.desktop.getBounds('OK Button');
console.log(`Button at: ${bounds.x}, ${bounds.y}`);
console.log(`Size: ${bounds.width}x${bounds.height}`);
// Click center of button
const centerX = bounds.x + bounds.width / 2;
const centerY = bounds.y + bounds.height / 2;
await sandbox.desktop.mouseClick(centerX, centerY);
Expected Output:
Button at: 300, 400
Size: 80x25
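The center calculation above comes up in nearly every script, so it is worth factoring into a helper. This is an illustrative utility (not an SDK method) that works with any bounds dict shaped like the one `get_bounds` returns:

```python
def center_of(bounds: dict) -> tuple[int, int]:
    """Return the integer center point of a dict with x/y/width/height keys."""
    return (bounds["x"] + bounds["width"] // 2,
            bounds["y"] + bounds["height"] // 2)

# Usage with the SDK:
# sandbox.desktop.click(*center_of(sandbox.desktop.get_bounds("OK Button")))
```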
Advanced Drag and Drop
Drag and drop operations:
# Drag and drop
sandbox.desktop.drag_drop(100, 200, 500, 300)
# Drag file to folder
sandbox.desktop.drag_drop(50, 50, 400, 300)
Expected Output:
(Drag and drop operation completed)
// Drag and drop
await sandbox.desktop.dragDrop(100, 200, 500, 300);
// Drag file to folder
await sandbox.desktop.dragDrop(50, 50, 400, 300);
Expected Output:
(Drag and drop operation completed)
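`drag_drop` takes raw screen coordinates. To drag one UI element onto another, you can derive the start and end points from two bounds dicts by taking their centers. The helper below is a sketch (the `drag_coords` name is not part of the SDK):

```python
def drag_coords(source: dict, target: dict) -> tuple[int, int, int, int]:
    """Compute (start_x, start_y, end_x, end_y) between two element centers."""
    sx = source["x"] + source["width"] // 2
    sy = source["y"] + source["height"] // 2
    tx = target["x"] + target["width"] // 2
    ty = target["y"] + target["height"] // 2
    return sx, sy, tx, ty

# Usage, assuming bounds obtained via get_bounds:
# sandbox.desktop.drag_drop(*drag_coords(file_bounds, folder_bounds))
```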
Window Capture
Capture specific window:
# Capture active window
img_bytes = sandbox.desktop.capture_window()
# Capture specific window
windows = sandbox.desktop.get_windows()
if windows:
    window_id = windows[0].id
    img_bytes = sandbox.desktop.capture_window(window_id)

# Save to file
with open('window.png', 'wb') as f:
    f.write(img_bytes)
Expected Output:
(Window captured and saved to window.png)
import fs from 'fs';

// Capture active window
let windowImg = await sandbox.desktop.captureWindow();

// Capture specific window
const windows = await sandbox.desktop.listWindows();
if (windows.length > 0) {
  const windowId = windows[0].id;
  windowImg = await sandbox.desktop.captureWindow(windowId);
}

// Save to file
fs.writeFileSync('window.png', windowImg);
Expected Output:
(Window captured and saved to window.png)
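`capture_window` returns raw image bytes. Before writing them to disk, you can sanity-check that the payload actually looks like a PNG (an assumption here, suggested by the `.png` filename used above) by checking the 8-byte PNG signature:

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    """Check the standard 8-byte PNG signature at the start of the payload."""
    return data[:8] == PNG_MAGIC

# Usage:
# img_bytes = sandbox.desktop.capture_window()
# if looks_like_png(img_bytes):
#     with open('window.png', 'wb') as f:
#         f.write(img_bytes)
```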
Hotkeys
Execute hotkey combinations:
# Copy: Ctrl+C
sandbox.desktop.hotkey(['ctrl'], 'c')
# Paste: Ctrl+V
sandbox.desktop.hotkey(['ctrl'], 'v')
# Switch window: Alt+Tab
sandbox.desktop.hotkey(['alt'], 'tab')
# Screenshot: Ctrl+Shift+P
sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
Expected Output:
(Hotkey combinations executed)
// Note: JavaScript SDK may use keyboardCombination
// Copy: Ctrl+C
await sandbox.desktop.keyboardCombination(['ctrl', 'c']);
// Paste: Ctrl+V
await sandbox.desktop.keyboardCombination(['ctrl', 'v']);
Expected Output:
(Hotkey combinations executed)
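For longer scripts, keeping common combinations in a small named table makes intent clearer than raw modifier lists. The sketch below is illustrative; the modifier/key split matches the Python `hotkey(modifiers, key)` signature shown above:

```python
# Named shortcut table: name -> (modifiers, key)
SHORTCUTS = {
    "copy":   (["ctrl"], "c"),
    "paste":  (["ctrl"], "v"),
    "cut":    (["ctrl"], "x"),
    "save":   (["ctrl"], "s"),
    "switch": (["alt"], "tab"),
}

def press(desktop, name: str) -> None:
    """Look up a named shortcut and send it via desktop.hotkey()."""
    modifiers, key = SHORTCUTS[name]
    desktop.hotkey(modifiers, key)

# Usage:
# press(sandbox.desktop, "copy")
```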
Complete Example
Complete workflow using advanced features:
from hopx_ai import Sandbox
import time
sandbox = Sandbox.create(template="desktop")
try:
    # Wait for application to load
    element = sandbox.desktop.wait_for("Application Ready", timeout=30)
    print("Application loaded")

    # Find and click button
    button = sandbox.desktop.find_element("Start")
    if button:
        sandbox.desktop.click(button['x'], button['y'])

    # Wait for dialog
    dialog = sandbox.desktop.wait_for("Confirm", timeout=10)

    # Extract text from dialog using OCR
    dialog_text = sandbox.desktop.ocr(
        dialog['x'], dialog['y'],
        dialog['width'], dialog['height']
    )
    print(f"Dialog text: {dialog_text}")

    # Find OK button
    ok_bounds = sandbox.desktop.get_bounds("OK")
    center_x = ok_bounds['x'] + ok_bounds['width'] // 2
    center_y = ok_bounds['y'] + ok_bounds['height'] // 2
    sandbox.desktop.click(center_x, center_y)
finally:
    sandbox.kill()
Expected Output:
Application loaded
Dialog text: Please confirm this action
import { Sandbox } from '@hopx-ai/sdk';
const sandbox = await Sandbox.create({ template: 'desktop' });
try {
  // Wait for application to load
  const element = await sandbox.desktop.waitFor('Application Ready', 30);
  console.log('Application loaded');

  // Find and click button
  const button = await sandbox.desktop.findElement('Start');
  if (button) {
    await sandbox.desktop.mouseClick(button.x, button.y);
  }

  // Wait for dialog
  const dialog = await sandbox.desktop.waitFor('Confirm', 10);

  // Extract text from dialog using OCR
  const dialogText = await sandbox.desktop.ocr(
    dialog.x, dialog.y,
    dialog.width, dialog.height
  );
  console.log(`Dialog text: ${dialogText}`);

  // Find OK button
  const okBounds = await sandbox.desktop.getBounds('OK');
  const centerX = okBounds.x + okBounds.width / 2;
  const centerY = okBounds.y + okBounds.height / 2;
  await sandbox.desktop.mouseClick(centerX, centerY);
} finally {
  await sandbox.kill();
}
Expected Output:
Application loaded
Dialog text: Please confirm this action
OCR Languages
Supported OCR languages:
- eng: English (default)
- spa: Spanish
- fra: French
- deu: German
- And more (check Tesseract language support)
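Tesseract accepts multiple languages joined with `+` (for example `eng+deu` for mixed English/German text). Assuming the `language` parameter is passed through to Tesseract, a small helper can build and validate the code string before calling `ocr`; the `lang_param` name and the validation set are illustrative:

```python
# Codes listed above; extend with any additional installed Tesseract languages.
KNOWN_LANGS = {"eng", "spa", "fra", "deu"}

def lang_param(*codes: str) -> str:
    """Join Tesseract language codes with '+', rejecting unknown ones."""
    for code in codes:
        if code not in KNOWN_LANGS:
            raise ValueError(f"Unsupported OCR language: {code!r}")
    return "+".join(codes)

# Usage:
# text = sandbox.desktop.ocr(100, 100, 400, 200, language=lang_param("eng", "deu"))
```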
Next Steps