Advanced X11 desktop automation features, including OCR, element finding, and advanced interactions.
Prerequisites
Before you begin, make sure you have:
- VNC server running - A VNC server must be started (see VNC Server)
- Desktop template - A sandbox with desktop support enabled
- Active sandbox - A running sandbox with desktop capabilities
- Basic desktop automation - Familiarity with basic desktop operations is helpful
Overview
X11 advanced features enable:
- OCR (Optical Character Recognition) text extraction
- Finding UI elements by text
- Waiting for elements to appear
- Advanced drag and drop
- Window capture
- Hotkey execution
Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.
OCR (Optical Character Recognition)
Extract text from screen regions using OCR:
from hopx_ai import Sandbox
sandbox = Sandbox.create(template="desktop")
# Extract text from a region (x, y, width, height)
text = sandbox.desktop.ocr(100, 100, 400, 200)
print(f"Extracted text: {text}")
# OCR with custom language
text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
print(f"Text: {text}")
Expected Output:
Extracted text: Hello World
Text: Hello World
Finding Elements
Find UI elements by text:
# Find element by text
element = sandbox.desktop.find_element("Submit")
if element:
    print(f"Found at: ({element['x']}, {element['y']})")
    print(f"Size: {element['width']}x{element['height']}")
    # Click the element
    sandbox.desktop.click(element['x'], element['y'])
else:
    print("Element not found")
Expected Output:
Found at: (150, 200)
Size: 100x30
Waiting for Elements
Wait for an element to appear:
# Wait for an element to appear (wait_for defaults to a 30-second timeout; here we allow 60)
element = sandbox.desktop.wait_for("Loading complete", timeout=60)
print(f"Element found at: ({element['x']}, {element['y']})")
# Click when found
sandbox.desktop.click(element['x'], element['y'])
Expected Output:
Element found at: (200, 300)
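If you need more control than wait_for provides (for example, a custom polling interval), you can build a small retry loop around find_element. A minimal sketch; the element text, timeout, and interval are placeholders:
import time
# Poll for an element manually instead of using wait_for (sketch)
deadline = time.time() + 60  # give up after 60 seconds
element = None
while time.time() < deadline:
    element = sandbox.desktop.find_element("Loading complete")
    if element:
        break
    time.sleep(1)  # custom polling interval
if element:
    sandbox.desktop.click(element['x'], element['y'])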
Getting Element Bounds
Get bounding box of an element:
# Get element bounds
bounds = sandbox.desktop.get_bounds("OK Button")
print(f"Button at: {bounds['x']}, {bounds['y']}")
print(f"Size: {bounds['width']}x{bounds['height']}")
# Click center of button
center_x = bounds['x'] + bounds['width'] // 2
center_y = bounds['y'] + bounds['height'] // 2
sandbox.desktop.click(center_x, center_y)
Expected Output:
Button at: 300, 400
Size: 80x25
Advanced Drag and Drop
Drag and drop operations:
# Drag from (100, 200) and drop at (500, 300)
sandbox.desktop.drag_drop(100, 200, 500, 300)
# Drag a file at (50, 50) onto a folder at (400, 300)
sandbox.desktop.drag_drop(50, 50, 400, 300)
Expected Output:
(Drag and drop operation completed)
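Coordinates can also come from element lookups instead of hard-coded values. The sketch below combines get_bounds with drag_drop; the element names are hypothetical and assume both items are visible on screen:
# Drag a named item onto a named target (element names are examples)
src = sandbox.desktop.get_bounds("report.pdf")
dst = sandbox.desktop.get_bounds("Archive")
sandbox.desktop.drag_drop(
    src['x'] + src['width'] // 2, src['y'] + src['height'] // 2,
    dst['x'] + dst['width'] // 2, dst['y'] + dst['height'] // 2
)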
Window Capture
Capture specific window:
# Capture the active window
img_bytes = sandbox.desktop.capture_window()
# Capture a specific window
windows = sandbox.desktop.get_windows()
if windows:
    window_id = windows[0].id
    img_bytes = sandbox.desktop.capture_window(window_id)
# Save to file
with open('window.png', 'wb') as f:
    f.write(img_bytes)
Expected Output:
(Window captured and saved to window.png)
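To capture every open window rather than just one, iterate over get_windows. A minimal sketch, assuming each window object exposes an id attribute as shown above:
# Capture each open window to its own PNG file
for i, window in enumerate(sandbox.desktop.get_windows()):
    data = sandbox.desktop.capture_window(window.id)
    with open(f'window_{i}.png', 'wb') as f:
        f.write(data)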
Hotkeys
Execute hotkey combinations:
# Copy: Ctrl+C
sandbox.desktop.hotkey(['ctrl'], 'c')
# Paste: Ctrl+V
sandbox.desktop.hotkey(['ctrl'], 'v')
# Switch window: Alt+Tab
sandbox.desktop.hotkey(['alt'], 'tab')
# Multi-modifier combination: Ctrl+Shift+P
sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
Expected Output:
(Hotkey combinations executed)
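Hotkeys are often chained into short sequences. The sketch below selects all text in the focused field and then copies it, using only the hotkey call shown above; it assumes a text field already has keyboard focus:
# Select everything in the focused field, then copy it
sandbox.desktop.hotkey(['ctrl'], 'a')
sandbox.desktop.hotkey(['ctrl'], 'c')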
Complete Example
Complete workflow using advanced features:
from hopx_ai import Sandbox
sandbox = Sandbox.create(template="desktop")

try:
    # Wait for application to load
    element = sandbox.desktop.wait_for("Application Ready", timeout=30)
    print("Application loaded")

    # Find and click button
    button = sandbox.desktop.find_element("Start")
    if button:
        sandbox.desktop.click(button['x'], button['y'])

        # Wait for dialog
        dialog = sandbox.desktop.wait_for("Confirm", timeout=10)

        # Extract text from dialog using OCR
        dialog_text = sandbox.desktop.ocr(
            dialog['x'], dialog['y'],
            dialog['width'], dialog['height']
        )
        print(f"Dialog text: {dialog_text}")

        # Find OK button and click its center
        ok_bounds = sandbox.desktop.get_bounds("OK")
        center_x = ok_bounds['x'] + ok_bounds['width'] // 2
        center_y = ok_bounds['y'] + ok_bounds['height'] // 2
        sandbox.desktop.click(center_x, center_y)
finally:
    sandbox.kill()
Expected Output:
Application loaded
Dialog text: Please confirm this action
OCR Languages
Supported OCR languages:
- eng: English (default)
- spa: Spanish
- fra: French
- deu: German
- And more (check Tesseract language support)
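To OCR a region in another language, pass the corresponding Tesseract language code via the language parameter shown earlier. The region coordinates below are placeholders, and this assumes the relevant language pack is installed in the sandbox image:
# Extract Spanish text from a region (x, y, width, height are placeholders)
text = sandbox.desktop.ocr(100, 100, 400, 200, language="spa")
print(f"Texto: {text}")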
Next Steps