Advanced X11 desktop automation features including OCR, element finding, and advanced interactions.
Prerequisites
Before you begin, make sure you have:
- VNC server running - A VNC server must be started (see VNC Server)
- Desktop template - A sandbox with desktop support enabled
- Active sandbox - A running sandbox with desktop capabilities
- Basic desktop automation - Familiarity with basic desktop operations is helpful
Overview
X11 advanced features enable:
- OCR (Optical Character Recognition) text extraction
- Finding UI elements by text
- Waiting for elements to appear
- Advanced drag and drop
- Window capture
- Hotkey execution
Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.
OCR (Optical Character Recognition)
Extract text from screen regions using OCR:
from hopx_ai import Sandbox
sandbox = Sandbox.create(template="desktop")
# Extract text from region
text = sandbox.desktop.ocr(100, 100, 400, 200)
print(f"Extracted text: {text}")
# OCR with custom language
text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
print(f"Text: {text}")
Expected Output:
Extracted text: Hello World
Text: Hello World
import { Sandbox } from '@hopx-ai/sdk';
const sandbox = await Sandbox.create({ template: 'desktop' });
// Extract text from region
const text = await sandbox.desktop.ocr(100, 100, 400, 200);
console.log(`Extracted text: ${text}`);
// OCR with custom language
const text2 = await sandbox.desktop.ocr(100, 100, 400, 200, { language: 'eng' });
console.log(`Text: ${text2}`);
Expected Output:
Extracted text: Hello World
Text: Hello World
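Raw OCR output often contains stray whitespace and blank lines. A small post-processing helper (illustrative, not part of the SDK) can normalize the text before you match against it:

```python
def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace and drop blank lines from raw OCR output."""
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)

# normalize_ocr_text("  Hello   World \n\n  ") -> "Hello World"
```

This is useful before comparing OCR results to expected strings, since Tesseract output frequently varies in spacing.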
Finding Elements
Find UI elements by text:
# Find element by text
element = sandbox.desktop.find_element("Submit")
if element:
    print(f"Found at: ({element['x']}, {element['y']})")
    print(f"Size: {element['width']}x{element['height']}")

    # Click the element
    sandbox.desktop.click(element['x'], element['y'])
else:
    print("Element not found")
Expected Output:
Found at: (150, 200)
Size: 100x30
// Find element by text
const element = await sandbox.desktop.findElement('Submit');
if (element) {
  console.log(`Found at: (${element.x}, ${element.y})`);
  console.log(`Size: ${element.width}x${element.height}`);

  // Click the element
  await sandbox.desktop.mouseClick(element.x, element.y);
} else {
  console.log('Element not found');
}
Expected Output:
Found at: (150, 200)
Size: 100x30
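Button labels vary between applications ("Submit" vs "OK" vs "Apply"). A hypothetical helper like the one below tries several candidate labels against any finder callable and returns the first match; the `find_first` name and signature are illustrative, not part of the SDK:

```python
from typing import Callable, Optional

def find_first(finder: Callable[[str], Optional[dict]],
               labels: list[str]) -> Optional[dict]:
    """Return the first element found among several candidate labels."""
    for label in labels:
        element = finder(label)
        if element is not None:
            return element
    return None

# Usage with the SDK:
# element = find_first(sandbox.desktop.find_element, ["Submit", "OK", "Apply"])
```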
Waiting for Elements
Wait for an element to appear:
# Wait for element to appear (default: 30 seconds)
element = sandbox.desktop.wait_for("Loading complete", timeout=60)
print(f"Element found at: ({element['x']}, {element['y']})")
# Click when found
sandbox.desktop.click(element['x'], element['y'])
Expected Output:
Element found at: (200, 300)
// Wait for element to appear (default: 30 seconds)
const element = await sandbox.desktop.waitFor('Loading complete', 60);
console.log(`Element found at: (${element.x}, ${element.y})`);
// Click when found
await sandbox.desktop.mouseClick(element.x, element.y);
Expected Output:
Element found at: (200, 300)
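Under the hood, waiting for an element is a polling pattern: repeatedly look for the element until it appears or a deadline passes. The sketch below shows that pattern with a generic finder callable; prefer the built-in `wait_for` in practice, since this is an assumption about the behavior rather than the SDK's actual implementation:

```python
import time

def wait_for_element(find, text, timeout=30.0, interval=0.5):
    """Poll a finder callable until the element appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        element = find(text)
        if element is not None:
            return element
        time.sleep(interval)
    raise TimeoutError(f"Element {text!r} not found within {timeout}s")

# Usage with the SDK:
# element = wait_for_element(sandbox.desktop.find_element, "Loading complete", timeout=60)
```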
Getting Element Bounds
Get bounding box of an element:
# Get element bounds
bounds = sandbox.desktop.get_bounds("OK Button")
print(f"Button at: {bounds['x']}, {bounds['y']}")
print(f"Size: {bounds['width']}x{bounds['height']}")
# Click center of button
center_x = bounds['x'] + bounds['width'] // 2
center_y = bounds['y'] + bounds['height'] // 2
sandbox.desktop.click(center_x, center_y)
Expected Output:
Button at: 300, 400
Size: 80x25
// Get element bounds
const bounds = await sandbox.desktop.getBounds('OK Button');
console.log(`Button at: ${bounds.x}, ${bounds.y}`);
console.log(`Size: ${bounds.width}x${bounds.height}`);
// Click center of button
const centerX = bounds.x + bounds.width / 2;
const centerY = bounds.y + bounds.height / 2;
await sandbox.desktop.mouseClick(centerX, centerY);
Expected Output:
Button at: 300, 400
Size: 80x25
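The center calculation above comes up in nearly every script, so it is worth factoring into a helper. This is an illustrative utility (not an SDK method) that works with any bounds dict shaped like the one `get_bounds` returns:

```python
def center_of(bounds: dict) -> tuple[int, int]:
    """Return the integer center point of a dict with x/y/width/height keys."""
    return (bounds["x"] + bounds["width"] // 2,
            bounds["y"] + bounds["height"] // 2)

# Usage with the SDK:
# sandbox.desktop.click(*center_of(sandbox.desktop.get_bounds("OK Button")))
```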
Advanced Drag and Drop
Drag and drop operations:
# Drag and drop
sandbox.desktop.drag_drop(100, 200, 500, 300)
# Drag file to folder
sandbox.desktop.drag_drop(50, 50, 400, 300)
Expected Output:
(Drag and drop operation completed)
// Drag and drop
await sandbox.desktop.dragDrop(100, 200, 500, 300);
// Drag file to folder
await sandbox.desktop.dragDrop(50, 50, 400, 300);
Expected Output:
(Drag and drop operation completed)
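`drag_drop` takes raw screen coordinates. To drag one UI element onto another, you can derive the start and end points from two bounds dicts by taking their centers. The helper below is a sketch (the `drag_coords` name is not part of the SDK):

```python
def drag_coords(source: dict, target: dict) -> tuple[int, int, int, int]:
    """Compute (start_x, start_y, end_x, end_y) between two element centers."""
    sx = source["x"] + source["width"] // 2
    sy = source["y"] + source["height"] // 2
    tx = target["x"] + target["width"] // 2
    ty = target["y"] + target["height"] // 2
    return sx, sy, tx, ty

# Usage, assuming bounds obtained via get_bounds:
# sandbox.desktop.drag_drop(*drag_coords(file_bounds, folder_bounds))
```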
Window Capture
Capture specific window:
# Capture active window
img_bytes = sandbox.desktop.capture_window()
# Capture specific window
windows = sandbox.desktop.get_windows()
if windows:
    window_id = windows[0].id
    img_bytes = sandbox.desktop.capture_window(window_id)

# Save to file
with open('window.png', 'wb') as f:
    f.write(img_bytes)
Expected Output:
(Window captured and saved to window.png)
import fs from 'fs';

// Capture active window
let windowImg = await sandbox.desktop.captureWindow();

// Capture specific window
const windows = await sandbox.desktop.listWindows();
if (windows.length > 0) {
  const windowId = windows[0].id;
  windowImg = await sandbox.desktop.captureWindow(windowId);
}

// Save to file
fs.writeFileSync('window.png', windowImg);
Expected Output:
(Window captured and saved to window.png)
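`capture_window` returns raw image bytes. Before writing them to disk, you can sanity-check that the payload actually looks like a PNG (an assumption here, suggested by the `.png` filename used above) by checking the 8-byte PNG signature:

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    """Check the standard 8-byte PNG signature at the start of the payload."""
    return data[:8] == PNG_MAGIC

# Usage:
# img_bytes = sandbox.desktop.capture_window()
# if looks_like_png(img_bytes):
#     with open('window.png', 'wb') as f:
#         f.write(img_bytes)
```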
Hotkeys
Execute hotkey combinations:
# Copy: Ctrl+C
sandbox.desktop.hotkey(['ctrl'], 'c')
# Paste: Ctrl+V
sandbox.desktop.hotkey(['ctrl'], 'v')
# Switch window: Alt+Tab
sandbox.desktop.hotkey(['alt'], 'tab')
# Screenshot: Ctrl+Shift+P
sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
Expected Output:
(Hotkey combinations executed)
// Note: JavaScript SDK may use keyboardCombination
// Copy: Ctrl+C
await sandbox.desktop.keyboardCombination(['ctrl', 'c']);
// Paste: Ctrl+V
await sandbox.desktop.keyboardCombination(['ctrl', 'v']);
Expected Output:
(Hotkey combinations executed)
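For longer scripts, keeping common combinations in a small named table makes intent clearer than raw modifier lists. The sketch below is illustrative; the modifier/key split matches the Python `hotkey(modifiers, key)` signature shown above:

```python
# Named shortcut table: name -> (modifiers, key)
SHORTCUTS = {
    "copy":   (["ctrl"], "c"),
    "paste":  (["ctrl"], "v"),
    "cut":    (["ctrl"], "x"),
    "save":   (["ctrl"], "s"),
    "switch": (["alt"], "tab"),
}

def press(desktop, name: str) -> None:
    """Look up a named shortcut and send it via desktop.hotkey()."""
    modifiers, key = SHORTCUTS[name]
    desktop.hotkey(modifiers, key)

# Usage:
# press(sandbox.desktop, "copy")
```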
Complete Example
Complete workflow using advanced features:
from hopx_ai import Sandbox
import time
sandbox = Sandbox.create(template="desktop")
try:
    # Wait for application to load
    element = sandbox.desktop.wait_for("Application Ready", timeout=30)
    print("Application loaded")

    # Find and click button
    button = sandbox.desktop.find_element("Start")
    if button:
        sandbox.desktop.click(button['x'], button['y'])

    # Wait for dialog
    dialog = sandbox.desktop.wait_for("Confirm", timeout=10)

    # Extract text from dialog using OCR
    dialog_text = sandbox.desktop.ocr(
        dialog['x'], dialog['y'],
        dialog['width'], dialog['height']
    )
    print(f"Dialog text: {dialog_text}")

    # Find OK button
    ok_bounds = sandbox.desktop.get_bounds("OK")
    center_x = ok_bounds['x'] + ok_bounds['width'] // 2
    center_y = ok_bounds['y'] + ok_bounds['height'] // 2
    sandbox.desktop.click(center_x, center_y)
finally:
    sandbox.kill()
Expected Output:
Application loaded
Dialog text: Please confirm this action
import { Sandbox } from '@hopx-ai/sdk';
const sandbox = await Sandbox.create({ template: 'desktop' });
try {
  // Wait for application to load
  const element = await sandbox.desktop.waitFor('Application Ready', 30);
  console.log('Application loaded');

  // Find and click button
  const button = await sandbox.desktop.findElement('Start');
  if (button) {
    await sandbox.desktop.mouseClick(button.x, button.y);
  }

  // Wait for dialog
  const dialog = await sandbox.desktop.waitFor('Confirm', 10);

  // Extract text from dialog using OCR
  const dialogText = await sandbox.desktop.ocr(
    dialog.x, dialog.y,
    dialog.width, dialog.height
  );
  console.log(`Dialog text: ${dialogText}`);

  // Find OK button
  const okBounds = await sandbox.desktop.getBounds('OK');
  const centerX = okBounds.x + okBounds.width / 2;
  const centerY = okBounds.y + okBounds.height / 2;
  await sandbox.desktop.mouseClick(centerX, centerY);
} finally {
  await sandbox.kill();
}
Expected Output:
Application loaded
Dialog text: Please confirm this action
OCR Languages
Supported OCR languages:
- eng: English (default)
- spa: Spanish
- fra: French
- deu: German
- And more (check Tesseract language support)
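Tesseract accepts multiple languages joined with `+` (for example `eng+deu` for mixed English/German text). Assuming the `language` parameter is passed through to Tesseract, a small helper can build and validate the code string before calling `ocr`; the `lang_param` name and the validation set are illustrative:

```python
# Codes listed above; extend with any additional installed Tesseract languages.
KNOWN_LANGS = {"eng", "spa", "fra", "deu"}

def lang_param(*codes: str) -> str:
    """Join Tesseract language codes with '+', rejecting unknown ones."""
    for code in codes:
        if code not in KNOWN_LANGS:
            raise ValueError(f"Unsupported OCR language: {code!r}")
    return "+".join(codes)

# Usage:
# text = sandbox.desktop.ocr(100, 100, 400, 200, language=lang_param("eng", "deu"))
```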
Next Steps