Advanced X11 desktop automation features, including OCR, element finding, and complex interactions.
Prerequisites
Before you begin, make sure you have:
- VNC server running - A VNC server must be started (see VNC Server)
- Desktop template - A sandbox with desktop support enabled
- Active sandbox - A running sandbox with desktop capabilities
- Basic desktop automation - Familiarity with basic desktop operations is helpful
Overview
X11 advanced features enable:
- OCR (Optical Character Recognition) text extraction
- Finding UI elements by text
- Waiting for elements to appear
- Advanced drag and drop
- Window capture
- Hotkey execution
Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.
OCR (Optical Character Recognition)
Extract text from screen regions using OCR:
from hopx_ai import Sandbox
sandbox = Sandbox.create(template="desktop")
# Extract text from region
text = sandbox.desktop.ocr(100, 100, 400, 200)
print(f"Extracted text: {text}")
# OCR with custom language
text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
print(f"Text: {text}")
Expected Output:
Extracted text: Hello World
Text: Hello World
import { Sandbox } from '@hopx-ai/sdk';
const sandbox = await Sandbox.create({ template: 'desktop' });
// Extract text from region
const text = await sandbox.desktop.ocr(100, 100, 400, 200);
console.log(`Extracted text: ${text}`);
// OCR with custom language
const text2 = await sandbox.desktop.ocr(100, 100, 400, 200, { language: 'eng' });
console.log(`Text: ${text2}`);
Expected Output:
Extracted text: Hello World
Text: Hello World
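OCR accuracy drops sharply if the region extends past the visible screen. As an illustrative sketch (not part of the SDK), a small helper can clamp an `(x, y, width, height)` region before passing it to `ocr()`; the default screen size here is an assumption:

```python
def clamp_region(x, y, width, height, screen_w=1920, screen_h=1080):
    """Clamp an OCR region so it stays inside the screen.

    Coordinates follow the (x, y, width, height) convention used by
    desktop.ocr(); screen_w/screen_h are illustrative defaults, not
    values read from the sandbox.
    """
    x = max(0, min(x, screen_w - 1))
    y = max(0, min(y, screen_h - 1))
    width = max(1, min(width, screen_w - x))
    height = max(1, min(height, screen_h - y))
    return x, y, width, height

print(clamp_region(1800, 100, 400, 200))  # width clipped to fit: (1800, 100, 120, 200)
```

The clamped tuple can then be unpacked directly: `sandbox.desktop.ocr(*clamp_region(1800, 100, 400, 200))`.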
Finding Elements
Find UI elements by text:
# Find element by text
element = sandbox.desktop.find_element("Submit")
if element:
    print(f"Found at: ({element['x']}, {element['y']})")
    print(f"Size: {element['width']}x{element['height']}")
    # Click the element
    sandbox.desktop.click(element['x'], element['y'])
else:
    print("Element not found")
Expected Output:
Found at: (150, 200)
Size: 100x30
// Find element by text
const element = await sandbox.desktop.findElement('Submit');
if (element) {
  console.log(`Found at: (${element.x}, ${element.y})`);
  console.log(`Size: ${element.width}x${element.height}`);
  // Click the element
  await sandbox.desktop.mouseClick(element.x, element.y);
} else {
  console.log('Element not found');
}
Expected Output:
Found at: (150, 200)
Size: 100x30
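If you collect several candidate elements yourself, you may want to rank them before clicking. The sketch below is a hypothetical helper (not an SDK method) over element dicts shaped like `find_element` results, with an added `text` key for illustration; exact matches win over substring matches:

```python
def best_text_match(elements, query):
    """Pick the element whose text best matches `query`.

    `elements` is a list of dicts like those returned by find_element
    (here with an added 'text' key for illustration). Exact matches
    win; otherwise fall back to case-insensitive substring matches.
    """
    query_l = query.lower()
    exact = [e for e in elements if e.get("text", "").lower() == query_l]
    if exact:
        return exact[0]
    partial = [e for e in elements if query_l in e.get("text", "").lower()]
    return partial[0] if partial else None

elements = [
    {"text": "Submit form", "x": 10, "y": 20},
    {"text": "Submit", "x": 150, "y": 200},
]
print(best_text_match(elements, "submit")["x"])  # 150 (exact match wins)
```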
Waiting for Elements
Wait for an element to appear:
# Wait for element to appear (default: 30 seconds)
element = sandbox.desktop.wait_for("Loading complete", timeout=60)
print(f"Element found at: ({element['x']}, {element['y']})")
# Click when found
sandbox.desktop.click(element['x'], element['y'])
Expected Output:
Element found at: (200, 300)
// Wait for element to appear (default: 30 seconds)
const element = await sandbox.desktop.waitFor('Loading complete', 60);
console.log(`Element found at: (${element.x}, ${element.y})`);
// Click when found
await sandbox.desktop.mouseClick(element.x, element.y);
Expected Output:
Element found at: (200, 300)
Getting Element Bounds
Get bounding box of an element:
# Get element bounds
bounds = sandbox.desktop.get_bounds("OK Button")
print(f"Button at: {bounds['x']}, {bounds['y']}")
print(f"Size: {bounds['width']}x{bounds['height']}")
# Click center of button
center_x = bounds['x'] + bounds['width'] // 2
center_y = bounds['y'] + bounds['height'] // 2
sandbox.desktop.click(center_x, center_y)
Expected Output:
Button at: 300, 400
Size: 80x25
// Get element bounds
const bounds = await sandbox.desktop.getBounds('OK Button');
console.log(`Button at: ${bounds.x}, ${bounds.y}`);
console.log(`Size: ${bounds.width}x${bounds.height}`);
// Click center of button
const centerX = bounds.x + bounds.width / 2;
const centerY = bounds.y + bounds.height / 2;
await sandbox.desktop.mouseClick(centerX, centerY);
Expected Output:
Button at: 300, 400
Size: 80x25
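The center computation above repeats often enough that it is worth wrapping in a small helper. This is a minimal sketch, not an SDK method, over the same bounds dict shape that `get_bounds` returns:

```python
def center_of(bounds):
    """Return the integer center point of a bounds dict.

    `bounds` uses the same keys as desktop.get_bounds():
    x, y, width, height.
    """
    return (bounds["x"] + bounds["width"] // 2,
            bounds["y"] + bounds["height"] // 2)

print(center_of({"x": 300, "y": 400, "width": 80, "height": 25}))  # (340, 412)
```

Then clicking the middle of any element becomes `sandbox.desktop.click(*center_of(bounds))`.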
Advanced Drag and Drop
Drag and drop operations:
# Drag and drop
sandbox.desktop.drag_drop(100, 200, 500, 300)
# Drag file to folder
sandbox.desktop.drag_drop(50, 50, 400, 300)
Expected Output:
(Drag and drop operation completed)
// Drag and drop
await sandbox.desktop.dragDrop(100, 200, 500, 300);
// Drag file to folder
await sandbox.desktop.dragDrop(50, 50, 400, 300);
Expected Output:
(Drag and drop operation completed)
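Some applications only register a drag if they see intermediate mouse motion rather than a jump from start to end. As an illustrative sketch (not an SDK feature), you can interpolate waypoints along the drag and feed each one to a mouse-move call before releasing:

```python
def drag_path(x1, y1, x2, y2, steps=5):
    """Return evenly spaced points from (x1, y1) to (x2, y2).

    Each intermediate point could be passed to a mouse-move call
    so the target app observes continuous motion during the drag.
    """
    points = []
    for i in range(steps + 1):
        t = i / steps
        points.append((round(x1 + (x2 - x1) * t),
                       round(y1 + (y2 - y1) * t)))
    return points

print(drag_path(100, 200, 500, 300, steps=4))
# [(100, 200), (200, 225), (300, 250), (400, 275), (500, 300)]
```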
Window Capture
Capture specific window:
# Capture active window
img_bytes = sandbox.desktop.capture_window()

# Capture specific window
windows = sandbox.desktop.get_windows()
if windows:
    window_id = windows[0].id
    img_bytes = sandbox.desktop.capture_window(window_id)

# Save to file
with open('window.png', 'wb') as f:
    f.write(img_bytes)
Expected Output:
(Window captured and saved to window.png)
import fs from 'node:fs';

// Capture active window
let windowImg = await sandbox.desktop.captureWindow();

// Capture specific window
const windows = await sandbox.desktop.listWindows();
if (windows.length > 0) {
  windowImg = await sandbox.desktop.captureWindow(windows[0].id);
}

// Save to file
fs.writeFileSync('window.png', windowImg);
Expected Output:
(Window captured and saved to window.png)
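Before writing the capture result to disk, a quick sanity check avoids saving an error payload as `window.png`. This sketch (assuming the capture returns PNG bytes) checks the standard 8-byte PNG signature:

```python
def looks_like_png(data: bytes) -> bool:
    """Check for the 8-byte PNG signature before writing to disk.

    Assumes capture_window() returns PNG-encoded bytes; any other
    payload (e.g. a JSON error body) fails this check.
    """
    return data[:8] == b"\x89PNG\r\n\x1a\n"

sample = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # stand-in for real capture bytes
print(looks_like_png(sample))        # True
print(looks_like_png(b"not a png"))  # False
```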
Hotkeys
Execute hotkey combinations:
# Copy: Ctrl+C
sandbox.desktop.hotkey(['ctrl'], 'c')
# Paste: Ctrl+V
sandbox.desktop.hotkey(['ctrl'], 'v')
# Switch window: Alt+Tab
sandbox.desktop.hotkey(['alt'], 'tab')
# Screenshot: Ctrl+Shift+P
sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
Expected Output:
(Hotkey combinations executed)
// Note: JavaScript SDK may use keyboardCombination
// Copy: Ctrl+C
await sandbox.desktop.keyboardCombination(['ctrl', 'c']);
// Paste: Ctrl+V
await sandbox.desktop.keyboardCombination(['ctrl', 'v']);
Expected Output:
(Hotkey combinations executed)
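The Python SDK takes a `(modifiers, key)` pair while the JavaScript example above passes one flat list. If you keep shortcuts in a shared config, a tiny conversion helper (hypothetical, not part of either SDK) bridges the two shapes:

```python
def to_combination(modifiers, key):
    """Convert the Python-style (modifiers, key) pair into the flat,
    lowercase list shape used by the JavaScript keyboardCombination
    example above."""
    return [m.lower() for m in modifiers] + [key.lower()]

print(to_combination(["ctrl", "shift"], "p"))  # ['ctrl', 'shift', 'p']
```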
Complete Example
Complete workflow using advanced features:
from hopx_ai import Sandbox

sandbox = Sandbox.create(template="desktop")

try:
    # Wait for application to load
    element = sandbox.desktop.wait_for("Application Ready", timeout=30)
    print("Application loaded")

    # Find and click button
    button = sandbox.desktop.find_element("Start")
    if button:
        sandbox.desktop.click(button['x'], button['y'])

    # Wait for dialog
    dialog = sandbox.desktop.wait_for("Confirm", timeout=10)

    # Extract text from dialog using OCR
    dialog_text = sandbox.desktop.ocr(
        dialog['x'], dialog['y'],
        dialog['width'], dialog['height']
    )
    print(f"Dialog text: {dialog_text}")

    # Find OK button
    ok_bounds = sandbox.desktop.get_bounds("OK")
    center_x = ok_bounds['x'] + ok_bounds['width'] // 2
    center_y = ok_bounds['y'] + ok_bounds['height'] // 2
    sandbox.desktop.click(center_x, center_y)
finally:
    sandbox.kill()
Expected Output:
Application loaded
Dialog text: Please confirm this action
import { Sandbox } from '@hopx-ai/sdk';

const sandbox = await Sandbox.create({ template: 'desktop' });

try {
  // Wait for application to load
  const element = await sandbox.desktop.waitFor('Application Ready', 30);
  console.log('Application loaded');

  // Find and click button
  const button = await sandbox.desktop.findElement('Start');
  if (button) {
    await sandbox.desktop.mouseClick(button.x, button.y);
  }

  // Wait for dialog
  const dialog = await sandbox.desktop.waitFor('Confirm', 10);

  // Extract text from dialog using OCR
  const dialogText = await sandbox.desktop.ocr(
    dialog.x, dialog.y,
    dialog.width, dialog.height
  );
  console.log(`Dialog text: ${dialogText}`);

  // Find OK button
  const okBounds = await sandbox.desktop.getBounds('OK');
  const centerX = okBounds.x + okBounds.width / 2;
  const centerY = okBounds.y + okBounds.height / 2;
  await sandbox.desktop.mouseClick(centerX, centerY);
} finally {
  await sandbox.kill();
}
Expected Output:
Application loaded
Dialog text: Please confirm this action
OCR Languages
Supported OCR languages:
- eng: English (default)
- spa: Spanish
- fra: French
- deu: German
- And more (check Tesseract language support)
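When the language code comes from user input or configuration, a defensive fallback keeps the OCR call from failing on an uninstalled language pack. A minimal sketch, assuming only the codes listed above are known to be available (the real set depends on which Tesseract packs the sandbox image includes):

```python
# Codes from the list above; the full set depends on which Tesseract
# language packs are installed in the sandbox image.
KNOWN_OCR_LANGS = {"eng", "spa", "fra", "deu"}

def validate_ocr_language(code: str) -> str:
    """Return `code` if it is a known Tesseract language code,
    otherwise fall back to English rather than failing the OCR call."""
    return code if code in KNOWN_OCR_LANGS else "eng"

print(validate_ocr_language("deu"))  # deu
print(validate_ocr_language("xx"))   # eng
```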
Next Steps