# X11 Advanced Features

> Advanced X11 desktop automation features for sandbox environments in HopX. Access advanced X11 capabilities for complex desktop automation, window management, and GUI interactions. Learn about X11-specific features, advanced automation techniques, and low-level desktop control. Includes Python and JavaScript SDK examples and X11 API endpoints.

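This page covers the advanced X11 desktop automation features available through `sandbox.desktop`: OCR text extraction, finding and waiting for UI elements by text, drag and drop, window capture, and hotkeys.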

## Prerequisites

Before you begin, make sure you have:

* **VNC server running** - A VNC server must be started (see [VNC Server](/core-concepts/desktop/vnc-server))
* **Desktop template** - A sandbox with desktop support enabled
* **Active sandbox** - A running sandbox with desktop capabilities
* **Basic desktop automation** - Familiarity with basic desktop operations is helpful

## Overview

X11 advanced features enable:

* OCR (Optical Character Recognition) text extraction
* Finding UI elements by text
* Waiting for elements to appear
* Advanced drag and drop
* Window capture
* Hotkey execution

<Note>
  Desktop automation requires a template with desktop support. Ensure your sandbox has desktop capabilities enabled.
</Note>

## OCR (Optical Character Recognition)

Extract text from screen regions using OCR:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from hopx_ai import Sandbox

    sandbox = Sandbox.create(template="desktop")

    # Extract text from region
    text = sandbox.desktop.ocr(100, 100, 400, 200)
    print(f"Extracted text: {text}")

    # OCR with custom language
    text = sandbox.desktop.ocr(100, 100, 400, 200, language="eng")
    print(f"Text: {text}")
    ```

    **Expected Output:**

    ```
    Extracted text: Hello World
    Text: Hello World
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    import { Sandbox } from '@hopx-ai/sdk';

    const sandbox = await Sandbox.create({ template: 'desktop' });

    // Extract text from region
    const text = await sandbox.desktop.ocr(100, 100, 400, 200);
    console.log(`Extracted text: ${text}`);

    // OCR with custom language
    const text2 = await sandbox.desktop.ocr(100, 100, 400, 200, { language: 'eng' });
    console.log(`Text: ${text2}`);
    ```

    **Expected Output:**

    ```
    Extracted text: Hello World
    Text: Hello World
    ```
  </Tab>
</Tabs>
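OCR output often contains line breaks and irregular spacing, so exact string comparisons tend to be brittle. The sketch below normalizes text before matching. It assumes `ocr()` returns a plain string, as the examples above suggest; `normalize_ocr_text` and `ocr_contains` are hypothetical helpers, not part of the SDK.

```python
import re


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def ocr_contains(raw: str, expected: str) -> bool:
    """Case-insensitive containment check on normalized OCR text."""
    return normalize_ocr_text(expected).lower() in normalize_ocr_text(raw).lower()


# In practice `raw` would come from sandbox.desktop.ocr(100, 100, 400, 200).
raw = "Hello \n  World\n"
print(normalize_ocr_text(raw))           # Hello World
print(ocr_contains(raw, "hello world"))  # True
```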

## Finding Elements

Find UI elements by text:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Find element by text
    element = sandbox.desktop.find_element("Submit")

    if element:
        print(f"Found at: ({element['x']}, {element['y']})")
        print(f"Size: {element['width']}x{element['height']}")
        
        # Click the element
        sandbox.desktop.click(element['x'], element['y'])
    else:
        print("Element not found")
    ```

    **Expected Output:**

    ```
    Found at: (150, 200)
    Size: 100x30
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Find element by text
    const element = await sandbox.desktop.findElement('Submit');

    if (element) {
      console.log(`Found at: (${element.x}, ${element.y})`);
      console.log(`Size: ${element.width}x${element.height}`);
      
      // Click the element
      await sandbox.desktop.mouseClick(element.x, element.y);
    } else {
      console.log('Element not found');
    }
    ```

    **Expected Output:**

    ```
    Found at: (150, 200)
    Size: 100x30
    ```
  </Tab>
</Tabs>
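When a button label varies between application versions, it can help to try several candidate labels in order. The following is a minimal sketch, assuming `find_element` returns a falsy value when nothing matches (as the `if element:` check above implies). `find_first` and `FakeDesktop` are hypothetical names; the fake stands in for `sandbox.desktop` so the sketch runs without a sandbox.

```python
from typing import Any, Iterable, Optional


def find_first(desktop: Any, labels: Iterable[str]) -> Optional[dict]:
    """Return the first element matched by any of the labels, or None."""
    for label in labels:
        element = desktop.find_element(label)
        if element:
            return element
    return None


class FakeDesktop:
    """Stand-in for sandbox.desktop (illustration only)."""

    def __init__(self, visible: dict):
        self.visible = visible

    def find_element(self, text: str) -> Optional[dict]:
        return self.visible.get(text)


desktop = FakeDesktop({"Send": {"x": 150, "y": 200, "width": 100, "height": 30}})
match = find_first(desktop, ["Submit", "Send", "OK"])
print(match)  # {'x': 150, 'y': 200, 'width': 100, 'height': 30}
```

With a real sandbox, pass `sandbox.desktop` in place of the fake.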

## Waiting for Elements

Wait for an element to appear:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Wait up to 60 seconds for the element to appear (default timeout: 30 seconds)
    element = sandbox.desktop.wait_for("Loading complete", timeout=60)

    print(f"Element found at: ({element['x']}, {element['y']})")

    # Click when found
    sandbox.desktop.click(element['x'], element['y'])
    ```

    **Expected Output:**

    ```
    Element found at: (200, 300)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Wait up to 60 seconds for the element to appear (default timeout: 30 seconds)
    const element = await sandbox.desktop.waitFor('Loading complete', 60);

    console.log(`Element found at: (${element.x}, ${element.y})`);

    // Click when found
    await sandbox.desktop.mouseClick(element.x, element.y);
    ```

    **Expected Output:**

    ```
    Element found at: (200, 300)
    ```
  </Tab>
</Tabs>
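This page does not describe what `wait_for` does when the timeout elapses. If you prefer an explicit `None` result, one option is to poll `find_element` yourself. The sketch below is not how `wait_for` is implemented; `poll_for_element` and `FakeDesktop` are hypothetical, and it assumes `find_element` returns a falsy value when the text is not on screen.

```python
import time
from typing import Any, Callable, Optional


def poll_for_element(
    desktop: Any,
    text: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll find_element until it matches or the timeout elapses (then None)."""
    deadline = clock() + timeout
    while True:
        element = desktop.find_element(text)
        if element:
            return element
        if clock() >= deadline:
            return None
        sleep(interval)


class FakeDesktop:
    """Stand-in whose element appears on the Nth lookup (illustration only)."""

    def __init__(self, appear_after: int):
        self.calls = 0
        self.appear_after = appear_after

    def find_element(self, text: str) -> Optional[dict]:
        self.calls += 1
        if self.calls >= self.appear_after:
            return {"x": 200, "y": 300, "width": 120, "height": 20}
        return None


desktop = FakeDesktop(appear_after=3)
element = poll_for_element(desktop, "Loading complete", timeout=5, interval=0.01)
print(element["x"], element["y"], desktop.calls)  # 200 300 3
```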

## Getting Element Bounds

Get the bounding box of an element:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Get element bounds
    bounds = sandbox.desktop.get_bounds("OK Button")

    print(f"Button at: {bounds['x']}, {bounds['y']}")
    print(f"Size: {bounds['width']}x{bounds['height']}")

    # Click center of button
    center_x = bounds['x'] + bounds['width'] // 2
    center_y = bounds['y'] + bounds['height'] // 2
    sandbox.desktop.click(center_x, center_y)
    ```

    **Expected Output:**

    ```
    Button at: 300, 400
    Size: 80x25
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Get element bounds
    const bounds = await sandbox.desktop.getBounds('OK Button');

    console.log(`Button at: ${bounds.x}, ${bounds.y}`);
    console.log(`Size: ${bounds.width}x${bounds.height}`);

    // Click center of button
    // Math.floor keeps the coordinates integral when width or height is odd
    const centerX = bounds.x + Math.floor(bounds.width / 2);
    const centerY = bounds.y + Math.floor(bounds.height / 2);
    await sandbox.desktop.mouseClick(centerX, centerY);
    ```

    **Expected Output:**

    ```
    Button at: 300, 400
    Size: 80x25
    ```
  </Tab>
</Tabs>
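Bounds can also feed an OCR call. Text close to an element's edge may be clipped, so adding a few pixels of padding around the box can help. The sketch below assumes bounds use the `x`, `y`, `width`, `height` keys shown above and that OCR regions are given as `(x, y, width, height)`, as in the complete example on this page. `padded_region` and the screen size are hypothetical.

```python
def padded_region(
    bounds: dict, padding: int, screen_w: int, screen_h: int
) -> tuple[int, int, int, int]:
    """Expand a bounding box by `padding` pixels per side, clamped to the screen.

    Returns (x, y, width, height).
    """
    left = max(0, bounds["x"] - padding)
    top = max(0, bounds["y"] - padding)
    right = min(screen_w, bounds["x"] + bounds["width"] + padding)
    bottom = min(screen_h, bounds["y"] + bounds["height"] + padding)
    return left, top, right - left, bottom - top


bounds = {"x": 300, "y": 400, "width": 80, "height": 25}
print(padded_region(bounds, 10, 1024, 768))  # (290, 390, 100, 45)

# Near the screen edge the region is clamped rather than going negative.
corner = {"x": 5, "y": 5, "width": 50, "height": 20}
print(padded_region(corner, 10, 1024, 768))  # (0, 0, 65, 35)
```

The resulting tuple could then be unpacked into an OCR call, for example `sandbox.desktop.ocr(*region)`.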

## Advanced Drag and Drop

Drag and drop operations:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Drag and drop
    sandbox.desktop.drag_drop(100, 200, 500, 300)

    # Drag file to folder
    sandbox.desktop.drag_drop(50, 50, 400, 300)
    ```

    **Expected Output:**

    ```
    (Drag and drop operation completed)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Drag and drop
    await sandbox.desktop.dragDrop(100, 200, 500, 300);

    // Drag file to folder
    await sandbox.desktop.dragDrop(50, 50, 400, 300);
    ```

    **Expected Output:**

    ```
    (Drag and drop operation completed)
    ```
  </Tab>
</Tabs>
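Hard-coded coordinates break when the layout shifts. A possible alternative is to drag from the center of one element to the center of another, using bounds obtained from `find_element` or `get_bounds`. This sketch assumes `drag_drop` takes the start coordinates followed by the end coordinates, which the calls above imply but do not state. `center`, `drag_between`, and `RecordingDesktop` are hypothetical.

```python
from typing import Any


def center(bounds: dict) -> tuple[int, int]:
    """Integer center point of an x/y/width/height bounding box."""
    return (
        bounds["x"] + bounds["width"] // 2,
        bounds["y"] + bounds["height"] // 2,
    )


def drag_between(desktop: Any, source: dict, target: dict) -> tuple[int, int, int, int]:
    """Drag from the center of `source` to the center of `target`."""
    x1, y1 = center(source)
    x2, y2 = center(target)
    desktop.drag_drop(x1, y1, x2, y2)  # assumed order: start x, start y, end x, end y
    return x1, y1, x2, y2


class RecordingDesktop:
    """Stand-in for sandbox.desktop that records drags (illustration only)."""

    def __init__(self) -> None:
        self.drags: list[tuple[int, int, int, int]] = []

    def drag_drop(self, x1: int, y1: int, x2: int, y2: int) -> None:
        self.drags.append((x1, y1, x2, y2))


desktop = RecordingDesktop()
file_icon = {"x": 30, "y": 30, "width": 40, "height": 40}
folder = {"x": 350, "y": 250, "width": 100, "height": 100}
print(drag_between(desktop, file_icon, folder))  # (50, 50, 400, 300)
```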

## Window Capture

Capture the active window or a specific window:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Capture active window
    img_bytes = sandbox.desktop.capture_window()

    # Capture specific window
    windows = sandbox.desktop.get_windows()
    if windows:
        window_id = windows[0].id
        img_bytes = sandbox.desktop.capture_window(window_id)

    # Save to file
    with open('window.png', 'wb') as f:
        f.write(img_bytes)
    ```

    **Expected Output:**

    ```
    (Window captured and saved to window.png)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    import fs from 'node:fs';

    // Capture active window
    let windowImg = await sandbox.desktop.captureWindow();

    // Capture specific window (replaces the active-window capture)
    const windows = await sandbox.desktop.listWindows();
    if (windows.length > 0) {
      const windowId = windows[0].id;
      windowImg = await sandbox.desktop.captureWindow(windowId);
    }

    // Save to file
    fs.writeFileSync('window.png', windowImg);
    ```

    **Expected Output:**

    ```
    (Window captured and saved to window.png)
    ```
  </Tab>
</Tabs>
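Before writing a capture to disk, it can be worth checking that the bytes look like an image. The sketch below checks for the standard 8-byte PNG signature. It assumes `capture_window` returns PNG-encoded bytes, which the `window.png` filename suggests but this page does not state. `is_png` and `save_capture` are hypothetical helpers.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png(data: bytes) -> bool:
    """True if the data starts with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


def save_capture(data: bytes, path: str) -> int:
    """Write the capture to `path` and return the number of bytes written."""
    if not is_png(data):
        raise ValueError("capture does not look like PNG data")
    with open(path, "wb") as f:
        return f.write(data)


# Placeholder bytes; in practice use sandbox.desktop.capture_window().
fake_capture = PNG_SIGNATURE + b"\x00" * 16
with tempfile.TemporaryDirectory() as tmp:
    written = save_capture(fake_capture, os.path.join(tmp, "window.png"))

print(written)                  # 24
print(is_png(b"not an image"))  # False
```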

## Hotkeys

Execute hotkey combinations:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    # Copy: Ctrl+C
    sandbox.desktop.hotkey(['ctrl'], 'c')

    # Paste: Ctrl+V
    sandbox.desktop.hotkey(['ctrl'], 'v')

    # Switch window: Alt+Tab
    sandbox.desktop.hotkey(['alt'], 'tab')

    # Multiple modifiers: Ctrl+Shift+P (the effect depends on the focused application)
    sandbox.desktop.hotkey(['ctrl', 'shift'], 'p')
    ```

    **Expected Output:**

    ```
    (Hotkey combinations executed)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    // Note: JavaScript SDK may use keyboardCombination
    // Copy: Ctrl+C
    await sandbox.desktop.keyboardCombination(['ctrl', 'c']);

    // Paste: Ctrl+V
    await sandbox.desktop.keyboardCombination(['ctrl', 'v']);
    ```

    **Expected Output:**

    ```
    (Hotkey combinations executed)
    ```
  </Tab>
</Tabs>
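If hotkeys come from configuration as strings such as `"Ctrl+Shift+P"`, they need to be split into the modifier list and key that the Python `hotkey` call above takes. The following is a minimal sketch; `parse_hotkey` is a hypothetical helper, and the `super` modifier is an assumption (only `ctrl`, `alt`, and `shift` appear on this page).

```python
# "super" is an assumption; this page only shows ctrl, alt, and shift.
MODIFIERS = {"ctrl", "alt", "shift", "super"}


def parse_hotkey(combo: str) -> tuple[list[str], str]:
    """Split 'Ctrl+Shift+P' into (['ctrl', 'shift'], 'p')."""
    parts = [part.strip().lower() for part in combo.split("+") if part.strip()]
    if not parts:
        raise ValueError("empty hotkey")
    *modifiers, key = parts
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return modifiers, key


print(parse_hotkey("Ctrl+Shift+P"))  # (['ctrl', 'shift'], 'p')
print(parse_hotkey("alt + tab"))     # (['alt'], 'tab')
```

The result can be unpacked directly into the call, for example `sandbox.desktop.hotkey(*parse_hotkey("Ctrl+C"))`.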

## Complete Example

A complete workflow that combines the advanced features:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from hopx_ai import Sandbox

    sandbox = Sandbox.create(template="desktop")

    try:
        # Wait for application to load
        element = sandbox.desktop.wait_for("Application Ready", timeout=30)
        print("Application loaded")
        
        # Find and click button
        button = sandbox.desktop.find_element("Start")
        if button:
            sandbox.desktop.click(button['x'], button['y'])
        
        # Wait for dialog
        dialog = sandbox.desktop.wait_for("Confirm", timeout=10)
        
        # Extract text from dialog using OCR
        dialog_text = sandbox.desktop.ocr(
            dialog['x'], dialog['y'],
            dialog['width'], dialog['height']
        )
        print(f"Dialog text: {dialog_text}")
        
        # Find OK button
        ok_bounds = sandbox.desktop.get_bounds("OK")
        center_x = ok_bounds['x'] + ok_bounds['width'] // 2
        center_y = ok_bounds['y'] + ok_bounds['height'] // 2
        sandbox.desktop.click(center_x, center_y)
        
    finally:
        sandbox.kill()
    ```

    **Expected Output:**

    ```
    Application loaded
    Dialog text: Please confirm this action
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    import { Sandbox } from '@hopx-ai/sdk';

    const sandbox = await Sandbox.create({ template: 'desktop' });

    try {
      // Wait for application to load
      const element = await sandbox.desktop.waitFor('Application Ready', 30);
      console.log('Application loaded');
      
      // Find and click button
      const button = await sandbox.desktop.findElement('Start');
      if (button) {
        await sandbox.desktop.mouseClick(button.x, button.y);
      }
      
      // Wait for dialog
      const dialog = await sandbox.desktop.waitFor('Confirm', 10);
      
      // Extract text from dialog using OCR
      const dialogText = await sandbox.desktop.ocr(
        dialog.x, dialog.y,
        dialog.width, dialog.height
      );
      console.log(`Dialog text: ${dialogText}`);
      
      // Find OK button
      const okBounds = await sandbox.desktop.getBounds('OK');
      // Math.floor keeps the coordinates integral when width or height is odd
      const centerX = okBounds.x + Math.floor(okBounds.width / 2);
      const centerY = okBounds.y + Math.floor(okBounds.height / 2);
      await sandbox.desktop.mouseClick(centerX, centerY);
      
    } finally {
      await sandbox.kill();
    }
    ```

    **Expected Output:**

    ```
    Application loaded
    Dialog text: Please confirm this action
    ```
  </Tab>
</Tabs>

## OCR Languages

Supported OCR languages:

* **eng**: English (default)
* **spa**: Spanish
* **fra**: French
* **deu**: German
* And more (check Tesseract language support)
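
Tesseract itself accepts several languages joined with `+` (for example `eng+deu`). Whether the `language` parameter of `ocr()` passes such a value through unchanged is an assumption that this page does not confirm, so treat the sketch below as a starting point. `ocr_language` is a hypothetical helper that validates codes against the languages listed above.

```python
# Codes taken from the list above; extend this if your sandbox has more installed.
DOCUMENTED_LANGUAGES = {
    "eng": "English",
    "spa": "Spanish",
    "fra": "French",
    "deu": "German",
}


def ocr_language(*codes: str) -> str:
    """Build a Tesseract-style language string such as 'eng+deu'."""
    if not codes:
        return "eng"  # documented default
    cleaned = [code.strip().lower() for code in codes]
    unknown = [code for code in cleaned if code not in DOCUMENTED_LANGUAGES]
    if unknown:
        raise ValueError(f"undocumented language code(s): {unknown}")
    # dict.fromkeys removes duplicates while keeping the original order
    return "+".join(dict.fromkeys(cleaned))


print(ocr_language("eng", "deu"))  # eng+deu
print(ocr_language())              # eng
```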

## Related

* [Screenshots](/core-concepts/desktop/screenshots) - Basic screenshot capture
* [Mouse Control](/core-concepts/desktop/mouse-control) - Mouse operations
* [Keyboard Control](/core-concepts/desktop/keyboard-control) - Keyboard operations
* **SDK**: [sandbox.desktop.ocr()](/sdk/python/desktop#ocr) - Python SDK method

## Next Steps

* Learn about [Screenshots](/core-concepts/desktop/screenshots) for basic image capture

* Explore [Mouse Control](/core-concepts/desktop/mouse-control) and [Keyboard Control](/core-concepts/desktop/keyboard-control) for interactions

* Review [VNC Server](/core-concepts/desktop/vnc-server) for remote desktop access

* **[CLI System Commands](/cli/commands/system)** - System operations from CLI
