This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
clickthrough/docs/API.md
space 9e816e0417
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
Add pytesseract OCR, click_text interact action, and interact verify endpoint
2026-05-03 20:57:34 +02:00

4.2 KiB

API Reference

Base URL: http://127.0.0.1:8123

Auth header when enabled:

x-clickthrough-token: <token>

This API is intended for AI computer control through these methods:

  • see
  • interact
  • interact/verify
  • exec

All responses use one envelope.

Response Envelope

Success:

{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": {},
  "error": null
}

Error:

{
  "ok": false,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": null,
  "error": {
    "code": "validation_error",
    "message": "request validation failed",
    "details": []
  }
}

1) See

POST /see

Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping.

{
  "screen": 0,
  "region_x": null,
  "region_y": null,
  "region_width": null,
  "region_height": null,
  "with_grid": true,
  "grid_rows": 12,
  "grid_cols": 12,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 85,
  "ocr": false,
  "ocr_min_confidence": 0,
  "ocr_lang": "eng",
  "ocr_psm": null
}

Returns:

  • data.image.base64
  • data.meta.region (global desktop coords)
  • data.meta.grid (rows/cols/cell size + formula)
  • data.meta.ocr (when ocr=true)

OCR item shape:

  • text
  • confidence
  • bbox (global coords)
  • center
  • region_relative_bbox

POST /see/zoom

Capture a tighter crop around a global point and draw another grid over that crop.

{
  "screen": 0,
  "center_x": 1200,
  "center_y": 720,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}

Use this for precision before clicking tiny controls.

2) Interact

POST /interact

Mouse/keyboard action execution.

{
  "screen": 0,
  "action": {
    "action": "click",
    "target": {
      "mode": "grid",
      "region_x": 0,
      "region_y": 0,
      "region_width": 1920,
      "region_height": 1080,
      "rows": 12,
      "cols": 12,
      "row": 7,
      "col": 3,
      "dx": 0.0,
      "dy": 0.0
    },
    "button": "left",
    "clicks": 1
  }
}

Supported actions:

  • move, click, right_click, double_click, middle_click
  • scroll (scroll_amount)
  • type (text, interval_ms)
  • hotkey (keys)
  • click_text (OCR-driven text click with optional region)

Target modes:

  • pixel: absolute global x,y
  • grid: grid cell from a see/see/zoom response

click_text example (full screen OCR)

{
  "screen": 0,
  "action": {
    "action": "click_text",
    "click_text": {
      "text": "Sign in",
      "match": "contains",
      "case_sensitive": false,
      "min_confidence": 45,
      "occurrence": "best"
    }
  }
}

click_text example (region OCR)

{
  "screen": 0,
  "action": {
    "action": "click_text",
    "click_text": {
      "text": "Continue",
      "match": "exact",
      "region": { "x": 940, "y": 520, "width": 400, "height": 260 },
      "occurrence": "first"
    }
  }
}

3) Interact Verify

POST /interact/verify

Execute one interact action, then poll quick OCR verification checks until success or timeout.

{
  "action": {
    "screen": 0,
    "action": {
      "action": "click_text",
      "click_text": {
        "text": "Apply",
        "match": "contains"
      }
    }
  },
  "verify": {
    "type": "ocr_text_near_point",
    "text": "Applied",
    "x": 1180,
    "y": 640,
    "radius": 120,
    "screen": 0,
    "match": "contains"
  },
  "check_interval_ms": 250,
  "timeout_ms": 3000
}

Response includes:

  • action_result
  • verified
  • attempts
  • last_check
  • duration_ms

4) Exec

POST /exec

Run host shell commands (PowerShell/Bash/CMD).

{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}

Required header:

x-clickthrough-exec-secret: <secret>

Minimal Procedure for Agents

  1. see full screen with coarse grid.
  2. If uncertain, see/zoom target area with denser grid.
  3. interact one action.
  4. see again to confirm state change.
  5. Use exec only when GUI interaction is not the right tool.