Add pytesseract OCR, click_text interact action, and interact verify endpoint

2026-05-03 20:57:34 +02:00

API Reference

Base URL: http://127.0.0.1:8123

Auth header when enabled:

x-clickthrough-token: <token>
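As an illustration of the base URL and auth header above, here is a minimal sketch of a request builder. The helper name `build_request` is hypothetical, and sending the body as JSON with a `Content-Type: application/json` header is an assumption not stated in this reference.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8123"  # base URL from this reference


def build_request(path, payload, token=None):
    """Build (but do not send) a POST request for one endpoint.

    Hypothetical helper; assumes the server accepts a JSON body.
    """
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Only needed when auth is enabled on the server.
        headers["x-clickthrough-token"] = token
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method="POST"
    )
```

Passing the result to `urllib.request.urlopen` would send it; building the request separately keeps the header logic easy to inspect.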

This API is intended for AI computer control through these methods:

see
interact
interact/verify
exec

All responses use one envelope.

Response Envelope

Success:

{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": {},
  "error": null
}

Error:

{
  "ok": false,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": null,
  "error": {
    "code": "validation_error",
    "message": "request validation failed",
    "details": []
  }
}
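Because every response shares this envelope, a client can unwrap it in one place. The following is a sketch only; `ApiError` and `unwrap` are hypothetical names, and the fallback values for missing error fields are assumptions.

```python
class ApiError(Exception):
    """Raised when an envelope has ok=false (hypothetical client-side type)."""

    def __init__(self, code, message, details, request_id):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details
        self.request_id = request_id


def unwrap(envelope):
    """Return envelope["data"] on success, raise ApiError otherwise."""
    if envelope.get("ok"):
        return envelope.get("data")
    err = envelope.get("error") or {}
    raise ApiError(
        err.get("code", "unknown"),   # assumed fallback when code is absent
        err.get("message", ""),
        err.get("details", []),
        envelope.get("request_id"),
    )
```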

1) See

`POST /see`

Capture the full screen or a region. With the optional grid overlay, the response also carries coordinate metadata for mapping grid cells to click positions.

{
  "screen": 0,
  "region_x": null,
  "region_y": null,
  "region_width": null,
  "region_height": null,
  "with_grid": true,
  "grid_rows": 12,
  "grid_cols": 12,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 85,
  "ocr": false,
  "ocr_min_confidence": 0,
  "ocr_lang": "eng",
  "ocr_psm": null
}

Returns:

data.image.base64
data.meta.region (global desktop coords)
data.meta.grid (rows/cols/cell size + formula)
data.meta.ocr (when ocr=true)
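A small sketch of reading the returned image, assuming the argument is the `data` object of a successful `/see` response and that `data.image.base64` holds standard base64 of the encoded image bytes. The function name `image_bytes` is hypothetical.

```python
import base64


def image_bytes(see_data):
    """Decode the screenshot carried in data.image.base64.

    see_data is the "data" object of a /see response (assumed shape).
    """
    return base64.b64decode(see_data["image"]["base64"])
```

The decoded bytes can be written to a file whose extension matches the requested `image_format`.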

OCR item shape:

text
confidence
bbox (global coords)
center
region_relative_bbox
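The relation between `bbox` (global coords) and `region_relative_bbox` can be sketched as a simple offset by the region origin. This assumes boxes and regions use `x`, `y`, `width`, `height` keys, as the `region` object in the `click_text` example does; the actual OCR item layout is not specified here, and both function names are hypothetical.

```python
def to_region_relative(bbox, region):
    """Shift a global bbox so it is measured from the region's top-left corner."""
    return {
        "x": bbox["x"] - region["x"],
        "y": bbox["y"] - region["y"],
        "width": bbox["width"],
        "height": bbox["height"],
    }


def bbox_center(bbox):
    """Center point of a bbox, in the same coordinate space as the bbox."""
    return (bbox["x"] + bbox["width"] / 2, bbox["y"] + bbox["height"] / 2)
```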

`POST /see/zoom`

Capture a tighter crop around a global point and draw another grid over that crop.

{
  "screen": 0,
  "center_x": 1200,
  "center_y": 720,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}

Use this for precision before clicking tiny controls.
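For orientation, here is a sketch that builds a zoom request and estimates which global region the crop would cover. It assumes the crop is exactly centered on the point and is not clamped at screen edges; neither is confirmed by this reference, so `data.meta.region` in the actual response should be preferred. Both function names are hypothetical.

```python
def zoom_request(center_x, center_y, width=500, height=350, grid=20, screen=0):
    """Payload for POST /see/zoom using the fields shown in the example above."""
    return {
        "screen": screen,
        "center_x": center_x,
        "center_y": center_y,
        "width": width,
        "height": height,
        "with_grid": True,
        "grid_rows": grid,
        "grid_cols": grid,
    }


def expected_zoom_region(center_x, center_y, width, height):
    """Estimated crop region (assumption: centered, no edge clamping)."""
    return {
        "x": center_x - width // 2,
        "y": center_y - height // 2,
        "width": width,
        "height": height,
    }
```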

2) Interact

`POST /interact`

Executes a single mouse or keyboard action.

{
  "screen": 0,
  "action": {
    "action": "click",
    "target": {
      "mode": "grid",
      "region_x": 0,
      "region_y": 0,
      "region_width": 1920,
      "region_height": 1080,
      "rows": 12,
      "cols": 12,
      "row": 7,
      "col": 3,
      "dx": 0.0,
      "dy": 0.0
    },
    "button": "left",
    "clicks": 1
  }
}
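The grid target above addresses one cell inside a region. A minimal sketch of how such a target might resolve to a pixel follows. It assumes 0-based `row`/`col` and treats `dx`/`dy` as fractions of a cell measured from the cell center; these conventions are guesses, and the formula returned in `data.meta.grid` is the authoritative one. The function name is hypothetical.

```python
def grid_cell_point(region_x, region_y, region_width, region_height,
                    rows, cols, row, col, dx=0.0, dy=0.0):
    """Map a grid cell to a global (x, y) point.

    Assumptions (not confirmed by the API reference):
    - row and col are 0-based
    - dx, dy are offsets from the cell center, in units of one cell
    """
    cell_w = region_width / cols
    cell_h = region_height / rows
    x = region_x + (col + 0.5 + dx) * cell_w
    y = region_y + (row + 0.5 + dy) * cell_h
    return (x, y)
```

With the example values (1920x1080 region, 12x12 grid, row 7, col 3), each cell is 160x90 pixels and the sketch lands on (560.0, 675.0) under these assumptions.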

Supported actions:

move, click, right_click, double_click, middle_click
scroll (scroll_amount)
type (text, interval_ms)
hotkey (keys)
click_text (OCR-driven text click with optional region)

Target modes:

pixel: absolute global x,y
grid: grid cell from a see or see/zoom response
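The actions and target modes listed above can be wrapped in small payload builders. This is a sketch under assumptions: the pixel target is taken to be `{"mode": "pixel", "x": ..., "y": ...}`, and action parameters such as `text`, `interval_ms`, and `keys` are assumed to sit next to the `action` name, the way `button` and `clicks` do in the example. All function names are hypothetical.

```python
def pixel_target(x, y):
    """Target in absolute global coordinates (assumed field names x, y)."""
    return {"mode": "pixel", "x": x, "y": y}


def click_at(x, y, screen=0, button="left", clicks=1):
    """Payload for a click on a pixel target."""
    return {
        "screen": screen,
        "action": {
            "action": "click",
            "target": pixel_target(x, y),
            "button": button,
            "clicks": clicks,
        },
    }


def type_text(text, interval_ms=20, screen=0):
    """Payload for the type action (parameter placement is an assumption)."""
    return {"screen": screen,
            "action": {"action": "type", "text": text, "interval_ms": interval_ms}}


def hotkey(keys, screen=0):
    """Payload for the hotkey action, e.g. keys=["ctrl", "s"] (assumed format)."""
    return {"screen": screen, "action": {"action": "hotkey", "keys": list(keys)}}
```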

`click_text` example (full screen OCR)

{
  "screen": 0,
  "action": {
    "action": "click_text",
    "click_text": {
      "text": "Sign in",
      "match": "contains",
      "case_sensitive": false,
      "min_confidence": 45,
      "occurrence": "best"
    }
  }
}

`click_text` example (region OCR)

{
  "screen": 0,
  "action": {
    "action": "click_text",
    "click_text": {
      "text": "Continue",
      "match": "exact",
      "region": { "x": 940, "y": 520, "width": 400, "height": 260 },
      "occurrence": "first"
    }
  }
}
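To make the `click_text` options concrete, here is one plausible reading of how a match could be selected from OCR items. This is a client-side model, not the server's implementation: it assumes `occurrence: "best"` means highest confidence and `"first"` means first in OCR order, and that items carry `text` and `confidence` keys. The function name is hypothetical.

```python
def select_match(items, text, match="contains", case_sensitive=False,
                 min_confidence=0, occurrence="best"):
    """Pick one OCR item for a click_text request (illustrative model only)."""

    def norm(s):
        return s if case_sensitive else s.lower()

    needle = norm(text)
    hits = []
    for item in items:
        if item["confidence"] < min_confidence:
            continue  # drop low-confidence OCR results
        hay = norm(item["text"])
        matched = (hay == needle) if match == "exact" else (needle in hay)
        if matched:
            hits.append(item)

    if not hits:
        return None
    if occurrence == "first":
        return hits[0]
    # Assumed meaning of "best": the hit with the highest confidence.
    return max(hits, key=lambda item: item["confidence"])
```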

3) Interact Verify

`POST /interact/verify`

Execute one interact action, then repeat a quick OCR verification check every check_interval_ms until it succeeds or timeout_ms elapses.

{
  "action": {
    "screen": 0,
    "action": {
      "action": "click_text",
      "click_text": {
        "text": "Apply",
        "match": "contains"
      }
    }
  },
  "verify": {
    "type": "ocr_text_near_point",
    "text": "Applied",
    "x": 1180,
    "y": 640,
    "radius": 120,
    "screen": 0,
    "match": "contains"
  },
  "check_interval_ms": 250,
  "timeout_ms": 3000
}

Response includes:

action_result
verified
attempts
last_check
duration_ms
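The act-then-poll behavior can be modeled on the client side as follows. This is a sketch of the general pattern, not the server's code: `run_action` and `check` are hypothetical callables, and the returned keys simply mirror the response fields listed above.

```python
import time


def act_then_verify(run_action, check, check_interval_ms=250, timeout_ms=3000,
                    sleep=time.sleep, now=time.monotonic):
    """Run one action, then poll check() until it is truthy or time runs out."""
    start = now()
    action_result = run_action()
    attempts = 0
    verified = False
    last_check = None
    while True:
        attempts += 1
        last_check = check()
        if last_check:
            verified = True
            break
        if (now() - start) * 1000 >= timeout_ms:
            break  # timed out without a successful check
        sleep(check_interval_ms / 1000)
    return {
        "action_result": action_result,
        "verified": verified,
        "attempts": attempts,
        "last_check": last_check,
        "duration_ms": int((now() - start) * 1000),
    }
```

Injecting `sleep` and `now` keeps the loop testable without real delays.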

4) Exec

`POST /exec`

Run host shell commands (PowerShell/Bash/CMD).

{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}

Required header:

x-clickthrough-exec-secret: <secret>
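A sketch of assembling the headers and body for `/exec`. The helper name is hypothetical, and defaulting `dry_run` to `True` is a cautious design choice of this sketch, not documented server behavior. The `Content-Type` header is likewise an assumption.

```python
def exec_call(command, exec_secret, shell="powershell", timeout_s=20,
              cwd=None, dry_run=True, token=None):
    """Return (headers, payload) for POST /exec without sending anything."""
    headers = {
        "Content-Type": "application/json",        # assumed JSON body
        "x-clickthrough-exec-secret": exec_secret,  # required for /exec
    }
    if token is not None:
        headers["x-clickthrough-token"] = token     # only when auth is enabled
    payload = {
        "command": command,
        "shell": shell,
        "timeout_s": timeout_s,
        "dry_run": dry_run,
    }
    if cwd is not None:
        payload["cwd"] = cwd
    return headers, payload
```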

Minimal Procedure for Agents

1. see full screen with coarse grid.
2. If uncertain, see/zoom target area with denser grid.
3. interact one action.
4. see again to confirm state change.
5. Use exec only when GUI interaction is not the right tool.
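The procedure above could be sketched as a single step function. Here `call(path, payload)` is a hypothetical transport that posts to the API and returns the response data, and `choose_action` and `pick_zoom_point` stand in for the agent's own decision logic; none of these names come from this reference.

```python
COARSE_SEE = {"screen": 0, "with_grid": True, "grid_rows": 12, "grid_cols": 12}


def run_step(call, choose_action, pick_zoom_point=lambda obs: None):
    """One see -> (zoom) -> interact -> see cycle.

    call(path, payload): hypothetical transport returning response data.
    choose_action(obs): returns the inner "action" object for /interact.
    pick_zoom_point(obs): returns (x, y) to zoom on, or None if confident.
    """
    obs = call("/see", COARSE_SEE)                 # coarse look at the screen
    point = pick_zoom_point(obs)
    if point is not None:                          # zoom only when uncertain
        obs = call("/see/zoom", {
            "screen": 0,
            "center_x": point[0], "center_y": point[1],
            "width": 500, "height": 350,
            "with_grid": True, "grid_rows": 20, "grid_cols": 20,
        })
    result = call("/interact", {"screen": 0, "action": choose_action(obs)})
    after = call("/see", COARSE_SEE)               # confirm the state change
    return result, after
```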
