Archived

This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Files

Space-Banane 22ca0097d1

python-syntax / syntax-check (push) Successful in 31s

Details

Remove interact verify endpoint

2026-05-04 15:59:43 +02:00

3.6 KiB

Raw Blame History

API Reference

Base URL: http://127.0.0.1:8123

Auth header when enabled:

x-clickthrough-token: <token>

This API is intended for AI computer control through these methods:

see
interact
exec

All responses use one envelope.

Response Envelope

Success:

{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": {},
  "error": null
}

Error:

{
  "ok": false,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": null,
  "error": {
    "code": "validation_error",
    "message": "request validation failed",
    "details": []
  }
}

1) See

`POST /see`

Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping.

{
  "screen": 0,
  "region_x": null,
  "region_y": null,
  "region_width": null,
  "region_height": null,
  "with_grid": true,
  "grid_rows": 12,
  "grid_cols": 12,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 85,
  "ocr": false,
  "ocr_min_confidence": 0,
  "ocr_lang": "eng",
  "ocr_psm": null
}

Returns:

data.image.base64
data.meta.region (global desktop coords)
data.meta.grid (rows/cols/cell size + formula)
data.meta.ocr (when ocr=true)

OCR item shape:

text
confidence
bbox (global coords)
center
region_relative_bbox

`POST /see/zoom`

Capture a tighter crop around a global point and draw another grid over that crop.

{
  "screen": 0,
  "center_x": 1200,
  "center_y": 720,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}

Use this for precision before clicking tiny controls.

2) Interact

`POST /interact`

Mouse/keyboard action execution.

{
  "screen": 0,
  "action": {
    "action": "click",
    "target": {
      "mode": "grid",
      "region_x": 0,
      "region_y": 0,
      "region_width": 1920,
      "region_height": 1080,
      "rows": 12,
      "cols": 12,
      "row": 7,
      "col": 3,
      "dx": 0.0,
      "dy": 0.0
    },
    "button": "left",
    "clicks": 1
  }
}

Supported actions:

move, click, right_click, double_click, middle_click
scroll (scroll_amount)
type (text, interval_ms)
hotkey (keys)
click_text (OCR-driven text click with optional region)

Target modes:

pixel: absolute global x,y
grid: grid cell from a see/see/zoom response

`click_text` example (full screen OCR)

{
  "screen": 0,
  "action": {
    "action": "click_text",
    "click_text": {
      "text": "Sign in",
      "match": "contains",
      "case_sensitive": false,
      "min_confidence": 45,
      "occurrence": "best"
    }
  }
}

`click_text` example (region OCR)

{
  "screen": 0,
  "action": {
    "action": "click_text",
    "click_text": {
      "text": "Continue",
      "match": "exact",
      "region": { "x": 940, "y": 520, "width": 400, "height": 260 },
      "occurrence": "first"
    }
  }
}

3) Exec

`POST /exec`

Run host shell commands (PowerShell/Bash/CMD).

{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}

Required header:

x-clickthrough-exec-secret: <secret>

Minimal Procedure for Agents

see full screen with coarse grid.
If uncertain, see/zoom target area with denser grid.
interact one action.
see again to confirm state change.
Use exec only when GUI interaction is not the right tool.

3.6 KiB Raw Blame History

API Reference

Response Envelope

1) See

POST /see

POST /see/zoom