This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
clickthrough/docs/API.md
Paul Wähner 1c03cab457
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
refactor: simplify to see/interact/exec and split server modules
2026-05-03 20:07:12 +02:00

2.7 KiB

API Reference

Base URL: http://127.0.0.1:8123

Auth header when enabled:

x-clickthrough-token: <token>

This API is intended for AI computer control through 3 methods only:

  • see
  • interact
  • exec

All responses use one envelope.

Response Envelope

Success:

{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": {},
  "error": null
}

Error:

{
  "ok": false,
  "request_id": "...",
  "time_ms": 1710000000000,
  "data": null,
  "error": {
    "code": "validation_error",
    "message": "request validation failed",
    "details": []
  }
}

1) See

POST /see

Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping.

{
  "screen": 0,
  "region_x": null,
  "region_y": null,
  "region_width": null,
  "region_height": null,
  "with_grid": true,
  "grid_rows": 12,
  "grid_cols": 12,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 85
}

Returns:

  • data.image.base64
  • data.meta.region (global desktop coords)
  • data.meta.grid (rows/cols/cell size + formula)

POST /see/zoom

Capture a tighter crop around a global point and draw another grid over that crop.

{
  "screen": 0,
  "center_x": 1200,
  "center_y": 720,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}

Use this for precision before clicking tiny controls.

2) Interact

POST /interact

Mouse/keyboard action execution.

{
  "screen": 0,
  "action": {
    "action": "click",
    "target": {
      "mode": "grid",
      "region_x": 0,
      "region_y": 0,
      "region_width": 1920,
      "region_height": 1080,
      "rows": 12,
      "cols": 12,
      "row": 7,
      "col": 3,
      "dx": 0.0,
      "dy": 0.0
    },
    "button": "left",
    "clicks": 1
  }
}

Supported actions:

  • move, click, right_click, double_click, middle_click
  • scroll (scroll_amount)
  • type (text, interval_ms)
  • hotkey (keys)

Target modes:

  • pixel: absolute global x,y
  • grid: grid cell from a see/see/zoom response

3) Exec

POST /exec

Run host shell commands (PowerShell/Bash/CMD).

{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}

Required header:

x-clickthrough-exec-secret: <secret>

Minimal Procedure for Agents

  1. see full screen with coarse grid.
  2. If uncertain, see/zoom target area with denser grid.
  3. interact one action.
  4. see again to confirm state change.
  5. Use exec only when GUI interaction is not the right tool.