This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Clickthrough

Clickthrough is a lightweight HTTP control layer that lets an AI agent operate a real computer safely. The agent repeatedly captures structured screenshots with coordinate-aware grids (see), executes precise mouse and keyboard actions at those coordinates (interact), and optionally runs authenticated shell commands for system-level tasks (exec). All endpoints share a consistent response contract.

Core Methods

  • POST /see: Capture a full screen or region, optionally with a click-ready grid overlay.
  • POST /see/zoom: Capture a tighter crop around a point and draw a denser grid for precise targeting.
  • POST /interact: Perform one mouse or keyboard action (click, scroll, type, hotkey, etc.).
  • POST /exec: Run PowerShell/Bash/CMD commands when shell-level control is needed.
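
As a rough illustration of how a client might address these four methods, the sketch below builds (but does not send) a JSON POST request using only the standard library. The endpoint paths come from the list above; the base URL, the `build_request` helper, and the `grid` payload field are assumptions made for the example. The real request schema is in docs/API.md.

```python
import json
from urllib.request import Request

# Paths are taken from the README; the short keys are just local aliases.
ENDPOINTS = {
    "see": "/see",
    "zoom": "/see/zoom",
    "interact": "/interact",
    "exec": "/exec",
}


def build_request(base_url: str, method_name: str, payload: dict) -> Request:
    """Build (without sending) a JSON POST request for one Clickthrough method."""
    url = base_url.rstrip("/") + ENDPOINTS[method_name]
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL and payload: "grid" is an illustrative field name only.
req = build_request("http://127.0.0.1:8000", "see", {"grid": True})
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a server.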

Why this works for AI agents

  • Agents do not need live vision; they iterate on snapshots.
  • Grid metadata bridges image understanding to deterministic click coordinates.
  • Interaction stays explicit and auditable (one action per request).
  • A unified response envelope (ok, data, error) reduces agent-side branching.
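
To show what reduced agent-side branching can look like, here is a minimal sketch of a single helper that handles every response the same way. Only the three envelope keys (ok, data, error) come from this README; the `unwrap` function, the `ClickthroughError` class, and the shape of the `error` value are assumptions.

```python
class ClickthroughError(Exception):
    """Raised when an envelope reports a failure (hypothetical helper)."""


def unwrap(envelope: dict):
    """Return the envelope's data on success, raise ClickthroughError otherwise."""
    if envelope.get("ok"):
        return envelope.get("data")
    # The structure of "error" is not specified here, so it is passed through as-is.
    raise ClickthroughError(envelope.get("error"))


# Example envelopes with made-up contents.
print(unwrap({"ok": True, "data": {"width": 1920}, "error": None}))
try:
    unwrap({"ok": False, "data": None, "error": "region out of bounds"})
except ClickthroughError as exc:
    print("failed:", exc)
```

Because every endpoint returns the same envelope, the agent needs one success path and one failure path regardless of which method it called.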

Minimal Agent Loop

  1. Call see with a coarse grid.
  2. If uncertain, call see/zoom with a denser grid.
  3. Call interact once.
  4. Call see again to verify state change.
  5. Use exec only for explicit shell/system tasks.
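
Steps 1 to 4 above can be sketched as a single function. The transport is injected as a callable so the sketch runs without a server; `run_step`, the `call` signature, and all payload fields (`grid`, `action`, `x`, `y`) are hypothetical and stand in for whatever docs/API.md and skill/SKILL.md actually specify. Step 5 (exec) is deliberately left out because it is reserved for explicit shell tasks.

```python
from typing import Callable

# (path, payload) -> response envelope with "ok", "data", "error" keys.
Call = Callable[[str, dict], dict]


def run_step(call: Call, action: dict, uncertain: bool = False) -> dict:
    """Run one see -> (zoom) -> interact -> see cycle and return the last envelope."""
    before = call("/see", {"grid": "coarse"})
    if not before["ok"]:
        return before
    if uncertain:
        zoomed = call("/see/zoom", {"grid": "dense"})
        if not zoomed["ok"]:
            return zoomed
    result = call("/interact", action)  # exactly one action per request
    if not result["ok"]:
        return result
    # Capture again so the caller can verify that the state changed.
    return call("/see", {"grid": "coarse"})


def stub_call(path: str, payload: dict) -> dict:
    """Stand-in transport that always succeeds (for illustration only)."""
    return {"ok": True, "data": {"path": path}, "error": None}


final = run_step(stub_call, {"action": "click", "x": 100, "y": 200}, uncertain=True)
print(final["ok"], final["data"]["path"])
```

Returning early on the first failed envelope keeps the loop from acting on a screen it could not capture.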

Safety and Auth

  • x-clickthrough-token protects API access when enabled.
  • x-clickthrough-exec-secret is required for /exec.
  • Optional dry-run and allowed-region constraints reduce accidental risk.
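
The two headers above could be attached by a small helper such as the one below. The header names are taken from this README; the `auth_headers` function is hypothetical, and sending the access token on /exec as well as the exec secret is an assumption rather than documented behavior.

```python
from typing import Optional


def auth_headers(
    path: str,
    token: Optional[str] = None,
    exec_secret: Optional[str] = None,
) -> dict:
    """Build request headers for a Clickthrough endpoint (illustrative sketch)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when token protection is enabled on the server.
        headers["x-clickthrough-token"] = token
    if path == "/exec":
        if not exec_secret:
            raise ValueError("x-clickthrough-exec-secret is required for /exec")
        headers["x-clickthrough-exec-secret"] = exec_secret
    return headers


# Placeholder credentials for illustration only.
print(auth_headers("/see", token="example-token"))
print(auth_headers("/exec", token="example-token", exec_secret="example-secret"))
```

Failing on the client side when the exec secret is missing avoids a round trip that the server would reject anyway.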

Docs

  • API: docs/API.md
  • Agent procedure: skill/SKILL.md
  • Coordinate system details: docs/coordinate-system.md