# Clickthrough Clickthrough is a lightweight HTTP control layer that lets an AI safely operate a real computer by repeatedly capturing structured screenshots with coordinate-aware grids (`see`), executing precise mouse/keyboard actions from those coordinates (`interact`), and optionally running authenticated shell commands for system-level tasks (`exec`) under a consistent response contract. ## Core Methods - `POST /see`: Capture a full screen or region, optionally with a click-ready grid overlay. - `POST /see/zoom`: Capture a tighter crop around a point and draw a denser grid for precise targeting. - `POST /interact`: Perform one mouse or keyboard action (`click`, `scroll`, `type`, `hotkey`, etc.). - `POST /exec`: Run PowerShell/Bash/CMD commands when shell-level control is needed. ## Why this works for AI agents - Agents do not need live vision; they iterate on snapshots. - Grid metadata bridges image understanding to deterministic click coordinates. - Interaction stays explicit and auditable (one action per request). - A unified response envelope (`ok`, `data`, `error`) reduces agent-side branching. ## Minimal Agent Loop 1. Call `see` with a coarse grid. 2. If uncertain, call `see/zoom` with a denser grid. 3. Call `interact` once. 4. Call `see` again to verify state change. 5. Use `exec` only for explicit shell/system tasks. ## Safety and Auth - `x-clickthrough-token` protects API access when enabled. - `x-clickthrough-exec-secret` is required for `/exec`. - Optional dry-run and allowed-region constraints reduce accidental risk. ## Docs - API: `docs/API.md` - Agent procedure: `skill/SKILL.md` - Coordinate system details: `docs/coordinate-system.md`