--- name: clickthrough-http-control description: Control a local computer through the Clickthrough HTTP server using screenshot grids, zoomed grids, and pointer/keyboard actions. Use when an agent must operate GUI apps by repeatedly capturing the screen, refining target coordinates, and executing precise interactions (click/right-click/double-click/scroll/type/hotkey) with verification. --- # Clickthrough HTTP Control Use a strict observe-decide-act-verify loop. ## Getting a computer instance (user-owned setup) The **user/operator** is responsible for provisioning and exposing the target machine. The agent should not assume it can self-install this stack. ### What the user must do 1. Install dependencies and run Clickthrough on the target computer (default bind: `127.0.0.1:8123`). 2. Expose access path to the agent (LAN/Tailscale/reverse proxy) and provide the base URL. 3. Configure secrets on target machine: - `CLICKTHROUGH_TOKEN` for general API auth - `CLICKTHROUGH_EXEC_SECRET` for `/exec` calls 4. Share connection details with the agent through a secure channel: - `base_url` - `x-clickthrough-token` - `x-clickthrough-exec-secret` (only when `/exec` is needed) ### What the agent should do 1. Validate connection with `GET /health` using provided headers. 2. Refuse `/exec` attempts when exec secret is missing/invalid. 3. Ask user for missing setup inputs instead of guessing infrastructure. ## Mini API map - `GET /health` → server status + safety flags - `GET /screen` → full screenshot (JSON with base64 by default, or raw image with `asImage=true`) - `POST /zoom` → cropped screenshot around point/region (also supports `asImage=true`) - `POST /action` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...) - `POST /batch` → sequential action list - `POST /exec` → PowerShell/Bash/CMD command execution (requires configured exec secret + header) ### Header requirements - Always send `x-clickthrough-token` when token auth is enabled. - For `/exec`, also send `x-clickthrough-exec-secret`. ## Core workflow (mandatory) 1. Call `GET /screen` with coarse grid (e.g., 12x12). 2. Identify likely target region and compute an initial confidence score. 3. If confidence < 0.85, call `POST /zoom` with denser grid (e.g., 20x20) and re-evaluate. 4. **Before any click**, verify target identity (text/icon/location consistency). 5. Execute one minimal action via `POST /action`. 6. Re-capture with `GET /screen` and verify the expected state change. 7. Repeat until objective is complete. ## Verify-before-click rules - Never click if target identity is ambiguous. - Require at least two matching signals before click (example: expected text + expected UI region). - If confidence is low, do not "test click"; zoom and re-localize first. - For high-impact actions (close/delete/send/purchase), use two-phase flow: 1) preview intended coordinate + reason 2) execute only after explicit confirmation. ## Precision rules - Prefer grid targets first, then use `dx/dy` for subcell precision. - Keep `dx/dy` in `[-1,1]`; start at `0,0` and only offset when needed. - Use zoom before guessing offsets. - Avoid stale coordinates: re-capture before action if UI moved/scrolled. ## Safety rules - Respect `dry_run` and `allowed_region` restrictions from `/health`. - Respect `/exec` security requirements (`CLICKTHROUGH_EXEC_SECRET` + `x-clickthrough-exec-secret`). - Avoid destructive shortcuts unless explicitly requested. - Send one action at a time unless deterministic; then use `/batch`. ## Reliability rules - After every meaningful action, verify with a fresh screenshot. - On mismatch, do not spam clicks: zoom, re-localize, and retry once. - Prefer short, reversible actions over long macros. - If two retries fail, switch strategy (hotkey/window focus/search) instead of repeating the same click. ## App-specific playbooks (recommended) Build per-app routines for repetitive tasks instead of generic clicking. ### Spotify playbook - Focus app window before search/navigation. - Prefer keyboard-first flow for song start: 1) `Ctrl+L` (search) 2) type exact query 3) Enter 4) verify exact song+artist text 5) click/double-click row 6) verify now-playing bar - If now-playing does not match target track, stop and re-localize; do not keep clicking nearby rows.