diff --git a/TODO.md b/TODO.md index dc97073..ccfeedd 100644 --- a/TODO.md +++ b/TODO.md @@ -24,3 +24,4 @@ - [x] Open PR for exec feature branch and review/merge - [x] Require configured exec secret + per-request exec secret header - [x] Upgrade skill with verify-before-click rules, confidence thresholds, two-phase risky actions, and Spotify playbook +- [x] Add top-level skill section for instance setup + mini API docs diff --git a/skill/SKILL.md b/skill/SKILL.md index 7416de6..1f507e0 100644 --- a/skill/SKILL.md +++ b/skill/SKILL.md @@ -7,6 +7,34 @@ description: Control a local computer through the Clickthrough HTTP server using Use a strict observe-decide-act-verify loop. +## Getting a computer instance (quick setup) + +1. Start Clickthrough on the target computer (default: `127.0.0.1:8123`). +2. Expose it to the agent host (LAN/Tailscale/reverse proxy) and note the base URL. +3. Set auth on the target machine: + - `CLICKTHROUGH_TOKEN` for general API auth + - `CLICKTHROUGH_EXEC_SECRET` for `/exec` calls +4. Verify connectivity from the agent side: + - `GET /health` with `x-clickthrough-token` header +5. Store connection details for reuse: + - `base_url` + - `x-clickthrough-token` + - `x-clickthrough-exec-secret` (only when using `/exec`) + +## Mini API map + +- `GET /health` → server status + safety flags +- `GET /screen` → full screenshot (JSON with base64 by default, or raw image with `asImage=true`) +- `POST /zoom` → cropped screenshot around point/region (also supports `asImage=true`) +- `POST /action` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...) +- `POST /batch` → sequential action list +- `POST /exec` → PowerShell/Bash/CMD command execution (requires configured exec secret + header) + +### Header requirements + +- Always send `x-clickthrough-token` when token auth is enabled. +- For `/exec`, also send `x-clickthrough-exec-secret`. + ## Core workflow (mandatory) 1. Call `GET /screen` with coarse grid (e.g., 12x12).