# Clickthrough

Let an agent interact with a computer over HTTP.

## Primary mode (v2)

Use the v2 contract for faster control loops that depend less on OCR:

- `POST /v2/observe`
- `POST /v2/localize`
- `POST /v2/act`
- `POST /v2/act-verify`

This contract is optimized for agents that cannot see the screen directly and must work through screenshot/image tools.

## What this provides

- Screen/region capture with optional OCR and timing stats
- Observation IDs for deterministic follow-up localization
- Text localization and image-tool coordinate localization
- Action execution with resolved target IDs
- Risk-aware action + verification defaults
- A unified response envelope across all endpoints

## Quick start

```bash
cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app
```

The server listens on `127.0.0.1:8123` by default.

## Fast control loop

1. `POST /v2/observe` on a tight region.
2. If OCR is sufficient, `POST /v2/localize` with `text_query`.
3. If the match is ambiguous, ask the image tool for a single (x, y) point within the observation bounds.
4. `POST /v2/localize` with `image_tool_point`.
5. `POST /v2/act` or `POST /v2/act-verify`.
6. Re-observe only the region that changed.

## See docs

- `docs/API.md`
- `skill/SKILL.md`
- `docs/coordinate-system.md`
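
## Example client sketch

The fast control loop can be sketched as a small client that uses only the Python standard library. Treat this as a sketch under assumptions, not the project's own client. Only the endpoint paths, `text_query`, `image_tool_point`, `CLICKTHROUGH_TOKEN`, and the default address come from this README. The Bearer auth header, the request fields (`region`, `observation_id`, `target_id`, `action`), the response fields (`observation_id`, `ambiguous`, `target_id`), and the `{"x", "y"}` point shape are hypothetical placeholders, and the unified response envelope is not unwrapped. Check `docs/API.md` for the real schema. The transport is injectable, so the loop can be exercised without a running server.

```python
import json
import os
import urllib.request
from typing import Callable, Optional, Tuple

# A transport takes an endpoint path and a JSON payload and returns the decoded JSON reply.
Transport = Callable[[str, dict], dict]


def http_transport(base_url: str = "http://127.0.0.1:8123") -> Transport:
    """POST JSON to a running Clickthrough server.

    ASSUMPTION: the token from CLICKTHROUGH_TOKEN is sent as a Bearer header;
    the real auth scheme is described in docs/API.md.
    """
    token = os.environ.get("CLICKTHROUGH_TOKEN", "change-me")

    def post(path: str, payload: dict) -> dict:
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {token}"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)

    return post


def click_text(post: Transport, region: dict, text: str,
               ask_image_tool: Optional[Callable[[dict], Tuple[int, int]]] = None,
               verify: bool = True) -> dict:
    """Run one pass of the fast control loop to click on `text`.

    HYPOTHETICAL field names: region, observation_id, ambiguous, target_id, action.
    """
    # Step 1: observe a tight region and keep its ID for follow-up localization.
    obs = post("/v2/observe", {"region": region})
    obs_id = obs["observation_id"]

    # Step 2: try OCR-based localization first.
    loc = post("/v2/localize", {"observation_id": obs_id, "text_query": text})

    # Steps 3-4: if OCR is ambiguous, ask an image tool for one point inside
    # the observation and localize by that point instead.
    if loc.get("ambiguous") and ask_image_tool is not None:
        x, y = ask_image_tool(obs)
        loc = post("/v2/localize", {"observation_id": obs_id,
                                    "image_tool_point": {"x": x, "y": y}})

    if "target_id" not in loc:
        raise RuntimeError(f"could not resolve a target for {text!r}: {loc}")

    # Step 5: act on the resolved target, with verification by default.
    endpoint = "/v2/act-verify" if verify else "/v2/act"
    return post(endpoint, {"target_id": loc["target_id"], "action": "click"})
```

Against a live server this would be called as `click_text(http_transport(), region, "Submit")`, with `region` in whatever shape `docs/coordinate-system.md` specifies. Step 6 (re-observing only the changed region) is simply another `/v2/observe` call with a smaller region.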