---
name: clickthrough-http-control
description: Control a local computer through the Clickthrough HTTP server using screenshot grids, zoomed grids, and pointer/keyboard actions. Use when an agent must operate GUI apps by repeatedly capturing the screen, refining target coordinates, and executing precise interactions (click/right-click/double-click/scroll/type/hotkey) with verification.
---

# Clickthrough HTTP Control

Use a strict observe-decide-act-verify loop.

## Workflow

1. Call `GET /screen` with a coarse grid (e.g., 12x12).
2. Identify the likely cell or region for the target UI element.
3. If confidence is low, call `POST /zoom` centered on the candidate and use a denser grid (e.g., 20x20).
4. Execute one minimal action via `POST /action`.
5. Re-capture with `GET /screen` and verify the expected state change.
6. Repeat until the objective is complete.

## Precision rules

- Prefer grid targets first, then use `dx/dy` for subcell precision.
- Keep `dx/dy` in `[-1,1]`; start at `0,0` and only offset when needed.
- Use zoom before guessing offsets.

## Safety rules

- Respect the `dry_run` and `allowed_region` restrictions reported by `/health`.
- Avoid destructive shortcuts unless explicitly requested.
- Send one action at a time; use `/batch` only when the sequence is deterministic.

## Reliability rules

- After every meaningful action, verify with a fresh screenshot.
- On mismatch, do not spam clicks: zoom, re-localize, and retry once.
- Prefer short, reversible actions over long macros.
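
To make the precision rules concrete, the sketch below shows one way a grid cell plus a `dx/dy` offset could resolve to a screen pixel. It is an illustration under stated assumptions, not the server's implementation: the `Grid` class and `cell_to_pixel` function are hypothetical names, and the mapping (`0,0` is the cell center, `±1` reaches the cell edge) is an assumed reading of the `[-1,1]` range. Confirm the actual semantics against the Clickthrough server before relying on it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Grid:
    """Hypothetical description of a gridded screenshot (not the server's schema)."""
    width: int         # width of the captured region in pixels
    height: int        # height of the captured region in pixels
    rows: int          # number of grid rows
    cols: int          # number of grid columns
    origin_x: int = 0  # left edge of the region on screen (non-zero for a zoom)
    origin_y: int = 0  # top edge of the region on screen


def clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep an offset inside [low, high]."""
    return max(low, min(high, value))


def cell_to_pixel(grid: Grid, row: int, col: int,
                  dx: float = 0.0, dy: float = 0.0) -> Tuple[int, int]:
    """Map a zero-based (row, col) cell and a subcell offset to a pixel.

    Assumption: dx = dy = 0 targets the cell center and +/-1 targets the
    cell edge, so each unit of offset moves half a cell.
    """
    if not (0 <= row < grid.rows and 0 <= col < grid.cols):
        raise ValueError(f"cell ({row}, {col}) is outside a {grid.rows}x{grid.cols} grid")
    cell_w = grid.width / grid.cols
    cell_h = grid.height / grid.rows
    dx, dy = clamp(dx), clamp(dy)
    x = grid.origin_x + (col + 0.5 + dx / 2) * cell_w
    y = grid.origin_y + (row + 0.5 + dy / 2) * cell_h
    return round(x), round(y)


coarse = Grid(width=1920, height=1080, rows=12, cols=12)
print(cell_to_pixel(coarse, 0, 0))                # (80, 45): center of the top-left cell
print(cell_to_pixel(coarse, 3, 5, dx=1, dy=-1))   # (960, 270): top-right corner of cell (3, 5)
```

On a 12x12 grid over a 1920x1080 screen, each cell is 160x90 pixels, so even a full `dx/dy` offset moves the target by at most 80x45 pixels. That is why zooming to a denser grid is usually a better first step than guessing offsets.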