--- name: clickthrough-http-control description: Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell). --- # Clickthrough Computer Control Use exactly 3 methods: - `see` - `interact` - `exec` ## Method 1: See Use `POST /see` to capture full screen or a region with a grid overlay. Use `POST /see/zoom` to capture a tighter crop with a denser grid. Rules: - Start with coarse grid (`12x12`). - For precision, zoom and use denser grid (`20x20` or higher). - Always use returned `meta.region` and `meta.grid` when computing click targets. - Coordinates are global desktop coordinates. ## Method 2: Interact Use `POST /interact` for one action at a time. Mouse actions: - `move`, `click`, `right_click`, `double_click`, `middle_click`, `scroll` Keyboard actions: - `type`, `hotkey` Rules: - Prefer `grid` targets derived from fresh `see`/`see/zoom` captures. - Use `pixel` only when you already have reliable coordinates. - After each important action, call `see` again before continuing. ## Method 3: Exec Use `POST /exec` only for shell/system tasks. Rules: - Requires `x-clickthrough-exec-secret`. - Do not use exec for normal clicking/typing flows. - Prefer GUI interaction first; exec is fallback or explicit shell task. ## Lightweight Procedure 1. `see` capture. 2. If needed, `see/zoom` refine. 3. `interact` one step. 4. `see` verify. 5. Repeat. ## Quick Safety Rules - Never click with stale screenshots. - Never send multiple uncertain clicks in a row. - If localization is ambiguous, re-capture with a tighter zoom.