| name | description |
|---|---|
| clickthrough-http-control | Control a local computer through the Clickthrough HTTP server using screenshot grids, zoomed grids, and pointer/keyboard actions. Use when an agent must operate GUI apps by repeatedly capturing the screen, refining target coordinates, and executing precise interactions (click/right-click/double-click/scroll/type/hotkey) with verification. |
Clickthrough HTTP Control
Use a strict observe-decide-act-verify loop.
Workflow
- Call `GET /screen` with a coarse grid (e.g., 12x12).
- Identify the likely cell/region for the target UI element.
- If confidence is low, call `POST /zoom` centered on the candidate and use a denser grid (e.g., 20x20).
- Execute one minimal action via `POST /action`.
- Re-capture with `GET /screen` and verify the expected state change.
- Repeat until the objective is complete.
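
The workflow above can be sketched as a small driver loop. This is only an illustration: `run_loop` and its four callables are hypothetical names, not part of Clickthrough. In practice `observe` would wrap `GET /screen`, `act` would wrap `POST /action`, and `decide` would hide the optional `POST /zoom` refinement.

```python
from typing import Any, Callable, List, Optional

Screen = Any   # whatever the capture call returns (hypothetical placeholder)
Action = dict  # one minimal action payload (hypothetical shape)


def run_loop(
    observe: Callable[[], Screen],
    decide: Callable[[Screen], Optional[Action]],
    act: Callable[[Action], None],
    verify: Callable[[Screen, Action], bool],
    max_steps: int = 20,
) -> List[Action]:
    """Run observe-decide-act-verify until decide() returns None."""
    done: List[Action] = []
    for _ in range(max_steps):
        screen = observe()            # capture with a coarse grid
        action = decide(screen)       # may zoom internally; None means finished
        if action is None:
            return done
        act(action)                   # exactly one minimal action per iteration
        if not verify(observe(), action):  # fresh capture after acting
            raise RuntimeError(f"unexpected state after {action!r}")
        done.append(action)
    raise TimeoutError("objective not reached within max_steps")
```

Passing the four steps in as callables keeps the loop testable without a running server; a fake `observe` and `act` are enough to exercise it.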
Precision rules
- Prefer grid targets first, then use `dx`/`dy` for subcell precision.
- Keep `dx`/`dy` in `[-1,1]`; start at `0,0` and only offset when needed.
- Use zoom before guessing offsets.
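
To make the offset rule concrete, here is one possible mapping from a grid cell plus `dx`/`dy` to pixel coordinates. It assumes zero-indexed cells, and that `0,0` is the cell center while `-1` and `1` reach the cell edges. The server's actual convention may differ, and `cell_to_pixel` is a hypothetical helper.

```python
from typing import Tuple


def cell_to_pixel(
    col: int, row: int,
    cols: int, rows: int,
    width: int, height: int,
    dx: float = 0.0, dy: float = 0.0,
) -> Tuple[float, float]:
    """Map a grid cell and a subcell offset to pixel coordinates.

    Assumption: dx/dy = 0 is the cell center; -1 and +1 are the cell edges.
    """
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError("cell is outside the grid")
    if not (-1.0 <= dx <= 1.0 and -1.0 <= dy <= 1.0):
        raise ValueError("dx and dy must stay in [-1, 1]")
    cell_w = width / cols
    cell_h = height / rows
    x = (col + 0.5 + dx / 2) * cell_w  # half a cell is the maximum offset
    y = (row + 0.5 + dy / 2) * cell_h
    return x, y


# A 12x12 grid over a 1200x1200 capture: the center of the top-left cell.
print(cell_to_pixel(0, 0, 12, 12, 1200, 1200))  # (50.0, 50.0)
```

Under this mapping an offset can never leave its cell, which is why zooming tends to be the safer move than a large `dx`/`dy` guess.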
Safety rules
- Respect `dry_run` and `allowed_region` restrictions from `/health`.
- Avoid destructive shortcuts unless explicitly requested.
- Send one action at a time unless the sequence is deterministic; in that case, use `/batch`.
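
A client-side gate for the first safety rule might look like the sketch below. The shape of the `/health` payload is an assumption made for illustration: a boolean `dry_run` and an `allowed_region` object with `x`, `y`, `width`, `height`, or `null` when unrestricted. Check the real response before relying on these field layouts; `gate_action` is a hypothetical helper.

```python
def gate_action(health: dict, x: float, y: float) -> str:
    """Decide what to do with a pointer action at (x, y).

    Returns "reject" if the point lies outside allowed_region,
    "dry_run" if the server will only simulate, otherwise "send".
    The payload layout is assumed, not taken from the server docs.
    """
    region = health.get("allowed_region")
    if region is not None:
        inside_x = region["x"] <= x < region["x"] + region["width"]
        inside_y = region["y"] <= y < region["y"] + region["height"]
        if not (inside_x and inside_y):
            return "reject"
    if health.get("dry_run", False):
        return "dry_run"  # do not expect a visible state change afterwards
    return "send"
```

Rejecting locally avoids a round trip for a request the server would presumably refuse anyway, and the `"dry_run"` result tells the caller to skip the screenshot verification step.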
Reliability rules
- After every meaningful action, verify with a fresh screenshot.
- On mismatch, do not spam clicks: zoom, re-localize, and retry once.
- Prefer short, reversible actions over long macros.
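
The "retry once" rule can be written as a small wrapper. The function and its callables are hypothetical: `act` would send one action, `verify` would take a fresh screenshot and check the expected change, and `relocalize` would zoom and return a refined target.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def act_with_one_retry(
    act: Callable[[T], None],
    verify: Callable[[], bool],
    relocalize: Callable[[T], T],
    target: T,
) -> bool:
    """Act, verify, and on mismatch re-localize and retry exactly once."""
    act(target)
    if verify():                  # fresh screenshot shows the expected change
        return True
    target = relocalize(target)   # zoom and refine instead of clicking again blindly
    act(target)
    return verify()               # a second mismatch is reported, not retried
```

Returning `False` after the second mismatch leaves the decision to the caller, who can re-plan from a new coarse capture.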