name: clickthrough-http-control
description: Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell).
Clickthrough Computer Control
Use these methods:
- see
- interact
- exec
Method 1: See
Use POST /see to capture full screen or a region with a grid overlay.
Use POST /see/zoom to capture a tighter crop with a denser grid.
Use POST /see with ocr=true when text localization is needed.
Rules:
- Start with a coarse grid (12x12).
- For precision, zoom and use a denser grid (20x20 or higher).
- Always use the returned meta.region and meta.grid when computing click targets.
- Coordinates are global desktop coordinates.
- OCR results are in data.meta.ocr and include confidence, bbox, and center.
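To illustrate the rule about meta.region and meta.grid, here is a minimal sketch that turns a grid cell into a global desktop coordinate. The field names x, y, width, height, rows, and cols are assumptions; this document does not show the actual response schema, so adapt them to what /see really returns.

```python
def grid_cell_center(region, grid, row, col):
    """Return the global (x, y) center of one grid cell.

    Assumed (hypothetical) shapes, not confirmed by the API:
      region = {"x": int, "y": int, "width": int, "height": int}
      grid   = {"rows": int, "cols": int}
    """
    if not (0 <= row < grid["rows"] and 0 <= col < grid["cols"]):
        raise ValueError("cell is outside the grid")
    cell_w = region["width"] / grid["cols"]
    cell_h = region["height"] / grid["rows"]
    # Offset by the region origin so the result is in global desktop coordinates.
    x = region["x"] + (col + 0.5) * cell_w
    y = region["y"] + (row + 0.5) * cell_h
    return round(x), round(y)
```

With a 1200x600 region at (100, 200) and a 12x12 grid, cell (row 0, col 0) maps to (150, 225).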
Method 2: Interact
Use POST /interact for one action at a time.
Mouse actions:
- move
- click
- right_click
- double_click
- middle_click
- scroll
- click_text (OCR-driven click; optionally scope with click_text.region)
Keyboard actions:
- type
- hotkey
Rules:
- Prefer grid targets derived from fresh see or see/zoom captures.
- For text buttons/labels, prefer click_text and bound OCR with a region when possible.
- Use pixel only when you already have reliable coordinates.
- After each important action, call see again before continuing.
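A single click_text action scoped to a region might be assembled as in the sketch below. The base URL, the "action" key, and the nested "click_text" object with "text" and "region" fields are all assumptions, inferred only from the click_text.region hint above; check the real request schema before relying on them. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # hypothetical server address


def build_interact_request(body, base_url=BASE_URL):
    """Build (but do not send) a POST /interact request for one action."""
    return urllib.request.Request(
        base_url + "/interact",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed payload shape: one action, with OCR bounded to a region.
req = build_interact_request({
    "action": "click_text",
    "click_text": {
        "text": "Save",
        "region": {"x": 0, "y": 0, "width": 800, "height": 600},
    },
})
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```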
Method 3: Exec
Use POST /exec only for shell/system tasks.
Rules:
- Requires x-clickthrough-exec-secret.
- Do not use exec for normal clicking/typing flows.
- Prefer GUI interaction first; exec is fallback or explicit shell task.
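The sketch below shows one way to attach the secret to an exec call. It assumes x-clickthrough-exec-secret is sent as an HTTP request header and that the body carries a "command" field; neither detail is confirmed here, and the base URL is hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def build_exec_request(command, secret, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST /exec request carrying the secret."""
    if not secret:
        raise ValueError("exec secret is required")
    return urllib.request.Request(
        base_url + "/exec",  # base_url is a hypothetical address
        data=json.dumps({"command": command}).encode("utf-8"),  # assumed body
        headers={
            "Content-Type": "application/json",
            # Assumed to be a header; HTTP header names are case-insensitive.
            "x-clickthrough-exec-secret": secret,
        },
        method="POST",
    )
```

Keeping the secret out of source code (for example, reading it from an environment variable of your choosing) avoids leaking it into logs or version control.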
Lightweight Procedure
1. see capture.
2. If needed, see/zoom refine.
3. interact one step (click_text for text UI targets).
4. see verify.
5. Repeat.
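The procedure above can be sketched as a small loop. The three callables are hypothetical helpers supplied by the caller: see() returns the latest capture, interact(action) performs exactly one action, and plan_next(capture) decides the next action (including any see/zoom refinement) or returns None when the goal is reached.

```python
def run_steps(see, interact, plan_next, max_steps=20):
    """Capture, act once, verify with a fresh capture, repeat.

    Returns the number of actions performed. All three callables are
    hypothetical, caller-supplied helpers; none is part of the API.
    """
    capture = see()
    for step in range(max_steps):
        action = plan_next(capture)
        if action is None:
            return step
        interact(action)   # exactly one action per iteration
        capture = see()    # never plan the next step from a stale capture
    raise RuntimeError("goal not reached within max_steps")
```

The max_steps bound is a design choice in this sketch, so a planner that never converges fails loudly instead of clicking indefinitely.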
Quick Safety Rules
- Never click with stale screenshots.
- Never send multiple uncertain clicks in a row.
- If localization is ambiguous, re-capture with a tighter zoom.
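One way to apply the ambiguity rule to OCR results is to act only when exactly one confident match exists. This is a sketch under assumptions: each entry in data.meta.ocr is taken to carry a "text" key alongside confidence, bbox, and center, and confidence is taken to be on a 0 to 1 scale. Neither is confirmed by this document.

```python
def pick_ocr_target(ocr_entries, text, min_confidence=0.8):
    """Return the center of the single confident match, else None.

    Assumed (hypothetical) entry shape:
      {"text": str, "confidence": float, "bbox": [...], "center": [x, y]}
    """
    wanted = text.strip().lower()
    matches = [
        entry for entry in ocr_entries
        if entry["text"].strip().lower() == wanted
        and entry["confidence"] >= min_confidence
    ]
    if len(matches) != 1:
        # Zero or several candidates: do not click; re-capture with a tighter zoom.
        return None
    return tuple(matches[0]["center"])
```

A None result means "do not click yet": capture again with see/zoom and retry rather than guessing between candidates.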