This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
clickthrough/skill/SKILL.md
Space-Banane 22ca0097d1
All checks were successful
python-syntax / syntax-check (push) Successful in 31s
Remove interact verify endpoint
2026-05-04 15:59:43 +02:00

1.9 KiB

name: clickthrough-http-control description: Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell).

Clickthrough Computer Control

Use these methods:

  • see
  • interact
  • exec

Method 1: See

Use POST /see to capture full screen or a region with a grid overlay. Use POST /see/zoom to capture a tighter crop with a denser grid. Use POST /see with ocr=true when text localization is needed.

Rules:

  • Start with coarse grid (12x12).
  • For precision, zoom and use denser grid (20x20 or higher).
  • Always use returned meta.region and meta.grid when computing click targets.
  • Coordinates are global desktop coordinates.
  • OCR results are in data.meta.ocr and include confidence, bbox, and center.

Method 2: Interact

Use POST /interact for one action at a time.

Mouse actions:

  • move, click, right_click, double_click, middle_click, scroll
  • click_text (OCR-driven click; optionally scope with click_text.region)

Keyboard actions:

  • type, hotkey

Rules:

  • Prefer grid targets derived from fresh see/see/zoom captures.
  • For text buttons/labels, prefer click_text and bound OCR with a region when possible.
  • Use pixel only when you already have reliable coordinates.
  • After each important action, call see again before continuing.

Method 3: Exec

Use POST /exec only for shell/system tasks.

Rules:

  • Requires x-clickthrough-exec-secret.
  • Do not use exec for normal clicking/typing flows.
  • Prefer GUI interaction first; exec is fallback or explicit shell task.

Lightweight Procedure

  1. see capture.
  2. If needed, see/zoom refine.
  3. interact one step (click_text for text UI targets).
  4. see verify.
  5. Repeat.

Quick Safety Rules

  • Never click with stale screenshots.
  • Never send multiple uncertain clicks in a row.
  • If localization is ambiguous, re-capture with a tighter zoom.