This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
clickthrough/skill/SKILL.md
Paul Wähner 1c03cab457
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
refactor: simplify to see/interact/exec and split server modules
2026-05-03 20:07:12 +02:00

1.6 KiB

name: clickthrough-http-control description: Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell).

Clickthrough Computer Control

Use exactly 3 methods:

  • see
  • interact
  • exec

Method 1: See

Use POST /see to capture full screen or a region with a grid overlay. Use POST /see/zoom to capture a tighter crop with a denser grid.

Rules:

  • Start with coarse grid (12x12).
  • For precision, zoom and use denser grid (20x20 or higher).
  • Always use returned meta.region and meta.grid when computing click targets.
  • Coordinates are global desktop coordinates.

Method 2: Interact

Use POST /interact for one action at a time.

Mouse actions:

  • move, click, right_click, double_click, middle_click, scroll

Keyboard actions:

  • type, hotkey

Rules:

  • Prefer grid targets derived from fresh see/see/zoom captures.
  • Use pixel only when you already have reliable coordinates.
  • After each important action, call see again before continuing.

Method 3: Exec

Use POST /exec only for shell/system tasks.

Rules:

  • Requires x-clickthrough-exec-secret.
  • Do not use exec for normal clicking/typing flows.
  • Prefer GUI interaction first; exec is fallback or explicit shell task.

Lightweight Procedure

  1. see capture.
  2. If needed, see/zoom refine.
  3. interact one step.
  4. see verify.
  5. Repeat.

Quick Safety Rules

  • Never click with stale screenshots.
  • Never send multiple uncertain clicks in a row.
  • If localization is ambiguous, re-capture with a tighter zoom.