space/clickthrough

Archived

This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Files

Paul Wähner 1c03cab457

python-syntax / syntax-check (push) Successful in 6s

Details

refactor: simplify to see/interact/exec and split server modules

2026-05-03 20:07:12 +02:00

1.6 KiB

Raw Blame History

name: clickthrough-http-control description: Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell).

Clickthrough Computer Control

Use exactly 3 methods:

see
interact
exec

Method 1: See

Use POST /see to capture full screen or a region with a grid overlay. Use POST /see/zoom to capture a tighter crop with a denser grid.

Rules:

Start with coarse grid (12x12).
For precision, zoom and use denser grid (20x20 or higher).
Always use returned meta.region and meta.grid when computing click targets.
Coordinates are global desktop coordinates.

Method 2: Interact

Use POST /interact for one action at a time.

Mouse actions:

move, click, right_click, double_click, middle_click, scroll

Keyboard actions:

type, hotkey

Rules:

Prefer grid targets derived from fresh see/see/zoom captures.
Use pixel only when you already have reliable coordinates.
After each important action, call see again before continuing.

Method 3: Exec

Use POST /exec only for shell/system tasks.

Rules:

Requires x-clickthrough-exec-secret.
Do not use exec for normal clicking/typing flows.
Prefer GUI interaction first; exec is fallback or explicit shell task.

Lightweight Procedure

see capture.
If needed, see/zoom refine.
interact one step.
see verify.
Repeat.

Quick Safety Rules

Never click with stale screenshots.
Never send multiple uncertain clicks in a row.
If localization is ambiguous, re-capture with a tighter zoom.