This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
clickthrough/skill/SKILL.md
Space-Banane 22ca0097d1
Remove interact verify endpoint
2026-05-04 15:59:43 +02:00


---
name: clickthrough-http-control
description: "Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell)."
---
# Clickthrough Computer Control
Use these methods:
- `see`
- `interact`
- `exec`
## Method 1: See
Use `POST /see` to capture the full screen or a region with a grid overlay.
Use `POST /see/zoom` to capture a tighter crop with a denser grid.
Use `POST /see` with `ocr=true` when text localization is needed.
Rules:
- Start with a coarse grid (`12x12`).
- For precision, zoom in and use a denser grid (`20x20` or higher).
- Always use returned `meta.region` and `meta.grid` when computing click targets.
- Coordinates are global desktop coordinates.
- OCR results are in `data.meta.ocr` and include confidence, bbox, and center.
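
The rule about computing click targets from `meta.region` and `meta.grid` can be sketched as below. This is only an illustration: the field names (`x`, `y`, `width`, `height`, `rows`, `cols`) and the zero-based cell indexing are assumptions, not confirmed by this skill, so check them against an actual `see` response before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Assumed shape of `meta.region`: top-left corner plus size,
    # expressed in global desktop pixels.
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class Grid:
    # Assumed shape of `meta.grid`: overlay row and column counts.
    rows: int
    cols: int


def cell_center(region: Region, grid: Grid, row: int, col: int) -> tuple[int, int]:
    """Return the global (x, y) of the center of a zero-based grid cell."""
    if not (0 <= row < grid.rows and 0 <= col < grid.cols):
        raise ValueError(f"cell ({row}, {col}) is outside a {grid.rows}x{grid.cols} grid")
    cell_w = region.width / grid.cols
    cell_h = region.height / grid.rows
    # Offset by the region origin so the result stays in global coordinates,
    # even when the capture was a zoomed crop.
    x = region.x + (col + 0.5) * cell_w
    y = region.y + (row + 0.5) * cell_h
    return round(x), round(y)
```

Because the region origin is added back in, the same helper works for a full-screen `see` capture and for a `see/zoom` crop.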
## Method 2: Interact
Use `POST /interact` for one action at a time.
Mouse actions:
- `move`, `click`, `right_click`, `double_click`, `middle_click`, `scroll`
- `click_text` (OCR-driven click; optionally scope with `click_text.region`)
Keyboard actions:
- `type`, `hotkey`
Rules:
- Prefer `grid` targets derived from fresh `see`/`see/zoom` captures.
- For text buttons/labels, prefer `click_text` and bound OCR with a region when possible.
- Use `pixel` only when you already have reliable coordinates.
- After each important action, call `see` again before continuing.
## Method 3: Exec
Use `POST /exec` only for shell/system tasks.
Rules:
- Requires `x-clickthrough-exec-secret`.
- Do not use exec for normal clicking/typing flows.
- Prefer GUI interaction first; use exec as a fallback or for an explicit shell task.
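
A minimal sketch of how a `POST /exec` request might be assembled with the Python standard library. It assumes `x-clickthrough-exec-secret` is sent as an HTTP header and that the body is JSON with a `command` field; the `command` field, the base URL, and the helper name `build_exec_request` are hypothetical and not taken from this skill.

```python
import json
import urllib.request


def build_exec_request(base_url: str, secret: str, command: str) -> urllib.request.Request:
    """Build, but do not send, a POST /exec request."""
    # Hypothetical body shape: the real endpoint may expect different fields.
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/exec",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the secret travels as a request header.
            "x-clickthrough-exec-secret": secret,
        },
    )
```

Building the request separately from sending it keeps the secret handling in one place; pass the result to `urllib.request.urlopen` once the payload shape has been confirmed.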
## Lightweight Procedure
1. `see` capture.
2. If needed, `see/zoom` refine.
3. `interact` one step (`click_text` for text UI targets).
4. `see` verify.
5. Repeat.
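
The procedure above could be driven by a small loop like the following sketch. The HTTP calls are injected as plain callables so that nothing about the real request or response shapes is assumed; `run_steps`, `plan`, and `verify` are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable

Capture = dict  # whatever `see` returns; treated as opaque here
Action = dict   # one `interact` payload; treated as opaque here


def run_steps(
    see: Callable[[], Capture],
    interact: Callable[[Action], None],
    plan: Iterable[Callable[[Capture], Action]],
    verify: Callable[[Capture, Capture], bool],
) -> int:
    """Run one action per step, capturing before and after each one.

    Returns the number of steps that were verified successfully.
    """
    done = 0
    before = see()                      # step 1: fresh capture
    for choose_action in plan:
        action = choose_action(before)  # derive the target from that capture
        interact(action)                # step 3: exactly one action
        after = see()                   # step 4: verify with a new capture
        if not verify(before, after):
            break                       # stop instead of clicking blindly
        done += 1
        before = after                  # step 5: repeat from the newest capture
    return done
```

Each action is chosen from the most recent capture, so no click is ever based on a screenshot taken before the previous action.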
## Quick Safety Rules
- Never click based on a stale screenshot.
- Never send multiple uncertain clicks in a row.
- If localization is ambiguous, re-capture with a tighter zoom.
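
One way to apply the ambiguity rule to OCR results is to accept a text target only when exactly one confident match exists. This is a sketch under stated assumptions: each OCR entry is taken to be a dict with `text` and `confidence` keys, confidence is assumed to lie between 0 and 1, and the `0.8` threshold is arbitrary. Only the presence of a confidence, bbox, and center per entry comes from this skill.

```python
from typing import Optional


def pick_ocr_match(
    entries: list[dict],
    wanted: str,
    min_confidence: float = 0.8,  # arbitrary threshold, tune per setup
) -> Optional[dict]:
    """Return the single confident match for `wanted`, else None.

    None means "missing or ambiguous": re-capture with a tighter zoom
    rather than guessing.
    """
    target = wanted.strip().lower()
    matches = [
        e for e in entries
        # `text` is an assumed key name for the recognized string.
        if e.get("text", "").strip().lower() == target
        and e.get("confidence", 0.0) >= min_confidence
    ]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for both the missing and the duplicated case keeps the caller's decision simple: anything other than a single match triggers a new `see/zoom` capture.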