All checks were successful
python-syntax / syntax-check (push) Successful in 6s
61 lines
1.6 KiB
Markdown
61 lines
1.6 KiB
Markdown
---
|
|
name: clickthrough-http-control
|
|
description: Use 3 methods to control a computer: see (screenshot+grid), interact (mouse/keyboard), and exec (shell).
|
|
---
|
|
|
|
# Clickthrough Computer Control
|
|
|
|
Use exactly 3 methods:
|
|
- `see`
|
|
- `interact`
|
|
- `exec`
|
|
|
|
## Method 1: See
|
|
|
|
Use `POST /see` to capture full screen or a region with a grid overlay.
|
|
Use `POST /see/zoom` to capture a tighter crop with a denser grid.
|
|
|
|
Rules:
|
|
- Start with coarse grid (`12x12`).
|
|
- For precision, zoom and use denser grid (`20x20` or higher).
|
|
- Always use returned `meta.region` and `meta.grid` when computing click targets.
|
|
- Coordinates are global desktop coordinates.
|
|
|
|
## Method 2: Interact
|
|
|
|
Use `POST /interact` for one action at a time.
|
|
|
|
Mouse actions:
|
|
- `move`, `click`, `right_click`, `double_click`, `middle_click`, `scroll`
|
|
|
|
Keyboard actions:
|
|
- `type`, `hotkey`
|
|
|
|
Rules:
|
|
- Prefer `grid` targets derived from fresh `see`/`see/zoom` captures.
|
|
- Use `pixel` only when you already have reliable coordinates.
|
|
- After each important action, call `see` again before continuing.
|
|
|
|
## Method 3: Exec
|
|
|
|
Use `POST /exec` only for shell/system tasks.
|
|
|
|
Rules:
|
|
- Requires `x-clickthrough-exec-secret`.
|
|
- Do not use exec for normal clicking/typing flows.
|
|
- Prefer GUI interaction first; exec is fallback or explicit shell task.
|
|
|
|
## Lightweight Procedure
|
|
|
|
1. `see` capture.
|
|
2. If needed, `see/zoom` refine.
|
|
3. `interact` one step.
|
|
4. `see` verify.
|
|
5. Repeat.
|
|
|
|
## Quick Safety Rules
|
|
|
|
- Never click with stale screenshots.
|
|
- Never send multiple uncertain clicks in a row.
|
|
- If localization is ambiguous, re-capture with a tighter zoom.
|