All checks were successful
python-syntax / syntax-check (push) Successful in 6s
38 lines
1.7 KiB
Markdown
38 lines
1.7 KiB
Markdown
# Clickthrough
|
|
|
|
Clickthrough is a lightweight HTTP control layer that lets an AI safely operate a real computer by repeatedly capturing structured screenshots with coordinate-aware grids (`see`), executing precise mouse/keyboard actions from those coordinates (`interact`), and optionally running authenticated shell commands for system-level tasks (`exec`) under a consistent response contract.
|
|
|
|
## Core Methods
|
|
|
|
- `POST /see`: Capture a full screen or region, optionally with a click-ready grid overlay.
|
|
- `POST /see/zoom`: Capture a tighter crop around a point and draw a denser grid for precise targeting.
|
|
- `POST /interact`: Perform one mouse or keyboard action (`click`, `scroll`, `type`, `hotkey`, etc.).
|
|
- `POST /exec`: Run PowerShell/Bash/CMD commands when shell-level control is needed.
|
|
|
|
## Why this works for AI agents
|
|
|
|
- Agents do not need live vision; they iterate on snapshots.
|
|
- Grid metadata bridges image understanding to deterministic click coordinates.
|
|
- Interaction stays explicit and auditable (one action per request).
|
|
- A unified response envelope (`ok`, `data`, `error`) reduces agent-side branching.
|
|
|
|
## Minimal Agent Loop
|
|
|
|
1. Call `see` with a coarse grid.
|
|
2. If uncertain, call `see/zoom` with a denser grid.
|
|
3. Call `interact` once.
|
|
4. Call `see` again to verify state change.
|
|
5. Use `exec` only for explicit shell/system tasks.
|
|
|
|
## Safety and Auth
|
|
|
|
- `x-clickthrough-token` protects API access when enabled.
|
|
- `x-clickthrough-exec-secret` is required for `/exec`.
|
|
- Optional dry-run and allowed-region constraints reduce accidental risk.
|
|
|
|
## Docs
|
|
|
|
- API: `docs/API.md`
|
|
- Agent procedure: `skill/SKILL.md`
|
|
- Coordinate system details: `docs/coordinate-system.md`
|