This repository has been archived on 2026-05-20 . You can view files and clone it. You cannot open issues or pull requests or push a commit.
211f38003eec4fa9d03753c336835cbdfc1c546d
All checks were successful
python-syntax / syntax-check (push) Successful in 10s
Clickthrough
Clickthrough is a lightweight HTTP control layer that lets an AI safely operate a real computer by repeatedly capturing structured screenshots with coordinate-aware grids (see), executing precise mouse/keyboard actions from those coordinates (interact), and optionally running authenticated shell commands for system-level tasks (exec) under a consistent response contract.
Core Methods
POST /see: Capture a full screen or region, optionally with a click-ready grid overlay.POST /see/zoom: Capture a tighter crop around a point and draw a denser grid for precise targeting.POST /interact: Perform one mouse or keyboard action (click,scroll,type,hotkey, etc.).POST /exec: Run PowerShell/Bash/CMD commands when shell-level control is needed.
Why this works for AI agents
- Agents do not need live vision; they iterate on snapshots.
- Grid metadata bridges image understanding to deterministic click coordinates.
- Interaction stays explicit and auditable (one action per request).
- A unified response envelope (
ok,data,error) reduces agent-side branching.
Minimal Agent Loop
- Call
seewith a coarse grid. - If uncertain, call
see/zoomwith a denser grid. - Call
interactonce. - Call
seeagain to verify state change. - Use
execonly for explicit shell/system tasks.
Safety and Auth
x-clickthrough-tokenprotects API access when enabled.x-clickthrough-exec-secretis required for/exec.- Optional dry-run and allowed-region constraints reduce accidental risk.
Docs
- API:
docs/API.md - Agent procedure:
skill/SKILL.md - Coordinate system details:
docs/coordinate-system.md
Description
Replaced by screenjob https://gitea.reversed.dev/space/screenjob
https://gitea.reversed.dev/space/screenjob
Languages
Python
100%