This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Clickthrough
Let an agent interact with a computer over HTTP.
Primary mode (v2)
Use the v2 contract for faster, less OCR-heavy control loops:
- POST /v2/observe
- POST /v2/localize
- POST /v2/act
- POST /v2/act-verify
This is optimized for agents that cannot directly see the screen and must use screenshot/image tools.
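As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) a JSON POST request for one of the four v2 paths. The base URL matches the default address given in this README. Everything else is an assumption: the helper name `build_v2_request`, the JSON body, and especially the `Authorization: Bearer` header are hypothetical, since the real request schema and auth scheme live in docs/API.md.

```python
import json
import urllib.request

# The four v2 paths named in this README.
V2_ENDPOINTS = ("observe", "localize", "act", "act-verify")


def build_v2_request(endpoint, payload, base_url="http://127.0.0.1:8123", token=None):
    """Build (but do not send) a JSON POST request for one v2 endpoint.

    Hypothetical helper: the payload shape and auth header are assumptions.
    """
    if endpoint not in V2_ENDPOINTS:
        raise ValueError(f"unknown v2 endpoint: {endpoint!r}")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the header name and scheme are illustrative only;
        # check docs/API.md for how the token is actually passed.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url}/v2/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect before it touches a live machine.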
What this provides
- Screen/region capture with optional OCR and timing stats
- Observation IDs for deterministic follow-up localization
- Text localization and image-tool coordinate localization
- Action execution with resolved target IDs
- Risk-aware action+verification defaults
- Unified response envelope across all endpoints
Quick start
cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app
Server defaults to 127.0.0.1:8123.
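To confirm the server came up, a plain TCP connection attempt against the default address is enough. This is a minimal sketch using only the standard library; `is_listening` is a hypothetical helper name, and it only checks that something accepts connections on the port, not that it is Clickthrough.

```python
import socket


def is_listening(host="127.0.0.1", port=8123, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is accepting connections on the address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening()` with no arguments probes the default 127.0.0.1:8123.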
Fast control loop
- POST /v2/observe on a tight region
- If OCR is enough, POST /v2/localize with text_query
- If ambiguous, ask image tool for one x,y in observation bounds
- POST /v2/localize with image_tool_point
- POST /v2/act or POST /v2/act-verify
- Re-observe only changed region
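The loop above could be sketched roughly as follows. The transport is injected as a `post(path, payload)` callable so the flow can be exercised without a running server. Only the endpoint paths and the `text_query` / `image_tool_point` names come from this README; every other field (`region`, `observation_id`, `target_id`, `ambiguous`, `action`) and both helper names are assumptions, so treat docs/API.md as the authority on the real schema.

```python
def run_step(post, ask_image_tool, region, text_query, action):
    """Run one pass of the control loop and return the action response.

    post(path, payload) -> dict   sends one request (hypothetical transport)
    ask_image_tool(obs) -> (x, y) picks a point inside the observation bounds
    """
    # Observe a tight region rather than the full screen.
    obs = post("/v2/observe", {"region": region})
    obs_id = obs["observation_id"]  # assumed field name

    # Try OCR-based text localization first.
    loc = post("/v2/localize", {"observation_id": obs_id, "text_query": text_query})

    # If the text match is ambiguous or missing, fall back to one x,y
    # chosen by the image tool and localize again with that point.
    if loc.get("ambiguous") or not loc.get("target_id"):
        x, y = ask_image_tool(obs)
        loc = post(
            "/v2/localize",
            {"observation_id": obs_id, "image_tool_point": {"x": x, "y": y}},
        )

    # Act on the resolved target, with verification if requested.
    path = "/v2/act-verify" if action.get("verify") else "/v2/act"
    return post(path, {"target_id": loc["target_id"], "action": action["kind"]})
```

The final step, re-observing only the changed region, is left to the caller: it would call `run_step` again with a `region` narrowed to wherever the action had an effect.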
See docs
- docs/API.md
- skill/SKILL.md
- docs/coordinate-system.md
Description
Replaced by screenjob https://gitea.reversed.dev/space/screenjob