This repository was archived on 2026-05-20. You can view files and clone it, but you cannot open issues or pull requests, or push commits.
Clickthrough
Let an agent interact with your computer over HTTP, with grid-aware screenshots and precise input actions.
What this provides
- Visual endpoints: full-screen capture with optional grid overlay and labeled cells (asImage=true can return raw image bytes)
- Zoom endpoint: crop around a point with denser grid for fine targeting (asImage=true supported)
- Multi-display support: list displays with GET /displays and select one with ?screen=0, ?screen=1, ...
- Action endpoints: move/click/right-click/double-click/middle-click/scroll/type/hotkey
- Window lifecycle endpoints: list/focus/restore/minimize/maximize/close windows via GET /windows + POST /windows/action
- Structured launch endpoint: start an app/process without dropping to a shell via POST /launch
- Wait/sync endpoint: poll for text, window, or visual state changes via POST /wait
- Vision helper endpoints: compare screenshots and measure stability via POST /vision/diff and POST /vision/stability
- OCR endpoints: extract text blocks or search for matching text via POST /ocr and POST /ocr/find
- Compound verify endpoint: execute an action and wait for a structured success condition via POST /action/verify
- Command execution endpoint: run PowerShell/Bash/CMD commands via POST /exec
- Coordinate transform metadata in visual responses so agents can map grid cells to real pixels
- Safety knobs: token auth, dry-run mode, optional allowed-region restriction
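To make the grid-to-pixel mapping concrete, here is a minimal sketch of the arithmetic an agent might do with the transform metadata. The `GridRegion` type and its field names are hypothetical and are not taken from the API; the real metadata layout is described in docs/coordinate-system.md. The sketch assumes zero-based row/column indices and a uniform grid.

```python
from typing import NamedTuple


class GridRegion(NamedTuple):
    """Hypothetical view of the transform metadata: the captured region in
    global desktop pixels plus the grid drawn over it."""
    x: int           # left edge of the captured region (global pixels)
    y: int           # top edge of the captured region (global pixels)
    width: int
    height: int
    rows: int = 12   # matches the CLICKTHROUGH_GRID_ROWS default
    cols: int = 12   # matches the CLICKTHROUGH_GRID_COLS default


def cell_center(region: GridRegion, row: int, col: int) -> tuple[int, int]:
    """Return the global pixel at the center of a zero-based grid cell."""
    if not (0 <= row < region.rows and 0 <= col < region.cols):
        raise ValueError(f"cell ({row}, {col}) is outside the grid")
    # The center of cell k out of n lies (2k + 1) / (2n) of the way across.
    cx = region.x + (2 * col + 1) * region.width // (2 * region.cols)
    cy = region.y + (2 * row + 1) * region.height // (2 * region.rows)
    return cx, cy
```

For a 1920x1080 primary display with the default 12x12 grid, `cell_center(GridRegion(0, 0, 1920, 1080), 0, 0)` gives `(80, 45)`. A second display whose region starts at x=1920 shifts every result by that offset, which is why the result can be used directly as a global coordinate.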
Quick start
cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app
Server defaults to 127.0.0.1:8123.
For OCR support, install the native tesseract binary on the host (in addition to Python deps), or point CLICKTHROUGH_TESSERACT_CMD at the executable if it lives somewhere weird.
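A quick preflight check for the OCR dependency could look like the sketch below. It only mirrors the rule stated above (explicit override first, otherwise whatever is on PATH); how the server itself resolves the binary is not shown in this README, and `resolve_tesseract` is a hypothetical helper name.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_tesseract(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the tesseract command to use, or None if none can be found."""
    override = env.get("CLICKTHROUGH_TESSERACT_CMD", "").strip()
    if override:
        # An explicit path wins; this sketch does not check that it exists.
        return override
    # Otherwise fall back to a normal PATH lookup.
    return shutil.which("tesseract")
```

If this returns `None`, the OCR endpoints are unlikely to work until tesseract is installed or the variable is set.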
python-dotenv is enabled, so values from a repo-root .env file are loaded automatically.
Minimal API flow
- GET /displays if you need a non-primary monitor
- GET /screen?screen=0 with grid
- Decide cell / target
- Optional POST /zoom?screen=0 for finer targeting
- POST /action?screen=0 to execute (or POST /action/verify?screen=0 for a bundled action+wait flow)
- GET /screen?screen=0 again to verify result, or use POST /wait, POST /vision/diff, or POST /ocr/find
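The flow above can be sketched with the standard library alone. The paths, the `?screen=` parameter, the default address, and the `x-clickthrough-token` header come from this README; sending bodies as JSON is an assumption, and `build_request` and `minimal_flow` are hypothetical helper names. The optional zoom step is left out because its request body is not described here. Nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8123"  # server default from this README


def build_request(method, path, token=None, screen=None, body=None,
                  base_url=BASE_URL):
    """Build (but do not send) a request for one Clickthrough endpoint."""
    query = {} if screen is None else {"screen": screen}
    url = base_url + path + ("?" + urlencode(query) if query else "")
    headers = {}
    if token:
        headers["x-clickthrough-token"] = token
    data = None
    if body is not None:
        # Assumption: request bodies are JSON; check docs/API.md.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


def minimal_flow(token, screen, action_body):
    """Requests for the flow above, in order: discover displays, capture,
    act, then capture again to verify."""
    return [
        build_request("GET", "/displays", token),
        build_request("GET", "/screen", token, screen),
        build_request("POST", "/action", token, screen, body=action_body),
        build_request("GET", "/screen", token, screen),
    ]
```

Keeping request construction separate from sending makes the flow easy to inspect or log before anything touches the desktop, which pairs well with dry-run mode.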
Important:
- POST /action expects an action plus a target object; do not send raw top-level x/y fields.
- Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
- The agent does not inherently see the remote desktop; it reasons from screenshots, OCR, and window metadata.
- When OCR is not enough, pair Clickthrough screenshots with OpenClaw's image tool for explicit screenshot interpretation.
- Prefer structured GUI interaction first; use /windows, /launch, /wait, and /action before reaching for /exec.
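The first rule is easy to enforce on the client side before a request goes out. The sketch below checks only what this README states: an `action` field, a `target` object, and no top-level `x`/`y`. The fields inside `target` are not specified here, so the `{"x": ..., "y": ...}` shape in the usage note is a placeholder; docs/API.md has the real schema. Both function names are hypothetical.

```python
def check_action_payload(payload: dict) -> None:
    """Raise ValueError for payload shapes the README warns against."""
    if "action" not in payload:
        raise ValueError("payload needs an 'action' field")
    if not isinstance(payload.get("target"), dict):
        raise ValueError("payload needs a 'target' object")
    stray = {"x", "y"} & payload.keys()
    if stray:
        raise ValueError(f"move top-level {sorted(stray)} into 'target'")


def make_action(action: str, target: dict) -> dict:
    """Build a POST /action body with the coordinates nested under target."""
    payload = {"action": action, "target": dict(target)}
    check_action_payload(payload)
    return payload
```

For example, `make_action("click", {"x": 640, "y": 360})` yields a body with `action` and `target` keys, while `check_action_payload({"action": "click", "x": 640, "y": 360})` raises `ValueError`.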
See:
- docs/API.md
- docs/coordinate-system.md
- skill/SKILL.md
Configuration
Environment variables:
- CLICKTHROUGH_HOST (default 127.0.0.1)
- CLICKTHROUGH_PORT (default 8123)
- CLICKTHROUGH_TOKEN (optional; if set, require x-clickthrough-token header)
- CLICKTHROUGH_DRY_RUN (true/false; default false)
- CLICKTHROUGH_GRID_ROWS (default 12)
- CLICKTHROUGH_GRID_COLS (default 12)
- CLICKTHROUGH_ALLOWED_REGION (optional x,y,width,height)
- CLICKTHROUGH_EXEC_ENABLED (default true)
- CLICKTHROUGH_EXEC_SECRET (required for /exec to run)
- CLICKTHROUGH_EXEC_DEFAULT_SHELL (default powershell; one of powershell, bash, cmd)
- CLICKTHROUGH_EXEC_TIMEOUT_S (default 30)
- CLICKTHROUGH_EXEC_MAX_TIMEOUT_S (default 120)
- CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS (default 20000)
- CLICKTHROUGH_TESSERACT_CMD (optional path to the tesseract executable)
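As an illustration of how these variables and defaults fit together, the sketch below reads a subset of them from a mapping. It is not the server's own loader: the `Settings` class and `load_settings` are hypothetical, and the parsing rules (only the literal `true` enables dry-run, the region is four comma-separated integers) are assumptions based on the formats listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    token: Optional[str]
    dry_run: bool
    grid_rows: int
    grid_cols: int
    allowed_region: Optional[Tuple[int, int, int, int]]  # x, y, width, height


def _parse_region(value: str) -> Optional[Tuple[int, int, int, int]]:
    """Parse 'x,y,width,height'; an empty value means no restriction."""
    if not value.strip():
        return None
    parts = [int(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError("CLICKTHROUGH_ALLOWED_REGION must be x,y,width,height")
    return (parts[0], parts[1], parts[2], parts[3])


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the documented variables, applying their defaults."""
    return Settings(
        host=env.get("CLICKTHROUGH_HOST", "127.0.0.1"),
        port=int(env.get("CLICKTHROUGH_PORT", "8123")),
        token=env.get("CLICKTHROUGH_TOKEN") or None,
        dry_run=env.get("CLICKTHROUGH_DRY_RUN", "false").strip().lower() == "true",
        grid_rows=int(env.get("CLICKTHROUGH_GRID_ROWS", "12")),
        grid_cols=int(env.get("CLICKTHROUGH_GRID_COLS", "12")),
        allowed_region=_parse_region(env.get("CLICKTHROUGH_ALLOWED_REGION", "")),
    )
```

Passing a plain dict instead of `os.environ` makes it simple to check a planned .env file before starting the server.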
Window management endpoints currently target Windows hosts. On non-Windows hosts they return 501 instead of guessing.
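A client that may talk to either kind of host can treat that 501 as "window management unavailable" and fall back to screenshots and OCR. The sketch below assumes the request is made with `urllib`, which raises `HTTPError` for 5xx responses; `windows_or_none` and the `fetch` callable are hypothetical.

```python
from urllib.error import HTTPError


def windows_or_none(fetch):
    """Run fetch() (a callable that performs GET /windows and returns the
    parsed result). Return None when the host answers 501, and let every
    other error propagate."""
    try:
        return fetch()
    except HTTPError as err:
        if err.code == 501:
            # Non-Windows host: window management is not implemented.
            return None
        raise
```

Only 501 is swallowed; authentication failures and other errors still surface to the caller.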
Gitea CI
A Gitea Actions workflow is included at .gitea/workflows/python-syntax.yml.
It runs Python syntax checks (py_compile) on every push and pull request.
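A similar check can be run locally before pushing. This sketch is not a copy of the workflow: which files the workflow covers is not stated here, so walking every `*.py` file under a root and skipping `.venv` are assumptions, and `syntax_errors` is a hypothetical helper name.

```python
import py_compile
from pathlib import Path


def syntax_errors(root):
    """Compile every .py file under root and return (path, message) pairs
    for the files that fail to compile."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        if ".venv" in path.parts:
            continue  # skip the virtualenv created in Quick start
        try:
            # doraise=True turns syntax errors into PyCompileError.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as err:
            failures.append((path, err.msg))
    return failures
```

An empty list means every file compiled. Note that `py_compile` writes bytecode into `__pycache__` directories as a side effect.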
Description
Replaced by screenjob https://gitea.reversed.dev/space/screenjob