Clickthrough
Let an Agent interact with your computer over HTTP, with grid-aware screenshots and precise input actions.
What this provides
- Visual endpoints: full-screen capture with optional grid overlay and labeled cells (asImage=true can return raw image bytes)
- Zoom endpoint: crop around a point with a denser grid for fine targeting (asImage=true supported)
- Action endpoints: move/click/right-click/double-click/middle-click/scroll/type/hotkey
- OCR endpoint: extract text blocks with bounding boxes via POST /ocr
- Command execution endpoint: run PowerShell/Bash/CMD commands via POST /exec
- Coordinate transform metadata in visual responses so agents can map grid cells to real pixels
- Safety knobs: token auth, dry-run mode, optional allowed-region restriction
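To illustrate what mapping a grid cell to real pixels can look like, here is a minimal sketch. It assumes a uniform grid laid over a rectangular capture region and zero-based row/column indices. The `Region` type and `cell_center` helper are hypothetical names for illustration; the actual transform metadata returned by the server is described in docs/coordinate-system.md and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Capture region in real screen pixels (hypothetical shape, not the server's schema)."""
    x: int
    y: int
    width: int
    height: int


def cell_center(region: Region, row: int, col: int,
                rows: int = 12, cols: int = 12) -> tuple[int, int]:
    """Return the pixel at the center of a zero-based grid cell.

    Assumes a uniform grid; 12x12 matches the documented default grid size.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"cell ({row}, {col}) is outside a {rows}x{cols} grid")
    cell_w = region.width / cols
    cell_h = region.height / rows
    # Offset by half a cell so the result is the cell's center, not its corner.
    px = region.x + (col + 0.5) * cell_w
    py = region.y + (row + 0.5) * cell_h
    return int(px), int(py)
```

For a zoomed crop the same arithmetic applies, with `region` set to the crop rectangle instead of the full screen.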
Quick start
cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app
Server defaults to 127.0.0.1:8123.
For OCR support, install the native tesseract binary on the host (in addition to Python deps).
python-dotenv is enabled, so values from a repo-root .env file are loaded automatically.
Minimal API flow
- GET /screen with grid
- Decide cell / target
- Optional POST /zoom for finer targeting
- POST /action to execute
- GET /screen again to verify result

See:
- docs/API.md
- docs/coordinate-system.md
- skill/SKILL.md
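The minimal flow can be sketched with the Python standard library. The endpoint paths, default address, and x-clickthrough-token header come from this README; everything else is an assumption. In particular, JSON request bodies and the `build_request` and `send` helpers are hypothetical, and the real payload fields are defined in docs/API.md.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8123"  # documented default host and port


def build_request(method, path, token=None, payload=None):
    """Build (but do not send) a request for a Clickthrough endpoint.

    Assumption: POST bodies are JSON. Check docs/API.md for the real schema.
    """
    headers = {}
    data = None
    if token:
        # Header name taken from the CLICKTHROUGH_TOKEN description.
        headers["x-clickthrough-token"] = token
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(req, timeout=10.0):
    """Send a prepared request and return the raw response bytes."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def flow_requests(token, action_payload, zoom_payload=None):
    """Return the requests for one screen -> (zoom) -> action -> screen cycle."""
    steps = [build_request("GET", "/screen", token)]
    if zoom_payload is not None:
        steps.append(build_request("POST", "/zoom", token, zoom_payload))
    steps.append(build_request("POST", "/action", token, action_payload))
    steps.append(build_request("GET", "/screen", token))
    return steps
```

Building the requests separately from sending them keeps the sequence easy to inspect or log before anything touches the desktop, which pairs well with dry-run mode.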
Configuration
Environment variables:
- CLICKTHROUGH_HOST (default 127.0.0.1)
- CLICKTHROUGH_PORT (default 8123)
- CLICKTHROUGH_TOKEN (optional; if set, require x-clickthrough-token header)
- CLICKTHROUGH_DRY_RUN (true/false; default false)
- CLICKTHROUGH_GRID_ROWS (default 12)
- CLICKTHROUGH_GRID_COLS (default 12)
- CLICKTHROUGH_ALLOWED_REGION (optional x,y,width,height)
- CLICKTHROUGH_EXEC_ENABLED (default true)
- CLICKTHROUGH_EXEC_SECRET (required for /exec to run)
- CLICKTHROUGH_EXEC_DEFAULT_SHELL (default powershell; one of powershell, bash, cmd)
- CLICKTHROUGH_EXEC_TIMEOUT_S (default 30)
- CLICKTHROUGH_EXEC_MAX_TIMEOUT_S (default 120)
- CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS (default 20000)
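As a rough illustration of how these variables and their defaults fit together, the sketch below reads a subset of them from an environment mapping. It is not the server's actual loader: `load_settings`, `parse_region`, and `point_allowed` are hypothetical helpers, and treating the allowed region as a half-open rectangle is an assumption.

```python
import os


def parse_bool(value):
    """Interpret 'true' (any case) as True; everything else as False."""
    return value.strip().lower() == "true"


def parse_region(value):
    """Parse 'x,y,width,height' into a tuple of four ints."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError("expected x,y,width,height")
    x, y, width, height = (int(p) for p in parts)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return x, y, width, height


def load_settings(env=os.environ):
    """Collect settings using the defaults documented in this README."""
    region = env.get("CLICKTHROUGH_ALLOWED_REGION")
    return {
        "host": env.get("CLICKTHROUGH_HOST", "127.0.0.1"),
        "port": int(env.get("CLICKTHROUGH_PORT", "8123")),
        "token": env.get("CLICKTHROUGH_TOKEN") or None,
        "dry_run": parse_bool(env.get("CLICKTHROUGH_DRY_RUN", "false")),
        "grid_rows": int(env.get("CLICKTHROUGH_GRID_ROWS", "12")),
        "grid_cols": int(env.get("CLICKTHROUGH_GRID_COLS", "12")),
        "allowed_region": parse_region(region) if region else None,
    }


def point_allowed(x, y, region):
    """Assumed semantics: no region means no restriction; the far edges are excluded."""
    if region is None:
        return True
    rx, ry, width, height = region
    return rx <= x < rx + width and ry <= y < ry + height
```

Passing a plain dict instead of `os.environ` makes it easy to try out combinations, for example `load_settings({"CLICKTHROUGH_DRY_RUN": "true"})`.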
Gitea CI
A Gitea Actions workflow is included at .gitea/workflows/python-syntax.yml.
It runs Python syntax checks (py_compile) on every push and pull request.