Files
clickthrough/README.md
Luna 097c6a095c
All checks were successful
python-syntax / syntax-check (push) Successful in 5s
python-syntax / syntax-check (pull_request) Successful in 4s
feat(ocr): add /ocr endpoint for screen, region, and image input
2026-04-06 13:48:33 +02:00

2.4 KiB

Clickthrough

Let an Agent interact with your computer over HTTP, with grid-aware screenshots and precise input actions.

What this provides

  • Visual endpoints: full-screen capture with optional grid overlay and labeled cells (asImage=true can return raw image bytes)
  • Zoom endpoint: crop around a point with denser grid for fine targeting (asImage=true supported)
  • Action endpoints: move/click/right-click/double-click/middle-click/scroll/type/hotkey
  • OCR endpoint: extract text blocks with bounding boxes via POST /ocr
  • Command execution endpoint: run PowerShell/Bash/CMD commands via POST /exec
  • Coordinate transform metadata in visual responses so agents can map grid cells to real pixels
  • Safety knobs: token auth, dry-run mode, optional allowed-region restriction

Quick start

cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app

Server defaults to 127.0.0.1:8123.

For OCR support, install the native tesseract binary on the host (in addition to Python deps).

python-dotenv is enabled, so values from a repo-root .env file are loaded automatically.

Minimal API flow

  1. GET /screen with grid
  2. Decide cell / target
  3. Optional POST /zoom for finer targeting
  4. POST /action to execute
  5. GET /screen again to verify result

See:

  • docs/API.md
  • docs/coordinate-system.md
  • skill/SKILL.md

Configuration

Environment variables:

  • CLICKTHROUGH_HOST (default 127.0.0.1)
  • CLICKTHROUGH_PORT (default 8123)
  • CLICKTHROUGH_TOKEN (optional; if set, require x-clickthrough-token header)
  • CLICKTHROUGH_DRY_RUN (true/false; default false)
  • CLICKTHROUGH_GRID_ROWS (default 12)
  • CLICKTHROUGH_GRID_COLS (default 12)
  • CLICKTHROUGH_ALLOWED_REGION (optional x,y,width,height)
  • CLICKTHROUGH_EXEC_ENABLED (default true)
  • CLICKTHROUGH_EXEC_SECRET (required for /exec to run)
  • CLICKTHROUGH_EXEC_DEFAULT_SHELL (default powershell; one of powershell, bash, cmd)
  • CLICKTHROUGH_EXEC_TIMEOUT_S (default 30)
  • CLICKTHROUGH_EXEC_MAX_TIMEOUT_S (default 120)
  • CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS (default 20000)

Gitea CI

A Gitea Actions workflow is included at .gitea/workflows/python-syntax.yml. It runs Python syntax checks (py_compile) on every push and pull request.