Clickthrough

Let an agent interact with your computer over HTTP, using grid-aware screenshots and precise input actions.

What this provides

  • Visual endpoints: full-screen capture with optional grid overlay and labeled cells
  • Zoom endpoint: crop around a point with a denser grid for fine targeting
  • Action endpoints: move/click/right-click/double-click/middle-click/scroll/type/hotkey
  • Coordinate transform metadata in visual responses so agents can map grid cells to real pixels
  • Safety knobs: token auth, dry-run mode, optional allowed-region restriction
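
To illustrate how an agent might turn a grid cell into a real pixel, here is a minimal sketch. The GridTransform fields and the zero-based (row, col) convention are assumptions made for illustration; the actual metadata returned by the server is described in docs/coordinate-system.md and may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GridTransform:
    """Hypothetical transform metadata: the captured region plus grid size."""
    origin_x: int  # left edge of the captured region, in screen pixels
    origin_y: int  # top edge of the captured region, in screen pixels
    width: int     # width of the captured region, in pixels
    height: int    # height of the captured region, in pixels
    rows: int
    cols: int


def cell_center(t: GridTransform, row: int, col: int) -> Tuple[int, int]:
    """Return the screen pixel at the center of a zero-based (row, col) cell."""
    if not (0 <= row < t.rows and 0 <= col < t.cols):
        raise ValueError(f"cell ({row}, {col}) is outside a {t.rows}x{t.cols} grid")
    cell_w = t.width / t.cols
    cell_h = t.height / t.rows
    x = t.origin_x + (col + 0.5) * cell_w
    y = t.origin_y + (row + 0.5) * cell_h
    return int(x), int(y)


# Full-screen capture of an assumed 1920x1080 display with the default 12x12 grid.
full = GridTransform(0, 0, 1920, 1080, rows=12, cols=12)
print(cell_center(full, 0, 0))    # (80, 45)
print(cell_center(full, 11, 11))  # (1840, 1035)
```

The same function covers a zoomed crop: only the origin and the region size change, so a cell in the denser zoom grid still maps back to absolute screen pixels.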

Quick start

cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app

The server listens on 127.0.0.1:8123 by default.

Minimal API flow

  1. GET /screen with grid
  2. Decide cell / target
  3. Optional POST /zoom for finer targeting
  4. POST /action to execute
  5. GET /screen again to verify result
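
The flow above can be sketched with the Python standard library. The endpoint paths, the default address, and the x-clickthrough-token header come from this README; the query parameter (grid=true), the JSON field names (action, x, y), and the assumption that responses are JSON are placeholders, so check docs/API.md for the real request shapes.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8123"  # server default from this README
TOKEN = "change-me"                  # must match CLICKTHROUGH_TOKEN


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request; nothing is sent until send() is called."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("x-clickthrough-token", TOKEN)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Steps 1-5 of the flow. Field names in the payloads are assumptions.
steps = [
    build_request("GET", "/screen?grid=true"),                               # 1. capture
    build_request("POST", "/zoom", {"x": 80, "y": 45}),                      # 3. optional zoom
    build_request("POST", "/action", {"action": "click", "x": 80, "y": 45}),  # 4. act
    build_request("GET", "/screen?grid=true"),                               # 5. verify
]

for req in steps:
    print(req.get_method(), req.full_url)
    # send(req)  # uncomment once the server is running
```

Step 2 (deciding the cell or target) happens on the agent side between the first capture and the zoom or action request. Running the server with CLICKTHROUGH_DRY_RUN=true is a sensible way to try a script like this before letting it produce real input.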

See:

  • docs/API.md
  • docs/coordinate-system.md
  • skill/SKILL.md

Configuration

Environment variables:

  • CLICKTHROUGH_HOST (default 127.0.0.1)
  • CLICKTHROUGH_PORT (default 8123)
  • CLICKTHROUGH_TOKEN (optional; if set, every request must include a matching x-clickthrough-token header)
  • CLICKTHROUGH_DRY_RUN (true/false; default false)
  • CLICKTHROUGH_GRID_ROWS (default 12)
  • CLICKTHROUGH_GRID_COLS (default 12)
  • CLICKTHROUGH_ALLOWED_REGION (optional x,y,width,height)
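
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them into a settings object. Settings, parse_region, load_settings, and point_allowed are hypothetical names, not the server's actual code, and the region check assumes x,y is the top-left corner in screen pixels.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    token: Optional[str]
    dry_run: bool
    grid_rows: int
    grid_cols: int
    allowed_region: Optional[Region]


def parse_region(raw: Optional[str]) -> Optional[Region]:
    """Parse 'x,y,width,height'; an unset or empty value means no restriction."""
    if not raw:
        return None
    parts = [int(p) for p in raw.split(",")]
    if len(parts) != 4:
        raise ValueError("expected x,y,width,height")
    x, y, w, h = parts
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    return (x, y, w, h)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the CLICKTHROUGH_* variables, falling back to the documented defaults."""
    return Settings(
        host=env.get("CLICKTHROUGH_HOST", "127.0.0.1"),
        port=int(env.get("CLICKTHROUGH_PORT", "8123")),
        token=env.get("CLICKTHROUGH_TOKEN") or None,
        dry_run=env.get("CLICKTHROUGH_DRY_RUN", "false").strip().lower() == "true",
        grid_rows=int(env.get("CLICKTHROUGH_GRID_ROWS", "12")),
        grid_cols=int(env.get("CLICKTHROUGH_GRID_COLS", "12")),
        allowed_region=parse_region(env.get("CLICKTHROUGH_ALLOWED_REGION")),
    )


def point_allowed(region: Optional[Region], x: int, y: int) -> bool:
    """True if (x, y) lies inside the region, or if no region is configured."""
    if region is None:
        return True
    rx, ry, rw, rh = region
    return rx <= x < rx + rw and ry <= y < ry + rh


demo = load_settings({"CLICKTHROUGH_ALLOWED_REGION": "0,0,800,600"})
print(demo.port, demo.dry_run)                       # 8123 False
print(point_allowed(demo.allowed_region, 799, 599))  # True
print(point_allowed(demo.allowed_region, 800, 10))   # False
```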

Gitea CI

A Gitea Actions workflow is included at .gitea/workflows/python-syntax.yml. It runs Python syntax checks (py_compile) on every push and pull request.
