# Clickthrough Let an Agent interact with your computer over HTTP, with grid-aware screenshots and precise input actions. ## What this provides - **Visual endpoints**: full-screen capture with optional grid overlay and labeled cells (`asImage=true` can return raw image bytes) - **Zoom endpoint**: crop around a point with denser grid for fine targeting (`asImage=true` supported) - **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey - **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr` - **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec` - **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels - **Safety knobs**: token auth, dry-run mode, optional allowed-region restriction ## Quick start ```bash cd /root/external-projects/clickthrough python3 -m venv .venv . .venv/bin/activate pip install -r requirements.txt CLICKTHROUGH_TOKEN=change-me python -m server.app ``` Server defaults to `127.0.0.1:8123`. For OCR support, install the native `tesseract` binary on the host (in addition to Python deps), or point `CLICKTHROUGH_TESSERACT_CMD` at the executable if it lives somewhere weird. `python-dotenv` is enabled, so values from a repo-root `.env` file are loaded automatically. ## Minimal API flow 1. `GET /screen` with grid 2. Decide cell / target 3. Optional `POST /zoom` for finer targeting 4. `POST /action` to execute 5. `GET /screen` again to verify result See: - `docs/API.md` - `docs/coordinate-system.md` - `skill/SKILL.md` ## Configuration Environment variables: - `CLICKTHROUGH_HOST` (default `127.0.0.1`) - `CLICKTHROUGH_PORT` (default `8123`) - `CLICKTHROUGH_TOKEN` (optional; if set, require `x-clickthrough-token` header) - `CLICKTHROUGH_DRY_RUN` (`true`/`false`; default `false`) - `CLICKTHROUGH_GRID_ROWS` (default `12`) - `CLICKTHROUGH_GRID_COLS` (default `12`) - `CLICKTHROUGH_ALLOWED_REGION` (optional `x,y,width,height`) - `CLICKTHROUGH_EXEC_ENABLED` (default `true`) - `CLICKTHROUGH_EXEC_SECRET` (**required for `/exec` to run**) - `CLICKTHROUGH_EXEC_DEFAULT_SHELL` (default `powershell`; one of `powershell`, `bash`, `cmd`) - `CLICKTHROUGH_EXEC_TIMEOUT_S` (default `30`) - `CLICKTHROUGH_EXEC_MAX_TIMEOUT_S` (default `120`) - `CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS` (default `20000`) - `CLICKTHROUGH_TESSERACT_CMD` (optional path to the `tesseract` executable) ## Gitea CI A Gitea Actions workflow is included at `.gitea/workflows/python-syntax.yml`. It runs Python syntax checks (`py_compile`) on every push and pull request.