This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Luna 1429e90be2
docs: tighten clickthrough skill and API guidance
Closes #10
2026-05-01 15:40:38 +02:00

# Clickthrough
Let an agent interact with your computer over HTTP, using grid-aware screenshots and precise input actions.
## What this provides
- **Visual endpoints**: full-screen capture with optional grid overlay and labeled cells (`asImage=true` can return raw image bytes)
- **Zoom endpoint**: crop around a point with denser grid for fine targeting (`asImage=true` supported)
- **Multi-display support**: list displays with `GET /displays` and select one with `?screen=0`, `?screen=1`, ...
- **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey
- **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr`
- **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
- **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels
- **Safety knobs**: token auth, dry-run mode, optional allowed-region restriction
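
As a rough illustration of how an agent might turn a grid cell into a pixel target, here is a minimal sketch. The `GridRegion` fields and zero-indexed rows and columns are assumptions made for this example; the actual metadata field names are defined in `docs/coordinate-system.md`, not here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GridRegion:
    """Hypothetical view of the transform metadata: the captured region in
    global desktop pixels, plus the grid dimensions drawn over it."""
    x: int
    y: int
    width: int
    height: int
    rows: int
    cols: int


def cell_center(region: GridRegion, row: int, col: int) -> tuple[float, float]:
    """Return the global pixel at the center of a zero-indexed grid cell."""
    if not (0 <= row < region.rows and 0 <= col < region.cols):
        raise ValueError(
            f"cell ({row}, {col}) is outside a {region.rows}x{region.cols} grid"
        )
    cell_w = region.width / region.cols
    cell_h = region.height / region.rows
    # Offset by half a cell so the result lands in the middle, not the corner.
    return (region.x + (col + 0.5) * cell_w, region.y + (row + 0.5) * cell_h)
```

For a second monitor that starts at `x=1920`, `cell_center(GridRegion(1920, 0, 1200, 600, 12, 12), 0, 0)` gives `(1970.0, 25.0)`, which is already a global desktop coordinate.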
## Quick start
```bash
cd /root/external-projects/clickthrough  # adjust to wherever you cloned the repo
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app
```
Server defaults to `127.0.0.1:8123`.
For OCR support, install the native `tesseract` binary on the host (in addition to the Python dependencies). If it is installed in a non-standard location, point `CLICKTHROUGH_TESSERACT_CMD` at the executable.
`python-dotenv` is enabled, so values from a repo-root `.env` file are loaded automatically.
## Minimal API flow
1. `GET /displays` if you need a non-primary monitor
2. `GET /screen?screen=0` with grid
3. Decide cell / target
4. Optional `POST /zoom?screen=0` for finer targeting
5. `POST /action?screen=0` to execute
6. `GET /screen?screen=0` again to verify result
Important:
- `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
- Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
- Prefer structured GUI interaction first; use `/exec` for launch, recovery, or explicit system-level tasks.
See:
- `docs/API.md`
- `docs/coordinate-system.md`
- `skill/SKILL.md`
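
The capture, act, verify loop above can be sketched from the client side with the standard library alone. This is an illustration, not the project's client: it assumes JSON request bodies, and the fields inside `target` are placeholders (the real schema is in `docs/API.md`). The zoom step is left out because its body fields are not described in this README.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8123"  # README default host and port


def build_request(method, path, token=None, screen=None, body=None):
    """Build (but do not send) a request for a Clickthrough endpoint."""
    url = BASE_URL + path
    if screen is not None:
        url += "?" + urllib.parse.urlencode({"screen": screen})
    headers = {}
    if token:
        headers["x-clickthrough-token"] = token
    data = None
    if body is not None:
        # Assumption: the server accepts JSON bodies.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def action_body(action, target):
    """Wrap an action name and a target object; never send top-level x / y."""
    if not isinstance(target, dict):
        raise TypeError("target must be an object (dict), not raw coordinates")
    return {"action": action, "target": target}


def flow_requests(token, screen, action, target):
    """Return the screen -> action -> screen requests in the order they run."""
    return [
        build_request("GET", "/screen", token, screen),
        build_request("POST", "/action", token, screen, action_body(action, target)),
        build_request("GET", "/screen", token, screen),  # verify the result
    ]
```

To run the flow, pass each request to `urllib.request.urlopen` in turn and inspect the response before sending the next one.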
## Configuration
Environment variables:
- `CLICKTHROUGH_HOST` (default `127.0.0.1`)
- `CLICKTHROUGH_PORT` (default `8123`)
- `CLICKTHROUGH_TOKEN` (optional; if set, every request must include the `x-clickthrough-token` header)
- `CLICKTHROUGH_DRY_RUN` (`true`/`false`; default `false`)
- `CLICKTHROUGH_GRID_ROWS` (default `12`)
- `CLICKTHROUGH_GRID_COLS` (default `12`)
- `CLICKTHROUGH_ALLOWED_REGION` (optional `x,y,width,height`)
- `CLICKTHROUGH_EXEC_ENABLED` (default `true`)
- `CLICKTHROUGH_EXEC_SECRET` (**required for `/exec` to run**)
- `CLICKTHROUGH_EXEC_DEFAULT_SHELL` (default `powershell`; one of `powershell`, `bash`, `cmd`)
- `CLICKTHROUGH_EXEC_TIMEOUT_S` (default `30`)
- `CLICKTHROUGH_EXEC_MAX_TIMEOUT_S` (default `120`)
- `CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS` (default `20000`)
- `CLICKTHROUGH_TESSERACT_CMD` (optional path to the `tesseract` executable)
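
To show how these variables and defaults fit together, here is a minimal sketch of a settings loader. It is not the server's actual loading code: the `Settings` class and `load_settings` function are hypothetical names, only a subset of the variables is covered (the `/exec` ones are omitted), and treating only the literal `true` as truthy is an assumption.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    token: Optional[str]
    dry_run: bool
    grid_rows: int
    grid_cols: int
    allowed_region: Optional[tuple[int, int, int, int]]


def parse_region(raw: str) -> tuple[int, int, int, int]:
    """Parse the documented `x,y,width,height` format into four integers."""
    parts = [int(p.strip()) for p in raw.split(",")]
    if len(parts) != 4:
        raise ValueError("CLICKTHROUGH_ALLOWED_REGION must be x,y,width,height")
    return (parts[0], parts[1], parts[2], parts[3])


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the README defaults."""
    region = env.get("CLICKTHROUGH_ALLOWED_REGION")
    return Settings(
        host=env.get("CLICKTHROUGH_HOST", "127.0.0.1"),
        port=int(env.get("CLICKTHROUGH_PORT", "8123")),
        token=env.get("CLICKTHROUGH_TOKEN") or None,  # unset or empty -> no auth
        dry_run=env.get("CLICKTHROUGH_DRY_RUN", "false").strip().lower() == "true",
        grid_rows=int(env.get("CLICKTHROUGH_GRID_ROWS", "12")),
        grid_cols=int(env.get("CLICKTHROUGH_GRID_COLS", "12")),
        allowed_region=parse_region(region) if region else None,
    )
```

Passing a plain dict instead of `os.environ` makes it easy to check a configuration before starting the server, for example `load_settings({"CLICKTHROUGH_DRY_RUN": "true"})`.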
## Gitea CI
A Gitea Actions workflow is included at `.gitea/workflows/python-syntax.yml`.
It runs Python syntax checks (`py_compile`) on every push and pull request.
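
To run a similar check locally before pushing, something like the following sketch could work. It is not the contents of the workflow file; `check_syntax` is a hypothetical helper built on the standard-library `py_compile` module.

```python
import py_compile
from pathlib import Path


def check_syntax(root):
    """Compile every .py file under root; return (path, message) per failure.

    Note: this walks the whole tree, so point it at the source directory
    rather than a folder that also contains a virtual environment.
    """
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # doraise=True turns syntax errors into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((path, exc.msg))
    return failures
```

An empty list from `check_syntax("server")` means every file compiled; otherwise each entry names the file and the compiler's message.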