# Clickthrough

Let an agent interact with a computer over HTTP.

## Primary mode (v2)

Use the v2 contract for faster, less OCR-heavy control loops:
- `POST /v2/observe`
- `POST /v2/localize`
- `POST /v2/act`
- `POST /v2/act-verify`

The v2 contract is optimized for agents that cannot see the screen directly and must rely on screenshot and image tools.
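A minimal bash sketch for addressing the four v2 endpoints. It assumes the server is at the default `127.0.0.1:8123` and that the `CLICKTHROUGH_TOKEN` value is sent as a `Bearer` token in the `Authorization` header. This README does not document the auth scheme, so treat the header as an assumption. `CT_BASE`, `ct_url`, and `ct_post` are hypothetical names used only for illustration.

```bash
# Hypothetical helpers. The Bearer-token header and the CT_BASE override are
# assumptions, not documented behaviour of Clickthrough.

# Print the URL for one of the four v2 endpoints; reject anything else.
ct_url() {
  case "$1" in
    observe|localize|act|act-verify)
      printf '%s/v2/%s\n' "${CT_BASE:-http://127.0.0.1:8123}" "$1" ;;
    *)
      echo "unknown v2 endpoint: $1" >&2
      return 2 ;;
  esac
}

# POST a JSON body ($2) to a v2 endpoint ($1) and print the raw response.
ct_post() {
  local url
  url="$(ct_url "$1")" || return
  if [ -z "${CLICKTHROUGH_TOKEN:-}" ]; then
    echo "CLICKTHROUGH_TOKEN is not set" >&2
    return 1
  fi
  curl -sS -X POST "$url" \
    -H "Authorization: Bearer ${CLICKTHROUGH_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$2"
}
```

Restricting `ct_url` to the four route names catches typos such as `act_verify` before any request is sent.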

## What this provides

- Screen/region capture with optional OCR and timing stats
- Observation IDs for deterministic follow-up localization
- Text localization and image-tool coordinate localization
- Action execution with resolved target IDs
- Risk-aware defaults for combined action and verification
- Unified response envelope across all endpoints

## Quick start

```bash
cd clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app
```

Server defaults to `127.0.0.1:8123`.
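When you script against a freshly started server, it can help to wait until the port accepts connections. The following is a minimal sketch that assumes bash (it uses bash's `/dev/tcp` redirection) and the default host and port above. `wait_for_clickthrough` is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: poll HOST:PORT until a TCP connection succeeds.
# Returns 0 once the port accepts connections, 1 after TRIES failed attempts.
wait_for_clickthrough() {
  local host="${1:-127.0.0.1}" port="${2:-8123}" tries="${3:-20}"
  local i
  for ((i = 0; i < tries; i++)); do
    # Open and immediately close a TCP connection in a subshell.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 0.5
  done
  return 1
}
```

This only confirms that something is listening on the port. It does not check that the listener is Clickthrough or that the token is valid.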

## Fast control loop

1. `POST /v2/observe` on a tight region
2. If the OCR text is enough to identify the target, `POST /v2/localize` with `text_query`
3. If the match is ambiguous, ask the image tool for a single x,y point within the observation bounds
4. `POST /v2/localize` with `image_tool_point`
5. `POST /v2/act` or `POST /v2/act-verify`
6. Re-observe only the region that changed
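The loop above can be sketched as a set of request-body builders, one per step. Only `text_query` and `image_tool_point` come from this README. The field names `region`, `observation_id`, `target_id`, and `action`, and the x/y/width/height shape of a region, are hypothetical placeholders; check `docs/API.md` for the real schema. The builders do no JSON escaping, so keep queries free of quotes and backslashes.

```bash
# Hypothetical request-body builders for the fast control loop.
# Field names other than text_query and image_tool_point are assumptions.

# Step 1: observe a tight region given as x, y, width, height.
observe_body() {
  printf '{"region": {"x": %d, "y": %d, "width": %d, "height": %d}}' "$1" "$2" "$3" "$4"
}

# Step 2: localize by OCR text within a previous observation (ID in $1).
localize_text_body() {
  printf '{"observation_id": "%s", "text_query": "%s"}' "$1" "$2"
}

# Step 4: localize by the single x,y point the image tool returned.
localize_point_body() {
  printf '{"observation_id": "%s", "image_tool_point": {"x": %d, "y": %d}}' "$1" "$2" "$3"
}

# Step 5: act on a resolved target ID; the action defaults to "click".
act_body() {
  printf '{"target_id": "%s", "action": "%s"}' "$1" "${2:-click}"
}
```

Each body would be sent with `curl --data` to the matching `/v2/...` endpoint. The observation ID returned by `/v2/observe` feeds steps 2 and 4, and the target ID resolved by `/v2/localize` feeds step 5.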

## See docs

- `docs/API.md`
- `skill/SKILL.md`
- `docs/coordinate-system.md`