# Clickthrough

Let an agent interact with a computer over HTTP.

## Primary mode (v2)

Use the v2 contract for faster, less OCR-heavy control loops:
- `POST /v2/observe`
- `POST /v2/localize`
- `POST /v2/act`
- `POST /v2/act-verify`

This mode is optimized for agents that cannot see the screen directly and must rely on screenshot or image tools.

## What this provides

- Screen/region capture with optional OCR and timing stats
- Observation IDs for deterministic follow-up localization
- Text localization and image-tool coordinate localization
- Action execution with resolved target IDs
- Risk-aware action+verification defaults
- Unified response envelope across all endpoints
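
## Quick start

```bash
# Adjust the path to wherever the repository is checked out.
cd /root/external-projects/clickthrough
# Create and activate a virtual environment, then install dependencies.
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
# Start the server; replace change-me with your own token.
CLICKTHROUGH_TOKEN=change-me python -m server.app
```

The server defaults to `127.0.0.1:8123`.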
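
Once the server is running, a client could build requests along the lines of the sketch below. It assumes the value of `CLICKTHROUGH_TOKEN` is sent as a bearer token in an `Authorization` header and that endpoints accept and return JSON; neither is confirmed here, so check `docs/API.md` for the actual scheme. `build_request` and `send` are hypothetical helper names, not part of the project.

```python
import json
import os
import urllib.request

# Default address from this README; change it if the server is configured differently.
BASE_URL = "http://127.0.0.1:8123"


def build_request(path, payload, token=None, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a Clickthrough endpoint.

    Assumption: the token travels as a bearer token in the Authorization
    header. The real auth scheme is documented in docs/API.md.
    """
    if token is None:
        token = os.environ.get("CLICKTHROUGH_TOKEN", "")
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed header format
        },
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a prepared request and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_request("/v2/observe", {}))` would post an empty placeholder payload; the real request fields are described in `docs/API.md`.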

## Fast control loop

1. `POST /v2/observe` on a tight region
2. If OCR is enough, `POST /v2/localize` with `text_query`
3. If the result is ambiguous, ask the image tool for a single x,y point within the observation bounds
4. `POST /v2/localize` with `image_tool_point`
5. `POST /v2/act` or `POST /v2/act-verify`
6. Re-observe only the changed region
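
The steps above can be sketched as a single function. This is a minimal illustration, not the project's client: the HTTP transport is injected as a `post(path, payload)` callable, and every payload or response field other than `text_query` and `image_tool_point` (`region`, `observation_id`, `ambiguous`, `target_id`, `action`) is a hypothetical name chosen for the example. See `docs/API.md` for the real contract.

```python
def control_step(post, region, text_query, ask_image_tool=None,
                 verify=True, changed_region=None):
    """Run one pass of the fast control loop.

    post(path, payload) must return the parsed JSON response as a dict.
    ask_image_tool(observation) must return one (x, y) point.
    All field names except text_query and image_tool_point are assumptions.
    """
    # 1. Observe a tight region.
    observation = post("/v2/observe", {"region": region})
    obs_id = observation.get("observation_id")

    # 2. Try OCR-based text localization first.
    target = post("/v2/localize",
                  {"observation_id": obs_id, "text_query": text_query})

    # 3-4. If the text match is ambiguous, ask the image tool for one point
    #      and localize again with it.
    if target.get("ambiguous") and ask_image_tool is not None:
        x, y = ask_image_tool(observation)
        target = post("/v2/localize",
                      {"observation_id": obs_id,
                       "image_tool_point": {"x": x, "y": y}})

    # 5. Act on the resolved target, with verification by default.
    path = "/v2/act-verify" if verify else "/v2/act"
    post(path, {"target_id": target.get("target_id"), "action": "click"})

    # 6. Re-observe only the region expected to have changed.
    return post("/v2/observe", {"region": changed_region or region})
```

Injecting `post` keeps the loop independent of any particular HTTP client and makes it easy to exercise against a stub before pointing it at a live server.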

## See docs

- `docs/API.md`
- `skill/SKILL.md`
- `docs/coordinate-system.md`