space/clickthrough

Archived

This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Paul Wähner aced5be25e

python-syntax / syntax-check (push) Successful in 7s

feat: migrate to v2-only API and unified response envelope

2026-05-03 19:11:11 +02:00

.gitea/workflows

feat: bootstrap clickthrough server, skill docs, and syntax CI

2026-04-05 19:59:39 +02:00

.env.example

fix(ocr): allow configuring tesseract path

2026-04-06 19:02:50 +02:00

.gitignore

feat: bootstrap clickthrough server, skill docs, and syntax CI

2026-04-05 19:59:39 +02:00

LICENSE

docs: add MIT license

2026-04-06 18:31:48 +02:00

pyproject.toml

feat: bootstrap clickthrough server, skill docs, and syntax CI

2026-04-05 19:59:39 +02:00

README.md

feat: migrate to v2-only API and unified response envelope

2026-05-03 19:11:11 +02:00

requirements.txt

feat(ocr): add /ocr endpoint for screen, region, and image input

2026-04-06 13:48:33 +02:00

TODO.md

docs(skill): clarify user-owned instance setup responsibilities

2026-04-05 20:35:35 +02:00

README.md

Clickthrough

Let an agent interact with a computer over HTTP.

Primary mode (v2)

Use the v2 contract for faster, less OCR-heavy control loops:

POST /v2/observe
POST /v2/localize
POST /v2/act
POST /v2/act-verify

The v2 contract is optimized for agents that cannot see the screen directly and must rely on screenshot/image tools.

What this provides

Screen/region capture with optional OCR and timing stats
Observation IDs for deterministic follow-up localization
Text localization and image-tool coordinate localization
Action execution with resolved target IDs
Risk-aware action+verification defaults
Unified response envelope across all endpoints

Quick start

cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app

The server listens on 127.0.0.1:8123 by default.
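
As a rough illustration of talking to the running server, the sketch below builds a POST request against the default address using only the Python standard library. The `Authorization: Bearer` header and the `region` payload field are assumptions made for this example, not taken from this README; check docs/API.md for the real authentication scheme and request schema. `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Default address from the quick start above.
BASE_URL = "http://127.0.0.1:8123"


def build_request(path, payload, token, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a v2 endpoint.

    ASSUMPTION: the token is sent as a Bearer token. The real header
    name may differ; see docs/API.md.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# ASSUMPTION: "region" and its keys are hypothetical payload fields.
req = build_request(
    "/v2/observe",
    {"region": {"x": 0, "y": 0, "width": 800, "height": 600}},
    token="change-me",
)
```

Passing `req` to `urllib.request.urlopen` would send it; the helper only constructs the request so it can be inspected without a running server.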

Fast control loop

POST /v2/observe on a tight region
If OCR is enough, POST /v2/localize with text_query
If the result is ambiguous, ask the image tool for a single x,y point within the observation bounds
POST /v2/localize with image_tool_point
POST /v2/act or POST /v2/act-verify
Re-observe only the changed region
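
The steps above can be sketched as a single function. This is a minimal sketch, not the project's client: the endpoint paths, `text_query`, and `image_tool_point` come from this README, while `run_step`, the `post` and `ask_image_tool` callables, and the field names `region`, `observation_id`, `target_id`, `ambiguous`, and `action` are hypothetical placeholders. The real schema is in docs/API.md.

```python
from typing import Callable, Tuple

# A transport function: takes an endpoint path and a JSON-like payload,
# returns the decoded response body. Injected so the loop is testable.
Post = Callable[[str, dict], dict]


def run_step(
    post: Post,
    region: dict,
    text_query: str,
    ask_image_tool: Callable[[dict], Tuple[int, int]],
    verify: bool = True,
) -> dict:
    """Run one observe -> localize -> act pass (field names are assumed)."""
    # 1. Observe a tight region rather than the whole screen.
    observation = post("/v2/observe", {"region": region})
    obs_id = observation["observation_id"]  # hypothetical field name

    # 2. Try text localization first, since OCR is the cheap path.
    located = post(
        "/v2/localize",
        {"observation_id": obs_id, "text_query": text_query},
    )

    # 3. Fall back to a point supplied by an image tool when the text
    #    match is ambiguous or did not resolve to a target.
    if located.get("ambiguous") or not located.get("target_id"):
        x, y = ask_image_tool(observation)
        located = post(
            "/v2/localize",
            {"observation_id": obs_id, "image_tool_point": {"x": x, "y": y}},
        )

    # 4. Act on the resolved target, optionally with verification.
    endpoint = "/v2/act-verify" if verify else "/v2/act"
    return post(endpoint, {"target_id": located["target_id"], "action": "click"})
```

Injecting `post` keeps the loop independent of any HTTP library; a caller would re-observe only the changed region before the next call to `run_step`.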

See docs

docs/API.md
skill/SKILL.md
docs/coordinate-system.md