This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Clickthrough

Let an agent interact with a computer over HTTP.

Primary mode (v2)

Use the v2 contract for faster, less OCR-heavy control loops:

  • POST /v2/observe
  • POST /v2/localize
  • POST /v2/act
  • POST /v2/act-verify

This is optimized for agents that cannot directly see the screen and must use screenshot/image tools.

What this provides

  • Screen/region capture with optional OCR and timing stats
  • Observation IDs for deterministic follow-up localization
  • Text localization and image-tool coordinate localization
  • Action execution with resolved target IDs
  • Risk-aware action+verification defaults
  • Unified response envelope across all endpoints

Quick start

cd /root/external-projects/clickthrough
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
CLICKTHROUGH_TOKEN=change-me python -m server.app

Server defaults to 127.0.0.1:8123.

Fast control loop

  1. POST /v2/observe on a tight region
  2. If OCR is enough, POST /v2/localize with text_query
  3. If ambiguous, ask image tool for one x,y in observation bounds
  4. POST /v2/localize with image_tool_point
  5. POST /v2/act or POST /v2/act-verify
  6. Re-observe only changed region

See docs

  • docs/API.md
  • skill/SKILL.md
  • docs/coordinate-system.md
Languages
Python 100%