Files
clickthrough/skill/SKILL.md
Luna 3a49560e82
All checks were successful
python-syntax / syntax-check (push) Successful in 4s
docs(skill): add instance setup and mini API quick reference
2026-04-05 20:34:14 +02:00

3.9 KiB

name, description
name description
clickthrough-http-control Control a local computer through the Clickthrough HTTP server using screenshot grids, zoomed grids, and pointer/keyboard actions. Use when an agent must operate GUI apps by repeatedly capturing the screen, refining target coordinates, and executing precise interactions (click/right-click/double-click/scroll/type/hotkey) with verification.

Clickthrough HTTP Control

Use a strict observe-decide-act-verify loop.

Getting a computer instance (quick setup)

  1. Start Clickthrough on the target computer (default: 127.0.0.1:8123).
  2. Expose it to the agent host (LAN/Tailscale/reverse proxy) and note the base URL.
  3. Set auth on the target machine:
    • CLICKTHROUGH_TOKEN for general API auth
    • CLICKTHROUGH_EXEC_SECRET for /exec calls
  4. Verify connectivity from the agent side:
    • GET /health with x-clickthrough-token header
  5. Store connection details for reuse:
    • base_url
    • x-clickthrough-token
    • x-clickthrough-exec-secret (only when using /exec)

Mini API map

  • GET /health → server status + safety flags
  • GET /screen → full screenshot (JSON with base64 by default, or raw image with asImage=true)
  • POST /zoom → cropped screenshot around point/region (also supports asImage=true)
  • POST /action → single interaction (move, click, scroll, type, hotkey, ...)
  • POST /batch → sequential action list
  • POST /exec → PowerShell/Bash/CMD command execution (requires configured exec secret + header)

Header requirements

  • Always send x-clickthrough-token when token auth is enabled.
  • For /exec, also send x-clickthrough-exec-secret.

Core workflow (mandatory)

  1. Call GET /screen with coarse grid (e.g., 12x12).
  2. Identify likely target region and compute an initial confidence score.
  3. If confidence < 0.85, call POST /zoom with denser grid (e.g., 20x20) and re-evaluate.
  4. Before any click, verify target identity (text/icon/location consistency).
  5. Execute one minimal action via POST /action.
  6. Re-capture with GET /screen and verify the expected state change.
  7. Repeat until objective is complete.

Verify-before-click rules

  • Never click if target identity is ambiguous.
  • Require at least two matching signals before click (example: expected text + expected UI region).
  • If confidence is low, do not "test click"; zoom and re-localize first.
  • For high-impact actions (close/delete/send/purchase), use two-phase flow:
    1. preview intended coordinate + reason
    2. execute only after explicit confirmation.

Precision rules

  • Prefer grid targets first, then use dx/dy for subcell precision.
  • Keep dx/dy in [-1,1]; start at 0,0 and only offset when needed.
  • Use zoom before guessing offsets.
  • Avoid stale coordinates: re-capture before action if UI moved/scrolled.

Safety rules

  • Respect dry_run and allowed_region restrictions from /health.
  • Respect /exec security requirements (CLICKTHROUGH_EXEC_SECRET + x-clickthrough-exec-secret).
  • Avoid destructive shortcuts unless explicitly requested.
  • Send one action at a time unless deterministic; then use /batch.

Reliability rules

  • After every meaningful action, verify with a fresh screenshot.
  • On mismatch, do not spam clicks: zoom, re-localize, and retry once.
  • Prefer short, reversible actions over long macros.
  • If two retries fail, switch strategy (hotkey/window focus/search) instead of repeating the same click.

Build per-app routines for repetitive tasks instead of generic clicking.

Spotify playbook

  • Focus app window before search/navigation.
  • Prefer keyboard-first flow for song start:
    1. Ctrl+L (search)
    2. type exact query
    3. Enter
    4. verify exact song+artist text
    5. click/double-click row
    6. verify now-playing bar
  • If now-playing does not match target track, stop and re-localize; do not keep clicking nearby rows.