Files
clickthrough/skill/SKILL.md
Luna 4aa51e2d69
All checks were successful
python-syntax / syntax-check (push) Successful in 29s
feat: bootstrap clickthrough server, skill docs, and syntax CI
2026-04-05 19:59:39 +02:00

1.4 KiB

name, description
name description
clickthrough-http-control Control a local computer through the Clickthrough HTTP server using screenshot grids, zoomed grids, and pointer/keyboard actions. Use when an agent must operate GUI apps by repeatedly capturing the screen, refining target coordinates, and executing precise interactions (click/right-click/double-click/scroll/type/hotkey) with verification.

Clickthrough HTTP Control

Use a strict observe-decide-act-verify loop.

Workflow

  1. Call GET /screen with coarse grid (e.g., 12x12).
  2. Identify likely cell/region for the target UI element.
  3. If confidence is low, call POST /zoom centered on the candidate and use denser grid (e.g., 20x20).
  4. Execute one minimal action via POST /action.
  5. Re-capture with GET /screen and verify the expected state change.
  6. Repeat until objective is complete.

Precision rules

  • Prefer grid targets first, then use dx/dy for subcell precision.
  • Keep dx/dy in [-1,1]; start at 0,0 and only offset when needed.
  • Use zoom before guessing offsets.

Safety rules

  • Respect dry_run and allowed_region restrictions from /health.
  • Avoid destructive shortcuts unless explicitly requested.
  • Send one action at a time unless deterministic; then use /batch.

Reliability rules

  • After every meaningful action, verify with a fresh screenshot.
  • On mismatch, do not spam clicks: zoom, re-localize, and retry once.
  • Prefer short, reversible actions over long macros.