Archived

This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Luna 5122d416e8

feat(wait): add structured wait endpoint

2026-05-01 15:55:29 +02:00

API Reference (v0.1)

Base URL: http://127.0.0.1:8123

If CLICKTHROUGH_TOKEN is set, include header:

x-clickthrough-token: <token>
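
A minimal sketch of attaching the token header from a Python client, using only the standard library. `build_headers` and `health_request` are hypothetical helper names, and reading the token from the client's own `CLICKTHROUGH_TOKEN` environment variable is an assumption made for convenience.

```python
import os
import urllib.request

BASE_URL = "http://127.0.0.1:8123"


def build_headers(token=None):
    """Return request headers, adding the token header only when a token is set."""
    headers = {"Accept": "application/json"}
    # Assumption: the client mirrors the server's env var when no token is passed.
    if token is None:
        token = os.environ.get("CLICKTHROUGH_TOKEN")
    if token:
        headers["x-clickthrough-token"] = token
    return headers


def health_request(token=None):
    """Build (but do not send) a GET /health request."""
    return urllib.request.Request(
        f"{BASE_URL}/health", headers=build_headers(token), method="GET"
    )
```

The request can then be sent with `urllib.request.urlopen(health_request())`.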

`GET /health`

Returns status and runtime safety flags, including exec capability config.

`GET /displays`

Returns detected displays in API screen order.

{
  "ok": true,
  "default_screen": 0,
  "displays": [
    {"screen": 0, "mss_index": 1, "primary": true, "x": 0, "y": 0, "width": 1920, "height": 1080},
    {"screen": 1, "mss_index": 2, "primary": false, "x": 1920, "y": 0, "width": 1920, "height": 1080}
  ]
}

screen is zero-based. screen=0 is the primary display when detectable, falling back to the first monitor reported by the capture backend. Invalid screen values fall back to 0.
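
The fallback rule can be mirrored on the client when picking a display out of the `/displays` response. This is a sketch only: `select_display` is a hypothetical helper, and the server's own lookup may differ in detail.

```python
def select_display(displays_response, screen=0):
    """Return the display entry for `screen`, falling back to screen 0 when invalid."""
    by_screen = {d["screen"]: d for d in displays_response["displays"]}
    if screen not in by_screen:
        screen = 0  # documented behaviour: invalid values fall back to 0
    return by_screen[screen]


example = {
    "ok": True,
    "default_screen": 0,
    "displays": [
        {"screen": 0, "mss_index": 1, "primary": True,
         "x": 0, "y": 0, "width": 1920, "height": 1080},
        {"screen": 1, "mss_index": 2, "primary": False,
         "x": 1920, "y": 0, "width": 1920, "height": 1080},
    ],
}

second = select_display(example, 1)    # the display whose origin is x=1920
fallback = select_display(example, 7)  # no such screen, so screen 0 is returned
```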

`GET /screen`

Query params:

screen (int, default 0) - zero-based display selector; invalid values fall back to 0
with_grid (bool, default true)
grid_rows (int, default env or 12)
grid_cols (int, default env or 12)
include_labels (bool, default true)
image_format (png|jpeg, default png)
jpeg_quality (1-100, default 85)
asImage (bool, default false) - if true, return raw image bytes only (image/png or image/jpeg)

Default response includes base64 image and metadata (meta.region, meta.screen, meta.displays, optional meta.grid). meta.region uses global desktop coordinates.
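
A sketch of assembling the `/screen` URL from the query params listed above. `screen_url` is a hypothetical helper, and serialising booleans as lowercase `true`/`false` is an assumption about what the server accepts.

```python
from urllib.parse import urlencode


def screen_url(base_url="http://127.0.0.1:8123", **params):
    """Build a GET /screen URL; booleans are written as 'true'/'false' (assumed)."""
    query = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
    }
    if not query:
        return f"{base_url}/screen"
    return f"{base_url}/screen?{urlencode(query)}"


url = screen_url(screen=1, with_grid=False, image_format="jpeg", jpeg_quality=70)
```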

`POST /zoom`

Body:

{
  "center_x": 1200,
  "center_y": 700,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}

Query params:

screen (int, default 0) - zero-based display selector; invalid values fall back to 0
asImage (bool, default false) - if true, return raw image bytes only (image/png or image/jpeg)

Default response returns cropped image + region metadata in global pixel coordinates. center_x and center_y are also global coordinates; use the selected display's meta.region from /screen?screen=X as the coordinate base.
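
Because `center_x` and `center_y` are global, a point measured relative to one display has to be shifted by that display's origin first. The sketch below assumes `meta.region` has the keys `x`, `y`, `width`, `height` (as in the `/ocr` response's `region`); `zoom_body` is a hypothetical helper.

```python
def zoom_body(region, local_x, local_y, width, height):
    """Build a /zoom body from coordinates measured inside one display."""
    return {
        # Shift display-local coordinates by the display origin to get global ones.
        "center_x": region["x"] + local_x,
        "center_y": region["y"] + local_y,
        "width": width,
        "height": height,
    }


# Second display from the /displays example, whose origin is at x=1920.
second_display = {"x": 1920, "y": 0, "width": 1920, "height": 1080}
body = zoom_body(second_display, local_x=200, local_y=300, width=500, height=350)
```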

`POST /action`

Body: one action.

Important:

the request body uses action plus an optional target
pixel coordinates live inside target when target.mode="pixel"
do not send top-level x / y fields
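
These rules are easy to check on the client before sending a request. The following is a sketch of such a check, not the server's validation; `check_action_payload` is a hypothetical name.

```python
POINTER_FIELDS = {"x", "y"}


def check_action_payload(payload):
    """Return a list of problems with an /action body; empty means none were found."""
    problems = []
    if "action" not in payload:
        problems.append("missing 'action'")
    stray = sorted(POINTER_FIELDS & payload.keys())
    if stray:
        problems.append(f"top-level {stray} not allowed; put coordinates in 'target'")
    target = payload.get("target")
    if target is not None and target.get("mode") == "pixel":
        missing = sorted(POINTER_FIELDS - target.keys())
        if missing:
            problems.append(f"pixel target is missing {missing}")
    return problems


good = {"action": "click", "target": {"mode": "pixel", "x": 100, "y": 200}}
bad = {"action": "click", "x": 100, "y": 200}
```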

Query params:

screen (int, default 0) - zero-based display selector included in the response metadata; invalid values fall back to 0

Pointer coordinates remain global desktop coordinates. For multi-display actions, first capture /screen?screen=X and use that response's meta.region or grid metadata to compute the target.

Pointer target modes

Pixel target

{
  "mode": "pixel",
  "x": 100,
  "y": 200,
  "dx": 0,
  "dy": 0
}

Grid target

{
  "mode": "grid",
  "region_x": 0,
  "region_y": 0,
  "region_width": 1920,
  "region_height": 1080,
  "rows": 12,
  "cols": 12,
  "row": 5,
  "col": 9,
  "dx": 0.0,
  "dy": 0.0
}

dx/dy are normalized offsets in [-1, 1] inside the selected cell.
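
To make the grid arithmetic concrete, here is a sketch of how a grid target could resolve to a pixel. It rests on assumptions the reference does not spell out: `row` and `col` are zero-based, offset 0 is the cell centre, and offsets of -1 and 1 land on the cell edges. `grid_target_to_pixel` is a hypothetical helper, not the server's code.

```python
def grid_target_to_pixel(target):
    """Resolve a grid target dict to global (x, y) pixel coordinates."""
    cell_w = target["region_width"] / target["cols"]
    cell_h = target["region_height"] / target["rows"]
    # Clamp offsets to the documented [-1, 1] range.
    dx = max(-1.0, min(1.0, target.get("dx", 0.0)))
    dy = max(-1.0, min(1.0, target.get("dy", 0.0)))
    # Assumed mapping: cell centre plus half a cell per unit of offset.
    x = target["region_x"] + (target["col"] + 0.5 + dx / 2) * cell_w
    y = target["region_y"] + (target["row"] + 0.5 + dy / 2) * cell_h
    return round(x), round(y)


target = {
    "mode": "grid",
    "region_x": 0, "region_y": 0,
    "region_width": 1920, "region_height": 1080,
    "rows": 12, "cols": 12,
    "row": 5, "col": 9,
    "dx": 0.0, "dy": 0.0,
}
center = grid_target_to_pixel(target)  # centre of a 160x90 cell
```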

Action examples

Click:

{
  "action": "click",
  "target": {
    "mode": "grid",
    "region_x": 0,
    "region_y": 0,
    "region_width": 1920,
    "region_height": 1080,
    "rows": 12,
    "cols": 12,
    "row": 7,
    "col": 3,
    "dx": 0.2,
    "dy": -0.1
  },
  "clicks": 1,
  "button": "left"
}

Scroll:

{
  "action": "scroll",
  "target": {"mode": "pixel", "x": 1300, "y": 740},
  "scroll_amount": -500
}

Type text:

{
  "action": "type",
  "text": "hello world",
  "interval_ms": 20
}

Hotkey:

{
  "action": "hotkey",
  "keys": ["ctrl", "l"]
}

Right click:

{
  "action": "right_click",
  "target": {"mode": "pixel", "x": 1300, "y": 740}
}

Move only:

{
  "action": "move",
  "target": {"mode": "pixel", "x": 1300, "y": 740},
  "duration_ms": 150
}

`GET /windows`

List desktop windows using structured filters instead of shelling out.

Query params:

title_contains (optional substring match)
title_regex (optional case-insensitive regex)
process_name (optional exact process name, e.g. explorer.exe)
hwnd (optional exact window handle)
visible_only (bool, default true)

{
  "ok": true,
  "count": 1,
  "windows": [
    {
      "hwnd": 132640,
      "title": "WinDirStat",
      "class_name": "WinDirStatMainWindow",
      "pid": 18420,
      "process_name": "windirstat.exe",
      "visible": true,
      "enabled": true,
      "minimized": false,
      "maximized": false,
      "foreground": true,
      "rect": {"x": 194, "y": 116, "width": 1532, "height": 870}
    }
  ]
}

Notes:

Currently supported on Windows hosts only.
Mutation endpoints that reuse these filters return 409 when the match is ambiguous, that is, when the action would affect more than one window.
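
The filter semantics can be reproduced on the client over an already-fetched window list. This sketch assumes `title_contains` is a case-sensitive substring match (the reference only states that `title_regex` is case-insensitive); `match_windows` and `require_single` are hypothetical helpers.

```python
import re


def match_windows(windows, title_contains=None, title_regex=None,
                  process_name=None, hwnd=None, visible_only=True):
    """Filter window dicts shaped like the entries in the /windows response."""
    pattern = re.compile(title_regex, re.IGNORECASE) if title_regex else None
    matches = []
    for w in windows:
        if visible_only and not w.get("visible", False):
            continue
        if title_contains is not None and title_contains not in w["title"]:
            continue
        if pattern is not None and not pattern.search(w["title"]):
            continue
        if process_name is not None and w["process_name"] != process_name:
            continue
        if hwnd is not None and w["hwnd"] != hwnd:
            continue
        matches.append(w)
    return matches


def require_single(matches):
    """Raise unless exactly one window matched, the precondition for a mutation."""
    if len(matches) != 1:
        raise LookupError(f"expected exactly one window, got {len(matches)}")
    return matches[0]
```

Checking for a single match before calling a mutation endpoint avoids the 409 round trip.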

`POST /windows/action`

Perform a structured window action against exactly one matched window.

{
  "action": "focus",
  "title_contains": "WinDirStat",
  "visible_only": true,
  "timeout_ms": 3000
}

Supported actions:

focus
restore
minimize
maximize
close

The response includes the matched pre-action window and the final observed window state (or closed=true if it disappeared).

`POST /launch`

Start an app/process without invoking a shell.

{
  "executable": "C:/Program Files/WinDirStat/WinDirStat.exe",
  "args": [],
  "cwd": "C:/Program Files/WinDirStat",
  "wait_for_window": true,
  "match": {
    "title_contains": "WinDirStat",
    "visible_only": true
  },
  "timeout_ms": 8000
}

Notes:

Launch uses direct process execution (subprocess.Popen) rather than PowerShell/CMD.
If wait_for_window=true, the server polls for a matching window and returns window_found.
dry_run=true returns the resolved argv/cwd without launching.
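
For readers unfamiliar with shell-free launching, the sketch below shows the general pattern with `subprocess.Popen` and an argv list. It illustrates the technique only and is not the server's implementation; `resolve_launch` and `launch` are hypothetical names.

```python
import subprocess
import sys


def resolve_launch(executable, args=(), cwd=None):
    """Resolve the argv list and working directory without starting anything."""
    return {"argv": [executable, *args], "cwd": cwd}


def launch(executable, args=(), cwd=None, dry_run=False):
    """Start a process directly (no shell); with dry_run, only report what would run."""
    resolved = resolve_launch(executable, args, cwd)
    if dry_run:
        return resolved, None
    # shell=False with a list argv: arguments are passed as-is, with no
    # PowerShell/CMD parsing or quoting involved.
    proc = subprocess.Popen(resolved["argv"], cwd=resolved["cwd"], shell=False)
    return resolved, proc


# Dry run: nothing is started, only the resolved argv/cwd are returned.
plan, proc = launch(sys.executable, ["-c", "pass"], dry_run=True)
```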

`POST /wait`

Wait on a structured UI condition instead of guessing sleep durations.

Query params:

screen (int, default 0) - used for text and visual waits

Wait for text to appear

{
  "condition": {
    "kind": "text",
    "mode": "screen",
    "text": "Scan complete",
    "match": "contains",
    "present": true,
    "language_hint": "eng",
    "min_confidence": 0.4
  },
  "timeout_ms": 15000,
  "poll_interval_ms": 400
}

Wait for a window state

{
  "condition": {
    "kind": "window",
    "title_contains": "WinDirStat",
    "visible_only": true,
    "state": "focused"
  },
  "timeout_ms": 5000,
  "poll_interval_ms": 200
}

Window states:

exists
focused
closed

Wait for visual change or stability

{
  "condition": {
    "kind": "visual",
    "state": "stable",
    "region_x": 0,
    "region_y": 0,
    "region_width": 1920,
    "region_height": 1080,
    "diff_threshold": 0.005,
    "stable_for_ms": 1000
  },
  "timeout_ms": 12000,
  "poll_interval_ms": 300
}

Visual states:

change — succeeds when the average pixel diff crosses diff_threshold
stable — succeeds when the diff stays at or below diff_threshold for stable_for_ms
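
A sketch of how the two visual states could be evaluated over a series of captures. It assumes frames are flat sequences of 0-255 samples, that the diff is the mean absolute difference scaled to [0, 1], and that "crosses" means strictly greater than `diff_threshold`. The server's actual metric may differ; `mean_abs_diff` and `evaluate_visual` are hypothetical names.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference of two equal-length 0-255 sequences, scaled to [0, 1]."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length")
    if not frame_a:
        return 0.0
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / (255 * len(frame_a))


def evaluate_visual(samples, state, diff_threshold, stable_for_ms=0):
    """Check (time_ms, frame) samples in capture order; True once the condition holds."""
    if state not in ("change", "stable"):
        raise ValueError(f"unknown state: {state}")
    stable_since = None
    for (t_prev, prev), (t_cur, cur) in zip(samples, samples[1:]):
        diff = mean_abs_diff(prev, cur)
        if state == "change":
            if diff > diff_threshold:
                return True
        elif diff <= diff_threshold:
            # Start (or continue) a quiet streak and see if it is long enough.
            if stable_since is None:
                stable_since = t_prev
            if t_cur - stable_since >= stable_for_ms:
                return True
        else:
            stable_since = None  # movement resets the streak
    return False
```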

Notes:

Text waits reuse the OCR pipeline and return matching OCR blocks on success.
Window waits build on the structured window discovery endpoint.
Visual waits compare repeated captures of either the full selected display or an explicit region.
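
The interplay of `timeout_ms` and `poll_interval_ms` can be pictured as a plain polling loop. The sketch below is a client-side illustration with an injectable clock, not the server's code; `poll_until` and the keys of its result dict are hypothetical.

```python
import time


def poll_until(check, timeout_ms, poll_interval_ms,
               clock=time.monotonic, sleep=time.sleep):
    """Call `check` until it returns a truthy value or the timeout elapses."""
    deadline = clock() + timeout_ms / 1000
    attempts = 0
    while True:
        attempts += 1
        result = check()
        if result:
            return {"satisfied": True, "attempts": attempts, "result": result}
        if clock() >= deadline:
            return {"satisfied": False, "attempts": attempts, "result": None}
        sleep(poll_interval_ms / 1000)
```

Passing a fake `clock` and `sleep` makes the loop testable without real delays.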

`POST /ocr`

Extract visible text from either a full screenshot, a region crop, or caller-provided image bytes.

Query params:

screen (int, default 0) - zero-based display selector for mode=screen and mode=region; invalid values fall back to 0

Body:

{
  "mode": "screen",
  "language_hint": "eng",
  "min_confidence": 0.4
}

Modes:

screen (default): OCR over full selected monitor
region: OCR over explicit region (region_x, region_y, region_width, region_height)
image: OCR over provided image_base64 (supports plain base64 or data URL)

Region mode example:

{
  "mode": "region",
  "region_x": 220,
  "region_y": 160,
  "region_width": 900,
  "region_height": 400,
  "language_hint": "eng",
  "min_confidence": 0.5
}

Image mode example:

{
  "mode": "image",
  "image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
  "language_hint": "eng"
}
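
A sketch of building an image-mode body from raw bytes, covering both accepted encodings (plain base64 and data URL). `ocr_image_body` is a hypothetical helper.

```python
import base64


def ocr_image_body(image_bytes, language_hint="eng", min_confidence=None,
                   as_data_url=False, mime="image/png"):
    """Build a POST /ocr body for mode=image from raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if as_data_url:
        encoded = f"data:{mime};base64,{encoded}"
    body = {"mode": "image", "image_base64": encoded, "language_hint": language_hint}
    if min_confidence is not None:
        body["min_confidence"] = min_confidence
    return body


# The 8-byte PNG signature alone, just to show the encoding step.
png_signature = b"\x89PNG\r\n\x1a\n"
body = ocr_image_body(png_signature)
```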

Response shape:

{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "result": {
    "mode": "screen",
    "language_hint": "eng",
    "min_confidence": 0.4,
    "region": {"x": 0, "y": 0, "width": 1920, "height": 1080},
    "blocks": [
      {
        "text": "Settings",
        "confidence": 0.9821,
        "bbox": {"x": 144, "y": 92, "width": 96, "height": 21}
      }
    ]
  }
}

Notes:

Output is deterministic JSON: blocks are ordered top-to-bottom, then left-to-right.
bbox coordinates are in global screen space for screen/region, and image-local for image.
Requires tesseract executable plus Python package pytesseract.
If tesseract is not on PATH, set CLICKTHROUGH_TESSERACT_CMD to the full executable path.
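
The ordering and confidence filter described above can be reproduced on the client, for example after merging blocks from several region calls. This sketch assumes the order is keyed on each block's `bbox` top-left corner; the server may group blocks into lines differently. `order_blocks` is a hypothetical helper.

```python
def order_blocks(blocks, min_confidence=0.0):
    """Drop low-confidence blocks, then sort top-to-bottom and left-to-right."""
    kept = [b for b in blocks if b["confidence"] >= min_confidence]
    return sorted(kept, key=lambda b: (b["bbox"]["y"], b["bbox"]["x"]))


blocks = [
    {"text": "Cancel", "confidence": 0.91,
     "bbox": {"x": 300, "y": 400, "width": 60, "height": 20}},
    {"text": "Settings", "confidence": 0.9821,
     "bbox": {"x": 144, "y": 92, "width": 96, "height": 21}},
    {"text": "OK", "confidence": 0.95,
     "bbox": {"x": 200, "y": 400, "width": 30, "height": 20}},
    {"text": "~", "confidence": 0.12,
     "bbox": {"x": 10, "y": 10, "width": 5, "height": 5}},
]
ordered = [b["text"] for b in order_blocks(blocks, min_confidence=0.4)]
```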

`POST /exec`

Execute a shell command on the host running Clickthrough.

Requirements:

CLICKTHROUGH_EXEC_SECRET must be configured on the server
send header x-clickthrough-exec-secret: <secret>

{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}

Notes:

shell supports powershell, bash, cmd
if shell is omitted, server uses CLICKTHROUGH_EXEC_DEFAULT_SHELL
output is truncated based on CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS
endpoint can be disabled with CLICKTHROUGH_EXEC_ENABLED=false
if CLICKTHROUGH_EXEC_SECRET is missing, /exec is blocked (403)

Response includes stdout, stderr, exit_code, timeout state, and execution metadata.
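
A sketch of preparing an `/exec` request with the secret header, using only the standard library. `exec_request` is a hypothetical helper, and the shell check simply mirrors the documented list on the client side.

```python
import json
import urllib.request

SUPPORTED_SHELLS = {"powershell", "bash", "cmd"}


def exec_request(command, secret, shell=None, timeout_s=20, cwd=None,
                 dry_run=False, base_url="http://127.0.0.1:8123"):
    """Build (but do not send) a POST /exec request."""
    if shell is not None and shell not in SUPPORTED_SHELLS:
        raise ValueError(f"unsupported shell: {shell}")
    body = {"command": command, "timeout_s": timeout_s, "dry_run": dry_run}
    if shell is not None:
        body["shell"] = shell  # omitted: the server picks its default shell
    if cwd is not None:
        body["cwd"] = cwd
    return urllib.request.Request(
        f"{base_url}/exec",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-clickthrough-exec-secret": secret,
        },
        method="POST",
    )


req = exec_request("Get-Process | Select-Object -First 5", "s3cret",
                   shell="powershell", dry_run=True)
```

If `CLICKTHROUGH_TOKEN` is also set, the `x-clickthrough-token` header would need to be added alongside the exec secret.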

`POST /batch`

Runs multiple action payloads sequentially.

Query params:

screen (int, default 0) - zero-based display selector applied to each action response; invalid values fall back to 0

{
  "actions": [
    {"action": "move", "target": {"mode": "pixel", "x": 100, "y": 100}},
    {"action": "click", "target": {"mode": "pixel", "x": 100, "y": 100}}
  ],
  "stop_on_error": true
}
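
Batch bodies are convenient to generate rather than write by hand. A sketch that turns a list of points into move-then-click pairs follows; `click_sequence` is a hypothetical helper.

```python
def click_sequence(points, stop_on_error=True):
    """Build a /batch body that moves to and clicks each (x, y) point in order."""
    actions = []
    for x, y in points:
        target = {"mode": "pixel", "x": x, "y": y}
        # Copy the target so the two actions do not share one dict.
        actions.append({"action": "move", "target": dict(target)})
        actions.append({"action": "click", "target": dict(target)})
    return {"actions": actions, "stop_on_error": stop_on_error}


body = click_sequence([(100, 100)])
```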
