# API Reference (v0.1)

Base URL: `http://127.0.0.1:8123`

If `CLICKTHROUGH_TOKEN` is set, include header:

```http
x-clickthrough-token: <token>
```

## `GET /health`

Returns status and runtime safety flags, including `exec` capability config.

## `GET /displays`

Returns detected displays in API screen order.

```json
{
  "ok": true,
  "default_screen": 0,
  "displays": [
    {"screen": 0, "mss_index": 1, "primary": true, "x": 0, "y": 0, "width": 1920, "height": 1080},
    {"screen": 1, "mss_index": 2, "primary": false, "x": 1920, "y": 0, "width": 1920, "height": 1080}
  ]
}
```

`screen` is zero-based. `screen=0` is the primary display when detectable, falling back to the first monitor reported by the capture backend. Invalid `screen` values fall back to `0`.

## `GET /screen`

Query params:

- `screen` (int, default `0`) - zero-based display selector; invalid values fall back to `0`
- `with_grid` (bool, default `true`)
- `grid_rows` (int, default env or `12`)
- `grid_cols` (int, default env or `12`)
- `include_labels` (bool, default `true`)
- `image_format` (`png`|`jpeg`, default `png`)
- `jpeg_quality` (1-100, default `85`)
- `asImage` (bool, default `false`) - if `true`, return raw image bytes only (`image/png` or `image/jpeg`)

Default response includes base64 image and metadata (`meta.region`, `meta.screen`, `meta.displays`, optional `meta.grid`). `meta.region` uses global desktop coordinates.

## `POST /zoom`

Body:

```json
{
  "center_x": 1200,
  "center_y": 700,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}
```

Query params:

- `screen` (int, default `0`) - zero-based display selector; invalid values fall back to `0`
- `asImage` (bool, default `false`) - if `true`, return raw image bytes only (`image/png` or `image/jpeg`)

Default response returns cropped image + region metadata in global pixel coordinates.
`center_x` and `center_y` are also global coordinates; use the selected display's `meta.region` from `/screen?screen=X` as the coordinate base.

## `POST /action`

Body: one action.

Important:

- the request body uses `action` plus an optional `target`
- pixel coordinates live inside `target` when `target.mode="pixel"`
- do **not** send top-level `x` / `y` fields

Query params:

- `screen` (int, default `0`) - zero-based display selector included in the response metadata; invalid values fall back to `0`

Pointer coordinates remain global desktop coordinates. For multi-display actions, first capture `/screen?screen=X` and use that response's `meta.region` or grid metadata to compute the target.

### Pointer target modes

#### Pixel target

```json
{
  "mode": "pixel",
  "x": 100,
  "y": 200,
  "dx": 0,
  "dy": 0
}
```

#### Grid target

```json
{
  "mode": "grid",
  "region_x": 0,
  "region_y": 0,
  "region_width": 1920,
  "region_height": 1080,
  "rows": 12,
  "cols": 12,
  "row": 5,
  "col": 9,
  "dx": 0.0,
  "dy": 0.0
}
```

`dx`/`dy` are normalized offsets in `[-1, 1]` inside the selected cell.

### Action examples

Click:

```json
{
  "action": "click",
  "target": {
    "mode": "grid",
    "region_x": 0,
    "region_y": 0,
    "region_width": 1920,
    "region_height": 1080,
    "rows": 12,
    "cols": 12,
    "row": 7,
    "col": 3,
    "dx": 0.2,
    "dy": -0.1
  },
  "clicks": 1,
  "button": "left"
}
```

Scroll:

```json
{
  "action": "scroll",
  "target": {"mode": "pixel", "x": 1300, "y": 740},
  "scroll_amount": -500
}
```

Type text:

```json
{
  "action": "type",
  "text": "hello world",
  "interval_ms": 20
}
```

Hotkey:

```json
{
  "action": "hotkey",
  "keys": ["ctrl", "l"]
}
```

Right click:

```json
{
  "action": "right_click",
  "target": {"mode": "pixel", "x": 1300, "y": 740}
}
```

Move only:

```json
{
  "action": "move",
  "target": {"mode": "pixel", "x": 1300, "y": 740},
  "duration_ms": 150
}
```

## `GET /windows`

List desktop windows using structured filters instead of shelling out.
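
A request to this endpoint is a plain `GET` with the filters passed as query parameters. The following is a minimal sketch using only the Python standard library. `windows_url` is a hypothetical helper, not part of Clickthrough, and sending booleans as lowercase `true`/`false` is an assumption about how the server parses them.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8123"  # base URL from the top of this reference


def windows_url(**filters):
    """Build a GET /windows URL from keyword filters, skipping unset ones."""
    params = {}
    for name, value in filters.items():
        if value is None:
            continue  # leave optional filters out of the query entirely
        # Assumption: booleans are sent as lowercase "true"/"false".
        params[name] = str(value).lower() if isinstance(value, bool) else value
    query = urlencode(params)  # percent-encodes values such as regexes
    return f"{BASE_URL}/windows?{query}" if query else f"{BASE_URL}/windows"


url = windows_url(title_contains="WinDirStat", process_name=None, visible_only=True)
print(url)
```

Keeping the URL construction in one place means regex filters and titles with spaces are always percent-encoded the same way.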

Query params:

- `title_contains` (optional substring match)
- `title_regex` (optional case-insensitive regex)
- `process_name` (optional exact process name, e.g. `explorer.exe`)
- `hwnd` (optional exact window handle)
- `visible_only` (bool, default `true`)

```json
{
  "ok": true,
  "count": 1,
  "windows": [
    {
      "hwnd": 132640,
      "title": "WinDirStat",
      "class_name": "WinDirStatMainWindow",
      "pid": 18420,
      "process_name": "windirstat.exe",
      "visible": true,
      "enabled": true,
      "minimized": false,
      "maximized": false,
      "foreground": true,
      "rect": {"x": 194, "y": 116, "width": 1532, "height": 870}
    }
  ]
}
```

Notes:

- Currently supported on Windows hosts only.
- Returns `409` for ambiguous write-target matches when a mutation endpoint would affect multiple windows.

## `POST /windows/action`

Perform a structured window action against exactly one matched window.

```json
{
  "action": "focus",
  "title_contains": "WinDirStat",
  "visible_only": true,
  "timeout_ms": 3000
}
```

Supported actions:

- `focus`
- `restore`
- `minimize`
- `maximize`
- `close`

The response includes the matched pre-action window and the final observed window state (or `closed=true` if it disappeared).

## `POST /launch`

Start an app/process without invoking a shell.

```json
{
  "executable": "C:/Program Files/WinDirStat/WinDirStat.exe",
  "args": [],
  "cwd": "C:/Program Files/WinDirStat",
  "wait_for_window": true,
  "match": {
    "title_contains": "WinDirStat",
    "visible_only": true
  },
  "timeout_ms": 8000
}
```

Notes:

- Launch uses direct process execution (`subprocess.Popen`) rather than PowerShell/CMD.
- If `wait_for_window=true`, the server polls for a matching window and returns `window_found`.
- `dry_run=true` returns the resolved argv/cwd without launching.

## `POST /ocr`

Extract visible text from either a full screenshot, a region crop, or caller-provided image bytes.

Query params:

- `screen` (int, default `0`) - zero-based display selector for `mode=screen` and `mode=region`; invalid values fall back to `0`

Body:

```json
{
  "mode": "screen",
  "language_hint": "eng",
  "min_confidence": 0.4
}
```

Modes:

- `screen` (default): OCR over full selected monitor
- `region`: OCR over explicit region (`region_x`, `region_y`, `region_width`, `region_height`)
- `image`: OCR over provided `image_base64` (supports plain base64 or data URL)

Region mode example:

```json
{
  "mode": "region",
  "region_x": 220,
  "region_y": 160,
  "region_width": 900,
  "region_height": 400,
  "language_hint": "eng",
  "min_confidence": 0.5
}
```

Image mode example:

```json
{
  "mode": "image",
  "image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
  "language_hint": "eng"
}
```

Response shape:

```json
{
  "ok": true,
  "request_id": "...",
  "time_ms": 1710000000000,
  "result": {
    "mode": "screen",
    "language_hint": "eng",
    "min_confidence": 0.4,
    "region": {"x": 0, "y": 0, "width": 1920, "height": 1080},
    "blocks": [
      {
        "text": "Settings",
        "confidence": 0.9821,
        "bbox": {"x": 144, "y": 92, "width": 96, "height": 21}
      }
    ]
  }
}
```

Notes:

- Output is deterministic JSON (stable ordering by top-to-bottom, then left-to-right).
- `bbox` coordinates are in global screen space for `screen`/`region`, and image-local for `image`.
- Requires `tesseract` executable plus Python package `pytesseract`.
- If `tesseract` is not on `PATH`, set `CLICKTHROUGH_TESSERACT_CMD` to the full executable path.

## `POST /exec`

Execute a shell command on the host running Clickthrough.

Requirements:

- `CLICKTHROUGH_EXEC_SECRET` must be configured on the server
- send header `x-clickthrough-exec-secret: <secret>`

```json
{
  "command": "Get-Process | Select-Object -First 5",
  "shell": "powershell",
  "timeout_s": 20,
  "cwd": "C:/Users/Paul",
  "dry_run": false
}
```

Notes:

- `shell` supports `powershell`, `bash`, `cmd`
- if `shell` is omitted, server uses `CLICKTHROUGH_EXEC_DEFAULT_SHELL`
- output is truncated based on `CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS`
- endpoint can be disabled with `CLICKTHROUGH_EXEC_ENABLED=false`
- if `CLICKTHROUGH_EXEC_SECRET` is missing, `/exec` is blocked (`403`)

Response includes `stdout`, `stderr`, `exit_code`, timeout state, and execution metadata.

## `POST /batch`

Runs multiple `action` payloads sequentially.

Query params:

- `screen` (int, default `0`) - zero-based display selector applied to each action response; invalid values fall back to `0`

```json
{
  "actions": [
    {"action": "move", "target": {"mode": "pixel", "x": 100, "y": 100}},
    {"action": "click", "target": {"mode": "pixel", "x": 100, "y": 100}}
  ],
  "stop_on_error": true
}
```
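
A batch body like the one above can be assembled programmatically. The following is a minimal sketch using only the Python standard library. `pixel_action` and `build_batch_request` are hypothetical helper names, and the `Content-Type: application/json` header is an assumption, since this reference does not state which content type the server expects. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8123"  # base URL from the top of this reference


def pixel_action(action, x, y, **extra):
    """Build one action payload; coordinates go inside `target`, never top-level."""
    payload = {"action": action, "target": {"mode": "pixel", "x": x, "y": y}}
    payload.update(extra)  # e.g. clicks=2, button="left", duration_ms=150
    return payload


def build_batch_request(actions, screen=0, stop_on_error=True, token=None):
    """Return an unsent urllib Request for POST /batch."""
    body = json.dumps({"actions": actions, "stop_on_error": stop_on_error})
    # Assumption: the server expects a JSON content type.
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when CLICKTHROUGH_TOKEN is set on the server
        headers["x-clickthrough-token"] = token
    return urllib.request.Request(
        f"{BASE_URL}/batch?screen={screen}",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_batch_request(
    [pixel_action("move", 100, 100), pixel_action("click", 100, 100)]
)

# To actually send it (requires a running Clickthrough server):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building each payload through one helper keeps `x`/`y` inside `target`, which avoids the top-level coordinate mistake that the `/action` section warns about.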