4.6 KiB
API Reference (v0.1)
Base URL: http://127.0.0.1:8123
If CLICKTHROUGH_TOKEN is set, include header:
x-clickthrough-token: <token>
GET /health
Returns status and runtime safety flags, including exec capability config.
GET /screen
Query params:
with_grid(bool, defaulttrue)grid_rows(int, default env or12)grid_cols(int, default env or12)include_labels(bool, defaulttrue)image_format(png|jpeg, defaultpng)jpeg_quality(1-100, default85)asImage(bool, defaultfalse) — iftrue, return raw image bytes only (image/pngorimage/jpeg)
Default response includes base64 image and metadata (meta.region, optional meta.grid).
POST /zoom
Body:
{
"center_x": 1200,
"center_y": 700,
"width": 500,
"height": 350,
"with_grid": true,
"grid_rows": 20,
"grid_cols": 20,
"include_labels": true,
"image_format": "png",
"jpeg_quality": 90
}
Query params:
asImage(bool, defaultfalse) — iftrue, return raw image bytes only (image/pngorimage/jpeg)
Default response returns cropped image + region metadata in global pixel coordinates.
POST /action
Body: one action.
Pointer target modes
Pixel target
{
"mode": "pixel",
"x": 100,
"y": 200,
"dx": 0,
"dy": 0
}
Grid target
{
"mode": "grid",
"region_x": 0,
"region_y": 0,
"region_width": 1920,
"region_height": 1080,
"rows": 12,
"cols": 12,
"row": 5,
"col": 9,
"dx": 0.0,
"dy": 0.0
}
dx/dy are normalized offsets in [-1, 1] inside the selected cell.
Action examples
Click:
{
"action": "click",
"target": {
"mode": "grid",
"region_x": 0,
"region_y": 0,
"region_width": 1920,
"region_height": 1080,
"rows": 12,
"cols": 12,
"row": 7,
"col": 3,
"dx": 0.2,
"dy": -0.1
},
"clicks": 1,
"button": "left"
}
Scroll:
{
"action": "scroll",
"target": {"mode": "pixel", "x": 1300, "y": 740},
"scroll_amount": -500
}
Type text:
{
"action": "type",
"text": "hello world",
"interval_ms": 20
}
Hotkey:
{
"action": "hotkey",
"keys": ["ctrl", "l"]
}
POST /ocr
Extract visible text from either a full screenshot, a region crop, or caller-provided image bytes.
Body:
{
"mode": "screen",
"language_hint": "eng",
"min_confidence": 0.4
}
Modes:
screen(default): OCR over full captured monitorregion: OCR over explicit region (region_x,region_y,region_width,region_height)image: OCR over providedimage_base64(supports plain base64 or data URL)
Region mode example:
{
"mode": "region",
"region_x": 220,
"region_y": 160,
"region_width": 900,
"region_height": 400,
"language_hint": "eng",
"min_confidence": 0.5
}
Image mode example:
{
"mode": "image",
"image_base64": "iVBORw0KGgoAAAANSUhEUgAA...",
"language_hint": "eng"
}
Response shape:
{
"ok": true,
"request_id": "...",
"time_ms": 1710000000000,
"result": {
"mode": "screen",
"language_hint": "eng",
"min_confidence": 0.4,
"region": {"x": 0, "y": 0, "width": 1920, "height": 1080},
"blocks": [
{
"text": "Settings",
"confidence": 0.9821,
"bbox": {"x": 144, "y": 92, "width": 96, "height": 21}
}
]
}
}
Notes:
- Output is deterministic JSON (stable ordering by top-to-bottom, then left-to-right).
bboxcoordinates are in global screen space forscreen/region, and image-local forimage.- Requires
tesseractexecutable plus Python packagepytesseract.
POST /exec
Execute a shell command on the host running Clickthrough.
Requirements:
CLICKTHROUGH_EXEC_SECRETmust be configured on the server- send header
x-clickthrough-exec-secret: <secret>
{
"command": "Get-Process | Select-Object -First 5",
"shell": "powershell",
"timeout_s": 20,
"cwd": "C:/Users/Paul",
"dry_run": false
}
Notes:
shellsupportspowershell,bash,cmd- if
shellis omitted, server usesCLICKTHROUGH_EXEC_DEFAULT_SHELL - output is truncated based on
CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS - endpoint can be disabled with
CLICKTHROUGH_EXEC_ENABLED=false - if
CLICKTHROUGH_EXEC_SECRETis missing,/execis blocked (403)
Response includes stdout, stderr, exit_code, timeout state, and execution metadata.
POST /batch
Runs multiple action payloads sequentially.
{
"actions": [
{"action": "move", "target": {"mode": "pixel", "x": 100, "y": 100}},
{"action": "click", "target": {"mode": "pixel", "x": 100, "y": 100}}
],
"stop_on_error": true
}