3.6 KiB
3.6 KiB
API Reference
Base URL: http://127.0.0.1:8123
Auth header when enabled:
x-clickthrough-token: <token>
This API is intended for AI computer control through these methods:
seeinteractexec
All responses use one envelope.
Response Envelope
Success:
{
"ok": true,
"request_id": "...",
"time_ms": 1710000000000,
"data": {},
"error": null
}
Error:
{
"ok": false,
"request_id": "...",
"time_ms": 1710000000000,
"data": null,
"error": {
"code": "validation_error",
"message": "request validation failed",
"details": []
}
}
1) See
POST /see
Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping.
{
"screen": 0,
"region_x": null,
"region_y": null,
"region_width": null,
"region_height": null,
"with_grid": true,
"grid_rows": 12,
"grid_cols": 12,
"include_labels": true,
"image_format": "png",
"jpeg_quality": 85,
"ocr": false,
"ocr_min_confidence": 0,
"ocr_lang": "eng",
"ocr_psm": null
}
Returns:
data.image.base64data.meta.region(global desktop coords)data.meta.grid(rows/cols/cell size + formula)data.meta.ocr(whenocr=true)
OCR item shape:
textconfidencebbox(global coords)centerregion_relative_bbox
POST /see/zoom
Capture a tighter crop around a global point and draw another grid over that crop.
{
"screen": 0,
"center_x": 1200,
"center_y": 720,
"width": 500,
"height": 350,
"with_grid": true,
"grid_rows": 20,
"grid_cols": 20,
"include_labels": true,
"image_format": "png",
"jpeg_quality": 90
}
Use this for precision before clicking tiny controls.
2) Interact
POST /interact
Mouse/keyboard action execution.
{
"screen": 0,
"action": {
"action": "click",
"target": {
"mode": "grid",
"region_x": 0,
"region_y": 0,
"region_width": 1920,
"region_height": 1080,
"rows": 12,
"cols": 12,
"row": 7,
"col": 3,
"dx": 0.0,
"dy": 0.0
},
"button": "left",
"clicks": 1
}
}
Supported actions:
move,click,right_click,double_click,middle_clickscroll(scroll_amount)type(text,interval_ms)hotkey(keys)click_text(OCR-driven text click with optional region)
Target modes:
pixel: absolute globalx,ygrid: grid cell from asee/see/zoomresponse
click_text example (full screen OCR)
{
"screen": 0,
"action": {
"action": "click_text",
"click_text": {
"text": "Sign in",
"match": "contains",
"case_sensitive": false,
"min_confidence": 45,
"occurrence": "best"
}
}
}
click_text example (region OCR)
{
"screen": 0,
"action": {
"action": "click_text",
"click_text": {
"text": "Continue",
"match": "exact",
"region": { "x": 940, "y": 520, "width": 400, "height": 260 },
"occurrence": "first"
}
}
}
3) Exec
POST /exec
Run host shell commands (PowerShell/Bash/CMD).
{
"command": "Get-Process | Select-Object -First 5",
"shell": "powershell",
"timeout_s": 20,
"cwd": "C:/Users/Paul",
"dry_run": false
}
Required header:
x-clickthrough-exec-secret: <secret>
Minimal Procedure for Agents
seefull screen with coarse grid.- If uncertain,
see/zoomtarget area with denser grid. interactone action.seeagain to confirm state change.- Use
execonly when GUI interaction is not the right tool.