# API Reference Base URL: `http://127.0.0.1:8123` Auth header when enabled: ```http x-clickthrough-token: ``` This API is intended for AI computer control through these methods: - `see` - `interact` - `interact/verify` - `exec` All responses use one envelope. ## Response Envelope Success: ```json { "ok": true, "request_id": "...", "time_ms": 1710000000000, "data": {}, "error": null } ``` Error: ```json { "ok": false, "request_id": "...", "time_ms": 1710000000000, "data": null, "error": { "code": "validation_error", "message": "request validation failed", "details": [] } } ``` ## 1) See ### `POST /see` Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping. ```json { "screen": 0, "region_x": null, "region_y": null, "region_width": null, "region_height": null, "with_grid": true, "grid_rows": 12, "grid_cols": 12, "include_labels": true, "image_format": "png", "jpeg_quality": 85, "ocr": false, "ocr_min_confidence": 0, "ocr_lang": "eng", "ocr_psm": null } ``` Returns: - `data.image.base64` - `data.meta.region` (global desktop coords) - `data.meta.grid` (rows/cols/cell size + formula) - `data.meta.ocr` (when `ocr=true`) OCR item shape: - `text` - `confidence` - `bbox` (global coords) - `center` - `region_relative_bbox` ### `POST /see/zoom` Capture a tighter crop around a global point and draw another grid over that crop. ```json { "screen": 0, "center_x": 1200, "center_y": 720, "width": 500, "height": 350, "with_grid": true, "grid_rows": 20, "grid_cols": 20, "include_labels": true, "image_format": "png", "jpeg_quality": 90 } ``` Use this for precision before clicking tiny controls. ## 2) Interact ### `POST /interact` Mouse/keyboard action execution. ```json { "screen": 0, "action": { "action": "click", "target": { "mode": "grid", "region_x": 0, "region_y": 0, "region_width": 1920, "region_height": 1080, "rows": 12, "cols": 12, "row": 7, "col": 3, "dx": 0.0, "dy": 0.0 }, "button": "left", "clicks": 1 } } ``` Supported actions: - `move`, `click`, `right_click`, `double_click`, `middle_click` - `scroll` (`scroll_amount`) - `type` (`text`, `interval_ms`) - `hotkey` (`keys`) - `click_text` (OCR-driven text click with optional region) Target modes: - `pixel`: absolute global `x,y` - `grid`: grid cell from a `see`/`see/zoom` response ### `click_text` example (full screen OCR) ```json { "screen": 0, "action": { "action": "click_text", "click_text": { "text": "Sign in", "match": "contains", "case_sensitive": false, "min_confidence": 45, "occurrence": "best" } } } ``` ### `click_text` example (region OCR) ```json { "screen": 0, "action": { "action": "click_text", "click_text": { "text": "Continue", "match": "exact", "region": { "x": 940, "y": 520, "width": 400, "height": 260 }, "occurrence": "first" } } } ``` ## 3) Interact Verify ### `POST /interact/verify` Execute one interact action, then poll quick OCR verification checks until success or timeout. ```json { "action": { "screen": 0, "action": { "action": "click_text", "click_text": { "text": "Apply", "match": "contains" } } }, "verify": { "type": "ocr_text_near_point", "text": "Applied", "x": 1180, "y": 640, "radius": 120, "screen": 0, "match": "contains" }, "check_interval_ms": 250, "timeout_ms": 3000 } ``` Response includes: - `action_result` - `verified` - `attempts` - `last_check` - `duration_ms` ## 4) Exec ### `POST /exec` Run host shell commands (PowerShell/Bash/CMD). ```json { "command": "Get-Process | Select-Object -First 5", "shell": "powershell", "timeout_s": 20, "cwd": "C:/Users/Paul", "dry_run": false } ``` Required header: ```http x-clickthrough-exec-secret: ``` ## Minimal Procedure for Agents 1. `see` full screen with coarse grid. 2. If uncertain, `see/zoom` target area with denser grid. 3. `interact` one action. 4. `see` again to confirm state change. 5. Use `exec` only when GUI interaction is not the right tool.