246 lines
4.2 KiB
Markdown
246 lines
4.2 KiB
Markdown
# API Reference
|
|
|
|
Base URL: `http://127.0.0.1:8123`
|
|
|
|
Auth header when enabled:
|
|
|
|
```http
|
|
x-clickthrough-token: <token>
|
|
```
|
|
|
|
This API is intended for AI computer control through these methods:
|
|
- `see`
|
|
- `interact`
|
|
- `interact/verify`
|
|
- `exec`
|
|
|
|
All responses use one envelope.
|
|
|
|
## Response Envelope
|
|
|
|
Success:
|
|
|
|
```json
|
|
{
|
|
"ok": true,
|
|
"request_id": "...",
|
|
"time_ms": 1710000000000,
|
|
"data": {},
|
|
"error": null
|
|
}
|
|
```
|
|
|
|
Error:
|
|
|
|
```json
|
|
{
|
|
"ok": false,
|
|
"request_id": "...",
|
|
"time_ms": 1710000000000,
|
|
"data": null,
|
|
"error": {
|
|
"code": "validation_error",
|
|
"message": "request validation failed",
|
|
"details": []
|
|
}
|
|
}
|
|
```
|
|
|
|
## 1) See
|
|
|
|
### `POST /see`
|
|
Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping.
|
|
|
|
```json
|
|
{
|
|
"screen": 0,
|
|
"region_x": null,
|
|
"region_y": null,
|
|
"region_width": null,
|
|
"region_height": null,
|
|
"with_grid": true,
|
|
"grid_rows": 12,
|
|
"grid_cols": 12,
|
|
"include_labels": true,
|
|
"image_format": "png",
|
|
"jpeg_quality": 85,
|
|
"ocr": false,
|
|
"ocr_min_confidence": 0,
|
|
"ocr_lang": "eng",
|
|
"ocr_psm": null
|
|
}
|
|
```
|
|
|
|
Returns:
|
|
- `data.image.base64`
|
|
- `data.meta.region` (global desktop coords)
|
|
- `data.meta.grid` (rows/cols/cell size + formula)
|
|
- `data.meta.ocr` (when `ocr=true`)
|
|
|
|
OCR item shape:
|
|
- `text`
|
|
- `confidence`
|
|
- `bbox` (global coords)
|
|
- `center`
|
|
- `region_relative_bbox`
|
|
|
|
### `POST /see/zoom`
|
|
Capture a tighter crop around a global point and draw another grid over that crop.
|
|
|
|
```json
|
|
{
|
|
"screen": 0,
|
|
"center_x": 1200,
|
|
"center_y": 720,
|
|
"width": 500,
|
|
"height": 350,
|
|
"with_grid": true,
|
|
"grid_rows": 20,
|
|
"grid_cols": 20,
|
|
"include_labels": true,
|
|
"image_format": "png",
|
|
"jpeg_quality": 90
|
|
}
|
|
```
|
|
|
|
Use this for precision before clicking tiny controls.
|
|
|
|
## 2) Interact
|
|
|
|
### `POST /interact`
|
|
Mouse/keyboard action execution.
|
|
|
|
```json
|
|
{
|
|
"screen": 0,
|
|
"action": {
|
|
"action": "click",
|
|
"target": {
|
|
"mode": "grid",
|
|
"region_x": 0,
|
|
"region_y": 0,
|
|
"region_width": 1920,
|
|
"region_height": 1080,
|
|
"rows": 12,
|
|
"cols": 12,
|
|
"row": 7,
|
|
"col": 3,
|
|
"dx": 0.0,
|
|
"dy": 0.0
|
|
},
|
|
"button": "left",
|
|
"clicks": 1
|
|
}
|
|
}
|
|
```
|
|
|
|
Supported actions:
|
|
- `move`, `click`, `right_click`, `double_click`, `middle_click`
|
|
- `scroll` (`scroll_amount`)
|
|
- `type` (`text`, `interval_ms`)
|
|
- `hotkey` (`keys`)
|
|
- `click_text` (OCR-driven text click with optional region)
|
|
|
|
Target modes:
|
|
- `pixel`: absolute global `x,y`
|
|
- `grid`: grid cell from a `see`/`see/zoom` response
|
|
|
|
### `click_text` example (full screen OCR)
|
|
```json
|
|
{
|
|
"screen": 0,
|
|
"action": {
|
|
"action": "click_text",
|
|
"click_text": {
|
|
"text": "Sign in",
|
|
"match": "contains",
|
|
"case_sensitive": false,
|
|
"min_confidence": 45,
|
|
"occurrence": "best"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### `click_text` example (region OCR)
|
|
```json
|
|
{
|
|
"screen": 0,
|
|
"action": {
|
|
"action": "click_text",
|
|
"click_text": {
|
|
"text": "Continue",
|
|
"match": "exact",
|
|
"region": { "x": 940, "y": 520, "width": 400, "height": 260 },
|
|
"occurrence": "first"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## 3) Interact Verify
|
|
|
|
### `POST /interact/verify`
|
|
Execute one interact action, then poll quick OCR verification checks until success or timeout.
|
|
|
|
```json
|
|
{
|
|
"action": {
|
|
"screen": 0,
|
|
"action": {
|
|
"action": "click_text",
|
|
"click_text": {
|
|
"text": "Apply",
|
|
"match": "contains"
|
|
}
|
|
}
|
|
},
|
|
"verify": {
|
|
"type": "ocr_text_near_point",
|
|
"text": "Applied",
|
|
"x": 1180,
|
|
"y": 640,
|
|
"radius": 120,
|
|
"screen": 0,
|
|
"match": "contains"
|
|
},
|
|
"check_interval_ms": 250,
|
|
"timeout_ms": 3000
|
|
}
|
|
```
|
|
|
|
Response includes:
|
|
- `action_result`
|
|
- `verified`
|
|
- `attempts`
|
|
- `last_check`
|
|
- `duration_ms`
|
|
## 4) Exec
|
|
|
|
### `POST /exec`
|
|
Run host shell commands (PowerShell/Bash/CMD).
|
|
|
|
```json
|
|
{
|
|
"command": "Get-Process | Select-Object -First 5",
|
|
"shell": "powershell",
|
|
"timeout_s": 20,
|
|
"cwd": "C:/Users/Paul",
|
|
"dry_run": false
|
|
}
|
|
```
|
|
|
|
Required header:
|
|
|
|
```http
|
|
x-clickthrough-exec-secret: <secret>
|
|
```
|
|
|
|
## Minimal Procedure for Agents
|
|
|
|
1. `see` full screen with coarse grid.
|
|
2. If uncertain, `see/zoom` target area with denser grid.
|
|
3. `interact` one action.
|
|
4. `see` again to confirm state change.
|
|
5. Use `exec` only when GUI interaction is not the right tool.
|