refactor: simplify to see/interact/exec and split server modules
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
234
docs/API.md
@@ -1,116 +1,21 @@
-# API Reference (v2)
+# API Reference

 Base URL: `http://127.0.0.1:8123`

-If `CLICKTHROUGH_TOKEN` is set, include:
+Auth header when enabled:

 ```http
 x-clickthrough-token: <token>
 ```

-## Endpoints
+This API is intended for AI computer control through 3 methods only:
+- `see`
+- `interact`
+- `exec`

-- `POST /v2/observe`
-- `POST /v2/localize`
-- `POST /v2/act`
-- `POST /v2/act-verify`
-- `GET /health`
-- `GET /displays`
-- `GET /windows`
-- `POST /windows/action`
-- `POST /launch`
-- `POST /exec`
+All responses use one envelope.

-No v1 endpoints are supported.
-
-## `POST /v2/observe`
-
-```json
-{
-  "mode": "region",
-  "region_x": 800,
-  "region_y": 420,
-  "region_width": 700,
-  "region_height": 420,
-  "include_image": true,
-  "image_format": "jpeg",
-  "jpeg_quality": 75,
-  "ocr_mode": "region",
-  "language_hint": "eng",
-  "min_confidence": 0.45,
-  "max_ocr_area_px": 1500000,
-  "group_lines": true
-}
-```
-
-Returns observation metadata, optional image, OCR blocks/lines, and timing fields.
-
-## `POST /v2/localize`
-
-Text localization:
-
-```json
-{
-  "observation_id": "...",
-  "text_query": "Save",
-  "text_match": "exact",
-  "candidate_index": 0
-}
-```
-
-Image-tool point localization:
-
-```json
-{
-  "observation_id": "...",
-  "image_tool_point": {"x": 312, "y": 188}
-}
-```
-
-Returns `resolved_target_id`, global pixel, and `localization_confidence`.
-
-## `POST /v2/act`
-
-```json
-{
-  "action": {
-    "action": "click",
-    "target": {"resolved_target_id": "..."},
-    "button": "left",
-    "clicks": 1
-  }
-}
-```
-
-## `POST /v2/act-verify`
-
-```json
-{
-  "action": {
-    "action": "click",
-    "target": {"resolved_target_id": "..."}
-  },
-  "condition": {
-    "kind": "text",
-    "mode": "region",
-    "text": "Saved",
-    "match": "contains",
-    "present": true,
-    "region_x": 820,
-    "region_y": 420,
-    "region_width": 500,
-    "region_height": 140,
-    "min_confidence": 0.4
-  },
-  "risk_level": "low"
-}
-```
-
-Risk defaults:
-- `low`: retries `0`, timeout `2500ms`
-- `high`: retries `1`, timeout `6000ms`
-
-## Response envelope
+## Response Envelope

 Success:

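The updated docs keep the base URL `http://127.0.0.1:8123` and send `x-clickthrough-token` only when auth is enabled. `POST /exec` additionally requires `x-clickthrough-exec-secret`. The following is a minimal stdlib-only client sketch. It assumes JSON request bodies sent via HTTP POST, and it assumes error responses still carry the JSON envelope in their body. The names `build_request` and `post` are illustrative and are not part of the project.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8123"  # default base URL from the docs


def build_request(path, body, token=None, exec_secret=None):
    """Build (without sending) a JSON POST request to the clickthrough server."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Only needed when the server runs with CLICKTHROUGH_TOKEN set.
        headers["x-clickthrough-token"] = token
    if exec_secret is not None:
        # Documented as required for POST /exec.
        headers["x-clickthrough-exec-secret"] = exec_secret
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method="POST")


def post(path, body, token=None, exec_secret=None, timeout=30.0):
    """Send the request and return the decoded response envelope as a dict."""
    req = build_request(path, body, token=token, exec_secret=exec_secret)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: non-2xx responses still contain the JSON envelope.
        return json.loads(err.read().decode("utf-8"))
```

HTTP header names are case-insensitive, so `urllib` normalizing `x-clickthrough-token` to `X-clickthrough-token` does not matter to the server.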
@@ -119,7 +24,7 @@ Success:
   "ok": true,
   "request_id": "...",
   "time_ms": 1710000000000,
-  "data": { },
+  "data": {},
   "error": null
 }
 ```
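Success and error responses share one envelope with the keys `ok`, `request_id`, `time_ms`, `data` and `error`. This commit also changes the documented error example to `validation_error` with a list of `details`. The sketch below unwraps the envelope on the client side. It assumes `ok` is `true` only when `error` is `null`; the diff does not show the `ok` line of the error example. `unwrap` and `ClickthroughError` are hypothetical names.

```python
class ClickthroughError(Exception):
    """Raised when a response envelope carries an error object."""

    def __init__(self, code, message, details=None, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details
        self.request_id = request_id


def unwrap(envelope):
    """Return envelope["data"] on success; raise ClickthroughError otherwise.

    Assumes the documented envelope keys: ok, request_id, time_ms, data, error.
    """
    error = envelope.get("error")
    if envelope.get("ok") is True and error is None:
        return envelope.get("data")
    error = error or {}
    raise ClickthroughError(
        code=error.get("code", "unknown_error"),
        message=error.get("message", ""),
        details=error.get("details"),
        request_id=envelope.get("request_id"),
    )
```

A caller can branch on `exc.code`. For example, it might treat `validation_error` as a bug in the request body rather than a transient failure worth retrying.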
@@ -133,9 +38,124 @@ Error:
   "time_ms": 1710000000000,
   "data": null,
   "error": {
-    "code": "http_error",
-    "message": "...",
-    "details": {}
+    "code": "validation_error",
+    "message": "request validation failed",
+    "details": []
   }
 }
 ```
+
+## 1) See
+
+### `POST /see`
+Capture a full screen or a region. Optional grid overlay returns coordinate metadata for click mapping.
+
+```json
+{
+  "screen": 0,
+  "region_x": null,
+  "region_y": null,
+  "region_width": null,
+  "region_height": null,
+  "with_grid": true,
+  "grid_rows": 12,
+  "grid_cols": 12,
+  "include_labels": true,
+  "image_format": "png",
+  "jpeg_quality": 85
+}
+```
+
+Returns:
+- `data.image.base64`
+- `data.meta.region` (global desktop coords)
+- `data.meta.grid` (rows/cols/cell size + formula)
+
+### `POST /see/zoom`
+Capture a tighter crop around a global point and draw another grid over that crop.
+
+```json
+{
+  "screen": 0,
+  "center_x": 1200,
+  "center_y": 720,
+  "width": 500,
+  "height": 350,
+  "with_grid": true,
+  "grid_rows": 20,
+  "grid_cols": 20,
+  "include_labels": true,
+  "image_format": "png",
+  "jpeg_quality": 90
+}
+```
+
+Use this for precision before clicking tiny controls.
+
+## 2) Interact
+
+### `POST /interact`
+Mouse/keyboard action execution.
+
+```json
+{
+  "screen": 0,
+  "action": {
+    "action": "click",
+    "target": {
+      "mode": "grid",
+      "region_x": 0,
+      "region_y": 0,
+      "region_width": 1920,
+      "region_height": 1080,
+      "rows": 12,
+      "cols": 12,
+      "row": 7,
+      "col": 3,
+      "dx": 0.0,
+      "dy": 0.0
+    },
+    "button": "left",
+    "clicks": 1
+  }
+}
+```
+
+Supported actions:
+- `move`, `click`, `right_click`, `double_click`, `middle_click`
+- `scroll` (`scroll_amount`)
+- `type` (`text`, `interval_ms`)
+- `hotkey` (`keys`)
+
+Target modes:
+- `pixel`: absolute global `x,y`
+- `grid`: grid cell from a `see`/`see/zoom` response
+
+## 3) Exec
+
+### `POST /exec`
+Run host shell commands (PowerShell/Bash/CMD).
+
+```json
+{
+  "command": "Get-Process | Select-Object -First 5",
+  "shell": "powershell",
+  "timeout_s": 20,
+  "cwd": "C:/Users/Paul",
+  "dry_run": false
+}
+```
+
+Required header:
+
+```http
+x-clickthrough-exec-secret: <secret>
+```
+
+## Minimal Procedure for Agents
+
+1. `see` full screen with coarse grid.
+2. If uncertain, `see/zoom` target area with denser grid.
+3. `interact` one action.
+4. `see` again to confirm state change.
+5. Use `exec` only when GUI interaction is not the right tool.
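`/see` and `/see/zoom` return grid metadata. `/interact` accepts a `grid` target described by:

- the capture region,
- `rows`/`cols`,
- the chosen `row`/`col`,
- `dx`/`dy`.

The docs do not spell out the cell-to-pixel formula, so the following is only a sketch under these assumptions:

- `row`/`col` are 0-indexed.
- `dx`/`dy` shift the point from the cell centre as a fraction of one cell.
- A zoom crop is centred on (`center_x`, `center_y`) without clamping to screen edges.

If the formula returned in `data.meta.grid` differs, it should take precedence. `Region`, `cell_to_pixel`, `zoom_region` and `grid_click_body` are hypothetical helpers.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangle in global desktop coordinates."""
    x: int
    y: int
    width: int
    height: int


def cell_to_pixel(region, rows, cols, row, col, dx=0.0, dy=0.0):
    """Map a grid cell (plus optional dx/dy nudge) to a global pixel.

    Assumptions: row/col are 0-indexed; dx/dy are fractions of one cell
    measured from the cell centre.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"cell ({row}, {col}) is outside a {rows}x{cols} grid")
    cell_w = region.width / cols
    cell_h = region.height / rows
    x = region.x + (col + 0.5 + dx) * cell_w
    y = region.y + (row + 0.5 + dy) * cell_h
    # floor (not round) so negative coordinates on left/top monitors stay correct
    return math.floor(x), math.floor(y)


def zoom_region(center_x, center_y, width, height):
    """Region a /see/zoom request would cover, assuming a centred, unclamped crop."""
    return Region(center_x - width // 2, center_y - height // 2, width, height)


def grid_click_body(screen, region, rows, cols, row, col, dx=0.0, dy=0.0):
    """Build a POST /interact body with a grid target, mirroring the documented example."""
    target = {
        "mode": "grid",
        "region_x": region.x,
        "region_y": region.y,
        "region_width": region.width,
        "region_height": region.height,
        "rows": rows,
        "cols": cols,
        "row": row,
        "col": col,
        "dx": dx,
        "dy": dy,
    }
    return {"screen": screen, "action": {"action": "click", "target": target, "button": "left", "clicks": 1}}
```

Under these assumptions, the documented example (1920x1080 region, 12x12 grid, row 7, col 3) lands at pixel (560, 675). One way to refine a click:

1. Use that pixel as the `center_x`/`center_y` of a `/see/zoom` request.
2. Address the finer cell against `zoom_region(...)`.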
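The "Minimal Procedure for Agents" (see, interact once, see again) can be sketched as one function. The optional zoom step is omitted for brevity. The sketch rests on these assumptions:

- `post(path, body)` is a hypothetical transport callable that returns the decoded envelope.
- `data.meta.region` has the keys `x`, `y`, `width` and `height`. The docs only say this field holds global desktop coordinates.

Comparing the two screenshots byte-for-byte is a deliberately crude change check. A real agent would likely inspect the image or a region of it instead.

```python
def require_data(envelope):
    """Return the envelope's data, failing loudly on an error envelope."""
    if not envelope.get("ok") or envelope.get("error") is not None:
        raise RuntimeError(f"request failed: {envelope.get('error')}")
    return envelope["data"]


def click_cell_and_confirm(post, row, col, rows=12, cols=12, screen=0):
    """see -> interact (one click) -> see; return True if the screenshot changed.

    `post(path, body)` is a hypothetical transport callable; the keys of
    data.meta.region (x, y, width, height) are assumed, not documented.
    """
    see_body = {"screen": screen, "with_grid": True, "grid_rows": rows, "grid_cols": cols}

    # Step 1: full-screen capture with a coarse grid.
    before = require_data(post("/see", see_body))
    region = before["meta"]["region"]

    # Step 3: exactly one action, addressed by grid cell in the captured region.
    target = {
        "mode": "grid",
        "region_x": region["x"],
        "region_y": region["y"],
        "region_width": region["width"],
        "region_height": region["height"],
        "rows": rows,
        "cols": cols,
        "row": row,
        "col": col,
        "dx": 0.0,
        "dy": 0.0,
    }
    action = {"action": "click", "target": target, "button": "left", "clicks": 1}
    require_data(post("/interact", {"screen": screen, "action": action}))

    # Step 4: capture again and compare the encoded images.
    after = require_data(post("/see", see_body))
    return before["image"]["base64"] != after["image"]["base64"]
```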