Add pytesseract OCR, click_text interact action, and interact verify endpoint
All checks were successful
python-syntax / syntax-check (push) Successful in 6s
docs/API.md (90 lines changed)
@@ -8,9 +8,10 @@ Auth header when enabled:
 x-clickthrough-token: <token>
 ```

-This API is intended for AI computer control through 3 methods only:
+This API is intended for AI computer control through these methods:
 - `see`
 - `interact`
+- `interact/verify`
 - `exec`

 All responses use one envelope.
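For orientation, here is a minimal sketch of how a client might attach the `x-clickthrough-token` header when auth is enabled. The base URL, the helper name `build_request`, and the empty request body are assumptions for illustration only; nothing in this diff specifies them. The request is built but not sent.

```python
import json
import urllib.request


def build_request(base_url, path, body, token=None):
    """Build (but do not send) a JSON POST request.

    `base_url` and this helper are hypothetical; only the header name
    `x-clickthrough-token` comes from the documentation in this diff.
    """
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Only sent when auth is enabled on the server.
        headers["x-clickthrough-token"] = token
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method="POST"
    )


# Hypothetical local address; the real host and port are not given here.
req = build_request("http://127.0.0.1:8000", "/interact/verify", {}, token="secret")
print(req.full_url, req.get_method())
```

Passing `token=None` leaves the header out entirely, which matches the "when enabled" wording above.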
@@ -62,7 +63,11 @@ Capture a full screen or a region. Optional grid overlay returns coordinate meta
   "grid_cols": 12,
   "include_labels": true,
   "image_format": "png",
-  "jpeg_quality": 85
+  "jpeg_quality": 85,
+  "ocr": false,
+  "ocr_min_confidence": 0,
+  "ocr_lang": "eng",
+  "ocr_psm": null
 }
 ```

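The four new OCR fields can be assembled client-side as in the sketch below. The helper name `build_see_ocr_options` is hypothetical, and the range check assumes `ocr_psm` is passed through to Tesseract as a page segmentation mode (0 to 13); the diff does not confirm that mapping. Fields of the request that fall outside this hunk are left out.

```python
import json


def build_see_ocr_options(ocr=True, min_confidence=0, lang="eng", psm=None):
    """Return only the OCR-related request fields added in this hunk.

    Assumption: `ocr_psm` is a Tesseract page segmentation mode, so any
    non-null value should be an integer from 0 to 13.
    """
    if psm is not None and psm not in range(0, 14):
        raise ValueError("Tesseract page segmentation modes run from 0 to 13")
    return {
        "ocr": ocr,
        "ocr_min_confidence": min_confidence,
        "ocr_lang": lang,
        "ocr_psm": psm,
    }


# Merge with two of the pre-existing fields visible in the hunk.
body = {"image_format": "png", "jpeg_quality": 85, **build_see_ocr_options(psm=6)}
print(json.dumps(body, indent=2))
```

Leaving `psm=None` serializes to `null`, matching the default shown in the documented request.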
@@ -70,6 +75,14 @@ Returns:
 - `data.image.base64`
 - `data.meta.region` (global desktop coords)
 - `data.meta.grid` (rows/cols/cell size + formula)
+- `data.meta.ocr` (when `ocr=true`)
+
+OCR item shape:
+- `text`
+- `confidence`
+- `bbox` (global coords)
+- `center`
+- `region_relative_bbox`

 ### `POST /see/zoom`
 Capture a tighter crop around a global point and draw another grid over that crop.
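As an illustration of the OCR item shape added in this hunk, the sketch below filters a list of items by text and confidence. The field names come from the documentation; the inner layout of `bbox`, `center`, and `region_relative_bbox` (keys `x`, `y`, `width`, `height`) is an assumption, as are the sample values and the helper name `find_ocr_items`.

```python
def find_ocr_items(items, needle, min_confidence=0):
    """Return OCR items whose text contains `needle` (case-insensitive)
    and whose confidence is at least `min_confidence`."""
    wanted = needle.casefold()
    return [
        item
        for item in items
        if wanted in item["text"].casefold() and item["confidence"] >= min_confidence
    ]


# Hypothetical sample of `data.meta.ocr`; the nested key names are assumed.
sample_items = [
    {
        "text": "Continue",
        "confidence": 93,
        "bbox": {"x": 1000, "y": 600, "width": 120, "height": 30},
        "center": {"x": 1060, "y": 615},
        "region_relative_bbox": {"x": 60, "y": 80, "width": 120, "height": 30},
    },
    {
        "text": "continue",
        "confidence": 20,
        "bbox": {"x": 10, "y": 10, "width": 90, "height": 20},
        "center": {"x": 55, "y": 20},
        "region_relative_bbox": {"x": 10, "y": 10, "width": 90, "height": 20},
    },
]

hits = find_ocr_items(sample_items, "continue", min_confidence=45)
print([hit["center"] for hit in hits])
```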
@@ -126,12 +139,83 @@ Supported actions:
 - `scroll` (`scroll_amount`)
 - `type` (`text`, `interval_ms`)
 - `hotkey` (`keys`)
+- `click_text` (OCR-driven text click with optional region)

 Target modes:
 - `pixel`: absolute global `x,y`
 - `grid`: grid cell from a `see`/`see/zoom` response

-## 3) Exec
+### `click_text` example (full screen OCR)
+```json
+{
+  "screen": 0,
+  "action": {
+    "action": "click_text",
+    "click_text": {
+      "text": "Sign in",
+      "match": "contains",
+      "case_sensitive": false,
+      "min_confidence": 45,
+      "occurrence": "best"
+    }
+  }
+}
+```
+
+### `click_text` example (region OCR)
+```json
+{
+  "screen": 0,
+  "action": {
+    "action": "click_text",
+    "click_text": {
+      "text": "Continue",
+      "match": "exact",
+      "region": { "x": 940, "y": 520, "width": 400, "height": 260 },
+      "occurrence": "first"
+    }
+  }
+}
+```
+
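The `click_text` options in the two examples above (`match`, `case_sensitive`, `min_confidence`, `occurrence`) can be read as a small selection procedure over OCR results. The sketch below is one plausible interpretation, not the server's implementation: it assumes `"best"` means highest confidence and `"first"` means first in OCR order, neither of which the diff states. The helper name `pick_match` and the sample items are hypothetical.

```python
def pick_match(items, text, match="contains", case_sensitive=False,
               min_confidence=0, occurrence="best"):
    """Pick one OCR item for a click, or None if nothing qualifies.

    Assumed semantics: "best" = highest confidence, "first" = first
    candidate in the order the OCR results were returned.
    """
    if match not in ("contains", "exact"):
        raise ValueError("match must be 'contains' or 'exact'")
    if occurrence not in ("best", "first"):
        raise ValueError("occurrence must be 'best' or 'first'")

    def norm(value):
        return value if case_sensitive else value.casefold()

    wanted = norm(text)
    candidates = []
    for item in items:
        if item["confidence"] < min_confidence:
            continue
        seen = norm(item["text"])
        if (seen == wanted) if match == "exact" else (wanted in seen):
            candidates.append(item)

    if not candidates:
        return None
    if occurrence == "first":
        return candidates[0]
    return max(candidates, key=lambda item: item["confidence"])


# Hypothetical OCR results.
ocr_items = [
    {"text": "Sign in with SSO", "confidence": 60},
    {"text": "Sign in", "confidence": 91},
    {"text": "sign in", "confidence": 30},
]

print(pick_match(ocr_items, "Sign in", min_confidence=45))
```

With the first example's settings (`contains`, case-insensitive, `min_confidence` 45, `best`), this sketch skips the low-confidence item and returns the 91-confidence one.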
+## 3) Interact Verify
+
+### `POST /interact/verify`
+Execute one interact action, then poll quick OCR verification checks until success or timeout.
+
+```json
+{
+  "action": {
+    "screen": 0,
+    "action": {
+      "action": "click_text",
+      "click_text": {
+        "text": "Apply",
+        "match": "contains"
+      }
+    }
+  },
+  "verify": {
+    "type": "ocr_text_near_point",
+    "text": "Applied",
+    "x": 1180,
+    "y": 640,
+    "radius": 120,
+    "screen": 0,
+    "match": "contains"
+  },
+  "check_interval_ms": 250,
+  "timeout_ms": 3000
+}
+```
+
+Response includes:
+- `action_result`
+- `verified`
+- `attempts`
+- `last_check`
+- `duration_ms`
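The act-then-poll behaviour described for `POST /interact/verify` can be modelled roughly as below. This is a sketch under stated assumptions, not the endpoint's code: the callables `run_action` and `check`, the `matched` key in a check result, and the choice to start the timeout clock before the action runs are all hypothetical. Only the response field names and the two timing parameters come from the diff.

```python
import time


def act_then_verify(run_action, check, check_interval_ms=250, timeout_ms=3000):
    """Run one action, then poll `check` until it succeeds or time runs out.

    `check` is assumed to return a dict with a boolean "matched" key.
    """
    start = time.monotonic()
    action_result = run_action()
    deadline = start + timeout_ms / 1000.0

    attempts = 0
    while True:
        attempts += 1
        last_check = check()
        verified = bool(last_check.get("matched"))
        if verified or time.monotonic() >= deadline:
            break
        time.sleep(check_interval_ms / 1000.0)

    return {
        "action_result": action_result,
        "verified": verified,
        "attempts": attempts,
        "last_check": last_check,
        "duration_ms": int((time.monotonic() - start) * 1000),
    }


# Fake check that succeeds on the third poll.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return {"matched": calls["n"] >= 3}


result = act_then_verify(lambda: {"clicked": True}, fake_check, check_interval_ms=1)
print(result["verified"], result["attempts"])
```

The loop always performs at least one check, so a very small `timeout_ms` still yields a meaningful `last_check`.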
+## 4) Exec

 ### `POST /exec`
 Run host shell commands (PowerShell/Bash/CMD).