feat(wait): add structured wait endpoint
2026-05-01 15:55:29 +02:00
parent 493e5499e8
commit 5122d416e8
4 changed files with 304 additions and 2 deletions

@@ -277,6 +277,80 @@ Notes:
- If `wait_for_window=true`, the server polls for a matching window and returns `window_found`.
- `dry_run=true` returns the resolved argv/cwd without launching.
## `POST /wait`
Wait on a structured UI condition instead of guessing sleep durations.

Query params:
- `screen` (int, default `0`): selects the display; used for text and visual waits
### Wait for text to appear
```json
{
  "condition": {
    "kind": "text",
    "mode": "screen",
    "text": "Scan complete",
    "match": "contains",
    "present": true,
    "language_hint": "eng",
    "min_confidence": 0.4
  },
  "timeout_ms": 15000,
  "poll_interval_ms": 400
}
```
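A minimal client sketch for the request above, using only the Python standard library. The base URL is an assumption (the server address is not stated here), and `build_wait_request` is a hypothetical helper, not part of the server. The request is built but not sent, so the snippet runs without a live server.

```python
import json
import urllib.request

# Assumption: the server address is not given in this document.
BASE_URL = "http://127.0.0.1:8000"


def build_wait_request(condition, timeout_ms, poll_interval_ms, screen=0):
    """Build (but do not send) a POST /wait request (hypothetical helper)."""
    body = {
        "condition": condition,
        "timeout_ms": timeout_ms,
        "poll_interval_ms": poll_interval_ms,
    }
    return urllib.request.Request(
        f"{BASE_URL}/wait?screen={screen}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same condition as the JSON example above.
text_condition = {
    "kind": "text",
    "mode": "screen",
    "text": "Scan complete",
    "match": "contains",
    "present": True,
    "language_hint": "eng",
    "min_confidence": 0.4,
}

request = build_wait_request(text_condition, timeout_ms=15000, poll_interval_ms=400)

# To send it against a running server, pick an HTTP timeout that is
# longer than timeout_ms so the client does not give up first:
# with urllib.request.urlopen(request, timeout=20) as response:
#     result = json.load(response)
```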
### Wait for a window state
```json
{
  "condition": {
    "kind": "window",
    "title_contains": "WinDirStat",
    "visible_only": true,
    "state": "focused"
  },
  "timeout_ms": 5000,
  "poll_interval_ms": 200
}
```
Window states:
- `exists`
- `focused`
- `closed`
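One way to read the three window states is as a predicate over the current window list. The sketch below is illustrative only: `WindowInfo` and `window_condition_met` are hypothetical names, the title match is assumed to be case-sensitive, and `closed` is assumed to mean "no matching window remains". The server's actual rules may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowInfo:
    # Hypothetical shape of one window entry; not the server's real schema.
    title: str
    visible: bool
    focused: bool


def window_condition_met(
    windows: List[WindowInfo],
    title_contains: str,
    state: str,
    visible_only: bool = True,
) -> bool:
    """Evaluate one poll of a window wait against a snapshot of windows."""
    matches = [
        w
        for w in windows
        # Assumption: substring match is case-sensitive.
        if title_contains in w.title and (w.visible or not visible_only)
    ]
    if state == "exists":
        return bool(matches)
    if state == "focused":
        return any(w.focused for w in matches)
    if state == "closed":
        # Assumption: "closed" succeeds once no matching window is left.
        return not matches
    raise ValueError(f"unknown window state: {state!r}")
```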
### Wait for visual change or stability
```json
{
  "condition": {
    "kind": "visual",
    "state": "stable",
    "region_x": 0,
    "region_y": 0,
    "region_width": 1920,
    "region_height": 1080,
    "diff_threshold": 0.005,
    "stable_for_ms": 1000
  },
  "timeout_ms": 12000,
  "poll_interval_ms": 300
}
```
Visual states:
- `change` — succeeds when the average pixel diff crosses `diff_threshold`
- `stable` — succeeds when the diff stays at or below `diff_threshold` for `stable_for_ms`
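The two visual states can be sketched as a check over successive captures. This is an illustration under stated assumptions, not the server's implementation: frames are modeled as flat sequences of 0..255 values, the diff is a mean absolute difference normalized to 0..1, "crosses" is read as "strictly exceeds", and stable time is approximated by counting poll intervals. `average_diff` and `visual_condition_met` are hypothetical names.

```python
def average_diff(previous, current):
    """Mean absolute difference of two equal-length frames, scaled to 0..1."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same length")
    if not previous:
        return 0.0
    total = sum(abs(a - b) for a, b in zip(previous, current))
    return total / (255 * len(previous))


def visual_condition_met(frames, state, diff_threshold, stable_for_ms, poll_interval_ms):
    """Return True if the wait would succeed somewhere in this frame sequence."""
    stable_ms = 0
    for previous, current in zip(frames, frames[1:]):
        diff = average_diff(previous, current)
        if state == "change":
            # Assumption: "crosses" means strictly above the threshold.
            if diff > diff_threshold:
                return True
        elif state == "stable":
            if diff <= diff_threshold:
                # Approximate elapsed stable time by the poll interval.
                stable_ms += poll_interval_ms
                if stable_ms >= stable_for_ms:
                    return True
            else:
                stable_ms = 0  # any movement restarts the stability window
        else:
            raise ValueError(f"unknown visual state: {state!r}")
    return False
```

With `poll_interval_ms` of 300 and `stable_for_ms` of 1000, this sketch needs four consecutive quiet comparisons (1200 ms) before `stable` succeeds.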

Notes:
- Text waits reuse the OCR pipeline and return matching OCR blocks on success.
- Window waits build on the structured window discovery endpoint.
- Visual waits compare repeated captures of either the full selected display or an explicit region.
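
Every request pairs a condition with `timeout_ms` and `poll_interval_ms`. A generic polling loop with those two parameters might look like the sketch below. `wait_for` is a hypothetical helper, and returning `None` on timeout is an assumption; the endpoint's actual response shape on timeout is not described here. The clock and sleep functions are injectable so the loop can be exercised without real delays.

```python
import time


def wait_for(check, timeout_ms, poll_interval_ms, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or the timeout elapses.

    Returns the truthy result, or None on timeout (an assumed convention).
    """
    deadline = clock() + timeout_ms / 1000
    while True:
        result = check()
        if result:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        sleep(min(poll_interval_ms / 1000, remaining))
```

The condition is checked once before any sleep, so a condition that already holds returns immediately instead of costing a full poll interval.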
## `POST /ocr`
Extract visible text from either a full screenshot, a region crop, or caller-provided image bytes.