feat(wait): add structured wait endpoint
All checks were successful
python-syntax / syntax-check (push) Successful in 7s

This commit is contained in:
2026-05-01 15:55:29 +02:00
parent 493e5499e8
commit 5122d416e8
4 changed files with 304 additions and 2 deletions

View File

@@ -10,6 +10,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
- **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey
- **Window lifecycle endpoints**: list/focus/restore/minimize/maximize/close windows via `GET /windows` + `POST /windows/action`
- **Structured launch endpoint**: start an app/process without dropping to a shell via `POST /launch`
- **Wait/sync endpoint**: poll for text, window, or visual state changes via `POST /wait`
- **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr`
- **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
- **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels
@@ -43,7 +44,7 @@ For OCR support, install the native `tesseract` binary on the host (in addition
Important:
- `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
- Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
- Prefer structured GUI interaction first; use `/windows`, `/launch`, and `/action` before reaching for `/exec`.
- Prefer structured GUI interaction first; use `/windows`, `/launch`, `/wait`, and `/action` before reaching for `/exec`.
See:
- `docs/API.md`