feat(wait): add structured wait endpoint

2026-05-01 15:55:29 +02:00
parent 493e5499e8
commit 5122d416e8
4 changed files with 304 additions and 2 deletions

@@ -39,6 +39,7 @@ The agent should not assume it can self-install this stack.
- `GET /windows` → discover visible desktop windows and their handles/processes
- `POST /windows/action` → focus/restore/minimize/maximize/close a matched window
- `POST /launch` → start an app/process without dropping to a shell
+- `POST /wait?screen=0` → wait for text, window, or visual state changes
- `POST /ocr` → text extraction with bounding boxes from full screen, region, or provided image bytes
- `POST /action?screen=0` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...)
- `POST /batch?screen=0` → sequential action list
@@ -140,7 +141,7 @@ Avoid using `/exec` for routine in-app clicks, menu navigation, or text entry wh
3. If confidence < 0.85, call `POST /zoom` with denser grid (e.g., 20x20) and re-evaluate.
4. **Before any click**, verify target identity (OCR text/icon/location consistency).
5. Execute one minimal action via `POST /action`.
-6. Re-capture with `GET /screen` and verify the expected state change.
+6. Re-capture with `GET /screen` or use `POST /wait` to verify the expected state change.
7. Repeat until objective is complete.
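
Steps 5 and 6 above can be sketched roughly as follows. This is a minimal illustration, not part of the commit: the base URL, the JSON payload fields, and the `matched` response key are all assumptions, since the diff does not show the request or response schema for `/action` or `/wait`. The transport is passed in as a `send` callable so the flow can be exercised without a running server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: address of the local agent server


def build_request(path, payload, screen=0, base_url=BASE_URL):
    """Build (without sending) a JSON POST for an endpoint such as /action or /wait."""
    return urllib.request.Request(
        f"{base_url}{path}?screen={screen}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def act_then_verify(send, action, condition, screen=0):
    """Perform one minimal action (step 5), then wait for the expected state (step 6).

    `send` is any callable that takes a urllib Request and returns the parsed
    JSON response as a dict. The "matched" key is a hypothetical response field.
    """
    send(build_request("/action", action, screen))
    result = send(build_request("/wait", condition, screen))
    return bool(result.get("matched"))
```

Injecting `send` keeps the sketch independent of any particular HTTP client; a real caller might wrap `urllib.request.urlopen` and decode the JSON body before returning it.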
## Verify-before-click rules