feat(wait): add structured wait endpoint
All checks were successful
python-syntax / syntax-check (push) Successful in 7s
@@ -39,6 +39,7 @@ The agent should not assume it can self-install this stack.
 - `GET /windows` → discover visible desktop windows and their handles/processes
 - `POST /windows/action` → focus/restore/minimize/maximize/close a matched window
 - `POST /launch` → start an app/process without dropping to a shell
+- `POST /wait?screen=0` → wait for text, window, or visual state changes
 - `POST /ocr` → text extraction with bounding boxes from full screen, region, or provided image bytes
 - `POST /action?screen=0` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...)
 - `POST /batch?screen=0` → sequential action list
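
As a rough illustration of how a client might address the new endpoint, the sketch below builds (but does not send) a JSON `POST /wait?screen=0` request. Only the path and the `screen` query parameter come from the diff. The base URL, the `build_post` helper, and the payload field names (`text`, `timeout_s`) are assumptions made for the example, not the service's documented schema.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the service address is not given in the diff; this is a placeholder.
BASE_URL = "http://localhost:8000"


def build_post(path, payload, screen=None, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for one of the listed endpoints."""
    url = base_url.rstrip("/") + path
    if screen is not None:
        # `screen` is the query parameter shown in the endpoint list.
        url += "?" + urllib.parse.urlencode({"screen": screen})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: these field names are illustrative only.
wait_request = build_post(
    "/wait", {"text": "Save complete", "timeout_s": 10}, screen=0
)
```

Passing the result to `urllib.request.urlopen` would send it. The real request body should be taken from the endpoint's own documentation.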
@@ -140,7 +141,7 @@ Avoid using `/exec` for routine in-app clicks, menu navigation, or text entry wh
 3. If confidence < 0.85, call `POST /zoom` with denser grid (e.g., 20x20) and re-evaluate.
 4. **Before any click**, verify target identity (OCR text/icon/location consistency).
 5. Execute one minimal action via `POST /action`.
-6. Re-capture with `GET /screen` and verify the expected state change.
+6. Re-capture with `GET /screen` or use `POST /wait` to verify the expected state change.
 7. Repeat until objective is complete.

 ## Verify-before-click rules
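
Steps 5 and 6 above amount to "act once, then confirm". A minimal sketch of that pattern follows, assuming the caller supplies two callables: `do_action` (for example, a wrapper around `POST /action`) and `state_reached` (for example, a check based on `GET /screen` or `POST /wait`). Both callables, the `act_and_verify` name, and the timeout values are hypothetical and do not appear in the changed file.

```python
import time


def act_and_verify(do_action, state_reached, timeout_s=5.0, interval_s=0.25):
    """Run one action, then poll until the expected state appears or time runs out.

    Returns True if `state_reached()` became truthy before the deadline,
    otherwise False.
    """
    do_action()  # step 5: exactly one minimal action
    deadline = time.monotonic() + timeout_s
    while True:
        if state_reached():  # step 6: verify the expected state change
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A server-side `POST /wait` would presumably replace the client-side polling loop with a single blocking call. The sketch only shows the control flow the numbered steps describe.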