feat(verify): add compound action+verify flows

2026-05-01 16:26:57 +02:00
parent 02bf069425
commit c66779d929
4 changed files with 111 additions and 4 deletions

@@ -13,6 +13,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
- **Wait/sync endpoint**: poll for text, window, or visual state changes via `POST /wait`
- **Vision helper endpoints**: compare screenshots and measure stability via `POST /vision/diff` and `POST /vision/stability`
- **OCR endpoints**: extract text blocks or search for matching text via `POST /ocr` and `POST /ocr/find`
+- **Compound verify endpoint**: execute an action and wait for a structured success condition via `POST /action/verify`
- **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
- **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels
- **Safety knobs**: token auth, dry-run mode, optional allowed-region restriction
@@ -39,8 +40,8 @@ For OCR support, install the native `tesseract` binary on the host (in addition
2. `GET /screen?screen=0` with grid
3. Decide cell / target
4. Optional `POST /zoom?screen=0` for finer targeting
-5. `POST /action?screen=0` to execute
-6. `GET /screen?screen=0` again to verify result, or use `POST /ocr/find` when you need explicit text matching
+5. `POST /action?screen=0` to execute (or `POST /action/verify?screen=0` for a bundled action+wait flow)
+6. `GET /screen?screen=0` again to verify result, or use `POST /wait`, `POST /vision/diff`, or `POST /ocr/find`
Important:
- `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
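
The rule about `POST /action` bodies can be checked on the client before anything is sent. The following is a minimal sketch, not the project's own client: the base URL, the helper names (`check_action_body`, `http_post`, `send_action`), and the `cell` field inside `target` are all assumptions made for illustration. Only the `action` + `target` shape, the ban on top-level `x` / `y`, and the `?screen=0` query parameter come from the README text above. Token auth is left out because the header scheme is not shown here.

```python
import json
from typing import Any, Callable, Dict
from urllib import request as urlrequest

BASE_URL = "http://127.0.0.1:8000"  # hypothetical host and port


def check_action_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a POST /action body: `action` plus a `target` object."""
    # Raw top-level coordinates are what the README tells callers not to send.
    stray = sorted({"x", "y"} & set(body))
    if stray:
        raise ValueError(f"top-level {stray} not allowed; put them inside 'target'")
    if "action" not in body:
        raise ValueError("missing 'action'")
    if not isinstance(body.get("target"), dict):
        raise ValueError("'target' must be an object")
    return body


def http_post(path: str, body: Dict[str, Any], screen: int = 0) -> Dict[str, Any]:
    """Send a JSON POST with the standard library and decode the JSON reply."""
    req = urlrequest.Request(
        f"{BASE_URL}{path}?screen={screen}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send_action(
    body: Dict[str, Any],
    post: Callable[..., Dict[str, Any]] = http_post,
    screen: int = 0,
) -> Dict[str, Any]:
    """Validate the body locally, then POST it to /action."""
    return post("/action", check_action_body(body), screen)


# Example call (needs a running server; the `cell` field is hypothetical):
# send_action({"action": "click", "target": {"cell": "C4"}})
```

Passing the transport in as `post` keeps the validation testable without a live server; a stub can stand in for `http_post` during tests.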