feat(verify): add compound action+verify flows
All checks were successful
python-syntax / syntax-check (push) Successful in 9s
@@ -13,6 +13,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
 - **Wait/sync endpoint**: poll for text, window, or visual state changes via `POST /wait`
 - **Vision helper endpoints**: compare screenshots and measure stability via `POST /vision/diff` and `POST /vision/stability`
 - **OCR endpoints**: extract text blocks or search for matching text via `POST /ocr` and `POST /ocr/find`
+- **Compound verify endpoint**: execute an action and wait for a structured success condition via `POST /action/verify`
 - **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
 - **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels
 - **Safety knobs**: token auth, dry-run mode, optional allowed-region restriction
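The diff adds `POST /action/verify` but does not show its request body. What follows is a minimal client-side sketch in Python. It rests on several assumptions. The endpoint is assumed to accept the same `action` plus `target` shape as `POST /action`. The success condition is assumed to sit under a hypothetical `verify` key. Token auth is assumed to use a bearer header. The host, port, condition fields and the `build_verify_request` helper are illustrative only. The sketch builds the request but does not send it.

```python
import json
import urllib.request


def build_verify_request(base_url, action, target, condition, screen=0, token=None):
    """Build (but do not send) a hypothetical POST /action/verify request."""
    body = json.dumps({
        "action": action,     # e.g. "click"
        "target": target,     # target object, same shape assumed as for POST /action
        "verify": condition,  # ASSUMPTION: key name for the success condition
    }).encode("utf-8")
    url = f"{base_url.rstrip('/')}/action/verify?screen={screen}"
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # ASSUMPTION: token auth via a bearer header; check the server's auth check.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Illustrative values only: host, port and condition fields are made up.
req = build_verify_request(
    "http://127.0.0.1:8000",
    action="click",
    target={"x": 640, "y": 360},
    condition={"type": "text", "text": "Saved", "timeout_ms": 5000},
    token="secret",
)
print(req.get_method(), req.full_url)
```

Bundling the action and its success condition into one request means the agent gets one structured result instead of chaining `POST /action` and `POST /wait` itself. Passing the returned object to `urllib.request.urlopen(req)` would actually send it.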
@@ -39,8 +40,8 @@ For OCR support, install the native `tesseract` binary on the host (in addition
 2. `GET /screen?screen=0` with grid
 3. Decide cell / target
 4. Optional `POST /zoom?screen=0` for finer targeting
-5. `POST /action?screen=0` to execute
-6. `GET /screen?screen=0` again to verify result, or use `POST /ocr/find` when you need explicit text matching
+5. `POST /action?screen=0` to execute (or `POST /action/verify?screen=0` for a bundled action+wait flow)
+6. `GET /screen?screen=0` again to verify result, or use `POST /wait`, `POST /vision/diff`, or `POST /ocr/find`
 
 Important:
 - `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
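The "Important" note is easy to violate when pixel coordinates come straight out of a grid-to-pixel transform. Below is a small hypothetical client-side guard. It assumes the `target` object carries the pixel coordinates as `x` / `y`; the README only says they must not be top-level fields. The names `make_action_payload` and `check_action_payload` are illustrative, not part of the server.

```python
from typing import Any


def make_action_payload(action: str, x: int, y: int, **extra: Any) -> dict:
    """Nest coordinates inside a target object instead of the top level."""
    # ASSUMPTION: target shape {"x": ..., "y": ...}; the README only rules out
    # sending x / y as top-level fields.
    return {"action": action, "target": {"x": x, "y": y}, **extra}


def check_action_payload(payload: dict) -> None:
    """Reject payloads that repeat the flat x / y mistake before sending."""
    if "x" in payload or "y" in payload:
        raise ValueError("send coordinates inside 'target', not as top-level x / y")
    if "action" not in payload or not isinstance(payload.get("target"), dict):
        raise ValueError("payload needs an 'action' and a 'target' object")


good = make_action_payload("click", 120, 48)
check_action_payload(good)  # passes silently

try:
    check_action_payload({"action": "click", "x": 120, "y": 48})
except ValueError as err:
    print(err)
```

Running the guard before each `POST /action` or `POST /action/verify` call catches the flat-coordinate mistake locally, rather than relying on the server's response to surface it.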