Support multi-display screen selection
All checks were successful
python-syntax / syntax-check (push) Successful in 1m33s
All checks were successful
python-syntax / syntax-check (push) Successful in 1m33s
This commit is contained in:
@@ -33,13 +33,20 @@ The agent should not assume it can self-install this stack.
|
||||
## Mini API map
|
||||
|
||||
- `GET /health` → server status + safety flags
|
||||
- `GET /screen` → full screenshot (JSON with base64 by default, or raw image with `asImage=true`)
|
||||
- `POST /zoom` → cropped screenshot around point/region (also supports `asImage=true`)
|
||||
- `GET /displays` → detected displays in zero-based API order
|
||||
- `GET /screen?screen=0` → full screenshot (JSON with base64 by default, or raw image with `asImage=true`)
|
||||
- `POST /zoom?screen=0` → cropped screenshot around point/region (also supports `asImage=true`)
|
||||
- `POST /ocr` → text extraction with bounding boxes from full screen, region, or provided image bytes
|
||||
- `POST /action` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...)
|
||||
- `POST /batch` → sequential action list
|
||||
- `POST /action?screen=0` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...)
|
||||
- `POST /batch?screen=0` → sequential action list
|
||||
- `POST /exec` → PowerShell/Bash/CMD command execution (requires configured exec secret + header)
|
||||
|
||||
### Display selection
|
||||
|
||||
- Use `GET /displays` before operating on multi-monitor systems.
|
||||
- Use `?screen=X` on `/screen`, `/zoom`, `/ocr`, `/action`, and `/batch`; invalid values fall back to `screen=0`.
|
||||
- Treat returned `region` and OCR bounding boxes as global desktop coordinates, not screen-local coordinates.
|
||||
|
||||
### OCR usage
|
||||
|
||||
- Prefer `POST /ocr` when targeting text-heavy UI (menus, labels, buttons, dialogs).
|
||||
@@ -55,7 +62,7 @@ The agent should not assume it can self-install this stack.
|
||||
|
||||
## Core workflow (mandatory)
|
||||
|
||||
1. Call `GET /screen` with coarse grid (e.g., 12x12).
|
||||
1. Call `GET /screen?screen=0` with coarse grid (e.g., 12x12), or another selected display.
|
||||
2. Identify likely target region and compute an initial confidence score.
|
||||
3. If confidence < 0.85, call `POST /zoom` with denser grid (e.g., 20x20) and re-evaluate.
|
||||
4. **Before any click**, verify target identity (OCR text/icon/location consistency).
|
||||
|
||||
Reference in New Issue
Block a user