Support multi-display screen selection

Space-Banane
2026-04-29 21:52:01 +02:00
parent a8f2e01bb9
commit 775c188732
6 changed files with 170 additions and 33 deletions

@@ -6,6 +6,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
- **Visual endpoints**: full-screen capture with optional grid overlay and labeled cells (`asImage=true` can return raw image bytes)
- **Zoom endpoint**: crop around a point with denser grid for fine targeting (`asImage=true` supported)
+- **Multi-display support**: list displays with `GET /displays` and select one with `?screen=0`, `?screen=1`, ...
- **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey
- **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr`
- **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
@@ -30,11 +31,12 @@ For OCR support, install the native `tesseract` binary on the host (in addition
## Minimal API flow
-1. `GET /screen` with grid
-2. Decide cell / target
-3. Optional `POST /zoom` for finer targeting
-4. `POST /action` to execute
-5. `GET /screen` again to verify result
+1. `GET /displays` if you need a non-primary monitor
+2. `GET /screen?screen=0` with grid
+3. Decide cell / target
+4. Optional `POST /zoom?screen=0` for finer targeting
+5. `POST /action?screen=0` to execute
+6. `GET /screen?screen=0` again to verify result
See:
- `docs/API.md`
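
The updated flow can be sketched as an ordered list of requests that all carry the same `screen` index. This is a minimal illustration rather than the project's client: the base URL and the `flow_requests` helper are hypothetical, request bodies for `/zoom` and `/action` are omitted (see `docs/API.md`), and the query parameter that enables the grid is left out because the README does not name it here.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the README excerpt does not state a host or port.
BASE_URL = "http://localhost:8000"


def flow_requests(base_url, screen=0):
    """Return ordered (method, url) pairs for the minimal API flow.

    Every display-specific call carries the same `screen` index so that
    capture, zoom, and action all refer to the same monitor.
    """
    base = base_url.rstrip("/")
    suffix = "?" + urlencode({"screen": screen})
    return [
        ("GET", base + "/displays"),          # only needed for a non-primary monitor
        ("GET", base + "/screen" + suffix),   # capture the selected display
        # "Decide cell / target" happens client-side and sends no request.
        ("POST", base + "/zoom" + suffix),    # optional, for finer targeting
        ("POST", base + "/action" + suffix),  # execute the action
        ("GET", base + "/screen" + suffix),   # capture again to verify the result
    ]


if __name__ == "__main__":
    for method, url in flow_requests(BASE_URL, screen=1):
        print(method, url)
```

Building the suffix once and reusing it keeps a single flow from mixing display indices between the capture and the action it verifies.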