Support multi-display screen selection

2026-04-29 21:52:01 +02:00
parent a8f2e01bb9
commit 775c188732
6 changed files with 170 additions and 33 deletions
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots

 - **Visual endpoints**: full-screen capture with optional grid overlay and labeled cells (`asImage=true` can return raw image bytes)
 - **Zoom endpoint**: crop around a point with denser grid for fine targeting (`asImage=true` supported)
+- **Multi-display support**: list displays with `GET /displays` and select one with `?screen=0`, `?screen=1`, ...
 - **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey
 - **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr`
 - **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
@@ -30,11 +31,12 @@ For OCR support, install the native `tesseract` binary on the host (in addition

 ## Minimal API flow

-1. `GET /screen` with grid
-2. Decide cell / target
-3. Optional `POST /zoom` for finer targeting
-4. `POST /action` to execute
-5. `GET /screen` again to verify result
+1. `GET /displays` if you need a non-primary monitor
+2. `GET /screen?screen=0` with grid
+3. Decide cell / target
+4. Optional `POST /zoom?screen=0` for finer targeting
+5. `POST /action?screen=0` to execute
+6. `GET /screen?screen=0` again to verify result

 See:
 - `docs/API.md`
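
Note on the updated "Minimal API flow": the six steps can be sketched as a list of requests. This is only an illustration, not the project's client code. The base URL, the `endpoint` helper, and `minimal_flow` are hypothetical names introduced here; only the paths (`/displays`, `/screen`, `/zoom`, `/action`) and the `screen` query parameter come from the README text in this diff. Request bodies and response shapes are not shown because the diff does not specify them.

```python
from urllib.parse import urlencode

# Hypothetical address; replace with wherever the server actually listens.
BASE_URL = "http://127.0.0.1:8000"


def endpoint(path, screen=None, **params):
    """Build a URL for `path`, appending ?screen=N when a display is chosen."""
    query = dict(params)
    if screen is not None:
        query["screen"] = screen
    return f"{BASE_URL}{path}" + (f"?{urlencode(query)}" if query else "")


def minimal_flow(screen=0):
    """Return (method, url) pairs for the minimal flow on one display.

    Step 3 of the README ("Decide cell / target") happens on the agent side,
    so it has no request of its own in this list.
    """
    return [
        ("GET", endpoint("/displays")),        # 1. list displays (optional)
        ("GET", endpoint("/screen", screen)),  # 2. capture with grid
        ("POST", endpoint("/zoom", screen)),   # 4. optional finer targeting
        ("POST", endpoint("/action", screen)), # 5. execute the action
        ("GET", endpoint("/screen", screen)),  # 6. verify the result
    ]


if __name__ == "__main__":
    for method, url in minimal_flow(screen=1):
        print(method, url)
```

Passing the same `screen` value to every call keeps the capture, zoom, and action steps on the same display, which is the point of the `?screen=N` parameter in the new flow.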