Support multi-display screen selection

2026-04-29 21:52:01 +02:00
parent a8f2e01bb9
commit 775c188732
6 changed files with 170 additions and 33 deletions
--- a/docs/coordinate-system.md
+++ b/docs/coordinate-system.md
@@ -1,6 +1,8 @@
 # Coordinate System

-All interactions ultimately execute in **global pixel coordinates** of the primary monitor.
+All interactions ultimately execute in **global desktop pixel coordinates**.
+
+Use `GET /displays` to list available displays. Visual endpoints accept `?screen=X` where `X` is a zero-based display index. `screen=0` is the primary display when detectable, falling back to the first monitor reported by the capture backend. Invalid screen values fall back to `0`.

 ## Regions

@@ -12,6 +14,12 @@ Visual endpoints return a `region` object:

 This describes where the image sits in global desktop space.

+For a second display to the right of the primary display, `GET /screen?screen=1` might return:
+
+```json
+{"x": 1920, "y": 0, "width": 1920, "height": 1080}
+```
+
 ## Grid indexing

 - Rows/cols are **zero-based**
@@ -35,7 +43,7 @@ Interpretation:

 ## Recommended agent loop

-1. Capture `/screen` with coarse grid
+1. Capture `/screen?screen=0` with coarse grid, or choose another display with `/screen?screen=1`
 2. Find candidate cell
 3. If uncertain, use `/zoom` around candidate
 4. Convert target to grid action
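
Reviewer note: the `region` object and the `screen` fallback described in this change could be exercised with a minimal sketch like the one below. The names `Region`, `to_global`, and `normalize_screen` are hypothetical and not part of this project's API. The sketch assumes that captured image pixels map 1:1 to desktop pixels (no scaling), and that an out-of-range index counts as "invalid"; the docs only state that invalid values fall back to `0`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical mirror of the `region` object returned by visual endpoints."""
    x: int
    y: int
    width: int
    height: int


def to_global(region: Region, local_x: int, local_y: int) -> tuple[int, int]:
    """Translate a pixel inside the captured image to global desktop coordinates.

    Assumes one image pixel equals one desktop pixel.
    """
    if not (0 <= local_x < region.width and 0 <= local_y < region.height):
        raise ValueError("point lies outside the captured region")
    return region.x + local_x, region.y + local_y


def normalize_screen(value, display_count: int) -> int:
    """Return a zero-based display index, falling back to 0 for invalid input.

    Treating out-of-range indices as invalid is an assumption of this sketch.
    """
    try:
        index = int(value)
    except (TypeError, ValueError):
        return 0
    return index if 0 <= index < display_count else 0


# Region taken from the `GET /screen?screen=1` example in this change.
second_display = Region(x=1920, y=0, width=1920, height=1080)
print(to_global(second_display, 100, 50))  # (2020, 50)
```

With the example region, a point 100 px right and 50 px down from the top-left corner of the second display's capture lands at `(2020, 50)` in global desktop space.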