Archived

This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.


Space-Banane 775c188732


Support multi-display screen selection

2026-04-29 21:52:01 +02:00


Coordinate System

All interactions ultimately execute in global desktop pixel coordinates.

Use GET /displays to list the available displays. Visual endpoints accept ?screen=X, where X is a zero-based display index. screen=0 is the primary display when it can be detected; otherwise it is the first monitor reported by the capture backend. Invalid screen values fall back to 0.
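The display-index rules can be sketched in Python. This is a minimal illustration, not the server's code: the `Monitor` type, its `is_primary` flag, and the functions `order_displays` and `resolve_screen` are hypothetical, and the sketch assumes that "invalid" covers non-integer, negative, and out-of-range values and that the non-primary displays keep the capture backend's order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Monitor:
    # Hypothetical monitor record; real capture backends expose their own types.
    x: int
    y: int
    width: int
    height: int
    is_primary: Optional[bool] = None  # None when the primary cannot be detected


def order_displays(monitors: List[Monitor]) -> List[Monitor]:
    """Put the primary display at index 0 if detectable; otherwise keep backend order."""
    for i, mon in enumerate(monitors):
        if mon.is_primary:
            return [mon] + monitors[:i] + monitors[i + 1:]
    return list(monitors)  # no primary detected: first reported monitor stays at 0


def resolve_screen(raw: Optional[str], display_count: int) -> int:
    """Map a ?screen=X query value to a display index, falling back to 0."""
    try:
        index = int(raw) if raw is not None else 0
    except ValueError:
        return 0  # not an integer
    if 0 <= index < display_count:
        return index
    return 0  # negative or beyond the last display
```

Falling back instead of returning an error keeps a misconfigured agent on a usable display, at the cost of silently acting on screen 0.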

Regions

Visual endpoints return a region object:

{"x": 0, "y": 0, "width": 1920, "height": 1080}

It describes where the captured image sits in global desktop pixel coordinates.

For a second display to the right of the primary display, GET /screen?screen=1 might return:

{"x": 1920, "y": 0, "width": 1920, "height": 1080}
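Because the region gives the image's origin on the desktop, a pixel found inside the screenshot becomes a global coordinate by adding region.x and region.y. The sketch below shows this; the `Region` class and its `from_json` and `to_global` methods are hypothetical names, and it assumes the returned image is not rescaled (one image pixel equals one desktop pixel).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Where a captured image sits in global desktop pixel coordinates."""
    x: int
    y: int
    width: int
    height: int

    @classmethod
    def from_json(cls, obj: dict) -> "Region":
        # Build from the region object returned by a visual endpoint.
        return cls(obj["x"], obj["y"], obj["width"], obj["height"])

    def to_global(self, px: float, py: float) -> Tuple[float, float]:
        """Convert a position inside the image to global desktop coordinates.

        Assumes the image is unscaled: one image pixel == one desktop pixel.
        """
        if not (0 <= px <= self.width and 0 <= py <= self.height):
            raise ValueError("point lies outside the captured region")
        return (self.x + px, self.y + py)
```

For the second-display example, image pixel (100, 50) maps to global (2020, 50).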

Grid indexing

- Rows and cols are zero-based.
- Cell (row=0, col=0) is the top-left cell.
- Each cell has:
  - cell_width = region.width / cols
  - cell_height = region.height / rows
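The indexing rules above determine each cell's rectangle in global coordinates. A minimal sketch, with `cell_rect` as a hypothetical helper name and the region passed as the JSON dict the endpoints return:

```python
from typing import Dict, Tuple


def cell_rect(region: Dict[str, int], rows: int, cols: int,
              row: int, col: int) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of grid cell (row, col) in global coordinates."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise IndexError("row/col outside the grid")
    # True division: cells may have fractional sizes (e.g. 1080 / 7).
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    left = region["x"] + col * cell_width
    top = region["y"] + row * cell_height
    return (left, top, cell_width, cell_height)
```

Keeping the sizes as floats and multiplying by the index, rather than summing rounded cell widths, avoids drift across the grid when the display size is not divisible by rows or cols.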

Cell center formula

Given (row, col, dx, dy), where dx, dy ∈ [-1, 1]:

x = region.x + ((col + 0.5 + dx*0.5) * cell_width)
y = region.y + ((row + 0.5 + dy*0.5) * cell_height)
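The formula transcribes directly into Python. The function name `cell_point` is illustrative, and the range check simply enforces dx, dy ∈ [-1, 1].

```python
from typing import Dict, Tuple


def cell_point(region: Dict[str, int], rows: int, cols: int,
               row: int, col: int, dx: float = 0.0, dy: float = 0.0) -> Tuple[float, float]:
    """Global desktop point for (row, col, dx, dy), following the cell formula."""
    if not (-1.0 <= dx <= 1.0 and -1.0 <= dy <= 1.0):
        raise ValueError("dx and dy must lie in [-1, 1]")
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    x = region["x"] + (col + 0.5 + dx * 0.5) * cell_width
    y = region["y"] + (row + 0.5 + dy * 0.5) * cell_height
    return (x, y)
```

Worked example on the second display ({"x": 1920, "y": 0, "width": 1920, "height": 1080}) with a 9×16 grid, so each cell is 120×120: the center of cell (row=2, col=3) is x = 1920 + 3.5 × 120 = 2340, y = 0 + 2.5 × 120 = 300.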

Interpretation:

dx = -1 -> left edge of cell
dx = 0 -> center of cell
dx = 1 -> right edge of cell
dy works the same way: -1 -> top edge, 0 -> center, 1 -> bottom edge
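Since dx and dy scale linearly across a cell, the formula can be inverted when a target is already known in global coordinates: with fx = (x - region.x) / cell_width, col = floor(fx) and dx = 2 × (fx - col) - 1 (likewise for rows). The sketch below is an illustration, not part of the documented API; `point_to_cell` is a hypothetical name, and it assumes a point on a shared cell boundary resolves to the right/lower cell with dx or dy = -1, except on the region's far edge.

```python
import math
from typing import Dict, Tuple


def point_to_cell(region: Dict[str, int], rows: int, cols: int,
                  x: float, y: float) -> Tuple[int, int, float, float]:
    """Invert the cell formula: global (x, y) -> (row, col, dx, dy)."""
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    fx = (x - region["x"]) / cell_width   # position in cell units, 0..cols
    fy = (y - region["y"]) / cell_height  # position in cell units, 0..rows
    if not (0 <= fx <= cols and 0 <= fy <= rows):
        raise ValueError("point lies outside the region")
    # Clamp so the far right/bottom edge maps to the last cell with dx/dy = 1.
    col = min(math.floor(fx), cols - 1)
    row = min(math.floor(fy), rows - 1)
    dx = 2 * (fx - col) - 1
    dy = 2 * (fy - row) - 1
    return (row, col, dx, dy)
```

On the 9×16 grid of the second display, global (2340, 300) maps back to (row=2, col=3, dx=0, dy=0).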

Recommended agent loop

1. Capture /screen?screen=0 with a coarse grid (or another display, e.g. /screen?screen=1).
2. Find a candidate cell.
3. If uncertain, use /zoom around the candidate.
4. Convert the target into a grid action.
5. Execute /action.
6. Re-capture and verify.
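The loop's control flow can be sketched without committing to request formats. Because the grid, /zoom, and /action parameters are not specified here, every I/O step below is a hypothetical callable supplied by the caller; `agent_step`, `min_confidence`, and `max_attempts` are illustrative names, and using a confidence threshold to decide "uncertain" is an assumption.

```python
from typing import Callable, Optional, Tuple

Cell = Tuple[int, int, float, float]  # (row, col, dx, dy)


def agent_step(
    capture: Callable[[], object],                            # hypothetical: wraps /screen
    locate: Callable[[object], Tuple[Optional[Cell], float]],  # hypothetical: finds a candidate
    zoom: Callable[[Cell], Optional[Cell]],                    # hypothetical: wraps /zoom
    act: Callable[[Cell], None],                               # hypothetical: wraps /action
    verify: Callable[[object], bool],                          # hypothetical: checks the result
    min_confidence: float = 0.8,
    max_attempts: int = 3,
) -> bool:
    """Capture -> locate -> (zoom) -> act -> verify, retried up to max_attempts."""
    for _ in range(max_attempts):
        screenshot = capture()                 # coarse-grid capture
        cell, confidence = locate(screenshot)  # candidate cell and how sure we are
        if cell is None:
            continue                           # nothing found: capture again
        if confidence < min_confidence:
            refined = zoom(cell)               # uncertain: zoom around the candidate
            if refined is None:
                continue
            cell = refined
        act(cell)                              # execute the grid action
        if verify(capture()):                  # re-capture and verify
            return True
    return False
```

Injecting the I/O as callables keeps the loop testable with fakes and lets the same control flow drive any display index.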
