This repository has been archived on 2026-05-20. You can view files and clone it. You cannot open issues or pull requests or push a commit.
clickthrough/docs/coordinate-system.md
Space-Banane 775c188732
Support multi-display screen selection
2026-04-29 21:52:01 +02:00


# Coordinate System
All interactions ultimately execute in **global desktop pixel coordinates**.
Use `GET /displays` to list available displays. Visual endpoints accept `?screen=X` where `X` is a zero-based display index. `screen=0` is the primary display when detectable, falling back to the first monitor reported by the capture backend. Invalid screen values fall back to `0`.
## Regions
Visual endpoints return a `region` object:
```json
{"x": 0, "y": 0, "width": 1920, "height": 1080}
```
This describes where the image sits in global desktop space.
For a second display to the right of the primary display, `GET /screen?screen=1` might return:
```json
{"x": 1920, "y": 0, "width": 1920, "height": 1080}
```
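As a rough illustration of how a `region` anchors an image in desktop space, the sketch below maps a pixel offset inside a captured image to a global coordinate. The helper name `local_to_global` is hypothetical, and the sketch assumes the image is unscaled (one image pixel per desktop pixel), which this document does not state.

```python
def local_to_global(region, px, py):
    """Map a pixel offset inside a captured image to global desktop coordinates.

    Assumption: the image is not scaled, so one image pixel is one desktop pixel.
    """
    if not (0 <= px <= region["width"] and 0 <= py <= region["height"]):
        raise ValueError("offset lies outside the region")
    return region["x"] + px, region["y"] + py


# Region of a second display placed to the right of a 1920x1080 primary display.
second_display = {"x": 1920, "y": 0, "width": 1920, "height": 1080}
print(local_to_global(second_display, 100, 50))  # (2020, 50)
```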
## Grid indexing
- Rows/cols are **zero-based**
- Cell `(row=0, col=0)` is top-left
- Each cell has:
  - `cell_width = region.width / cols`
  - `cell_height = region.height / rows`
## Cell center formula
Given `(row, col, dx, dy)`, where `dx, dy ∈ [-1, 1]` are offsets from the cell center:
- `x = region.x + ((col + 0.5 + dx*0.5) * cell_width)`
- `y = region.y + ((row + 0.5 + dy*0.5) * cell_height)`
Interpretation:
- `dx = -1` -> left edge of cell
- `dx = 0` -> center
- `dx = 1` -> right edge
- `dy` works the same way vertically: `dy = -1` -> top edge, `dy = 0` -> center, `dy = 1` -> bottom edge
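The formula and interpretation above can be sketched in Python as follows. The function name `cell_point` and the 9x16 grid in the example are illustrative assumptions, not part of the service's API.

```python
def cell_point(region, rows, cols, row, col, dx=0.0, dy=0.0):
    """Return the global (x, y) of a point inside a grid cell.

    dx and dy are offsets in [-1, 1]; (0, 0) is the cell center.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("row/col outside the grid")
    if not (-1 <= dx <= 1 and -1 <= dy <= 1):
        raise ValueError("dx and dy must lie in [-1, 1]")

    cell_width = region["width"] / cols
    cell_height = region["height"] / rows

    x = region["x"] + (col + 0.5 + dx * 0.5) * cell_width
    y = region["y"] + (row + 0.5 + dy * 0.5) * cell_height
    return x, y


# Second display at x=1920, split into 9 rows and 16 columns (120x120 px cells).
region = {"x": 1920, "y": 0, "width": 1920, "height": 1080}
print(cell_point(region, 9, 16, row=2, col=3))                # (2340.0, 300.0)
print(cell_point(region, 9, 16, row=2, col=3, dx=-1, dy=1))   # (2280.0, 360.0)
```

The second call lands on the bottom-left corner of the cell, matching the edge interpretation of `dx = -1` and `dy = 1`.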
## Recommended agent loop
1. Capture `/screen?screen=0` with coarse grid, or choose another display with `/screen?screen=1`
2. Find candidate cell
3. If uncertain, use `/zoom` around candidate
4. Convert target to grid action
5. Execute `/action`
6. Re-capture and verify
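For step 4, one possible approach is to invert the cell formula: given a target pixel in global desktop coordinates, recover `(row, col, dx, dy)`. This is a minimal sketch; the helper name `point_to_cell` is hypothetical, and the request shape expected by `/action` is not described here, so the function only returns the tuple.

```python
def point_to_cell(region, rows, cols, x, y):
    """Convert a global (x, y) into (row, col, dx, dy) for the given grid."""
    u = (x - region["x"]) / (region["width"] / cols)    # horizontal position in cell units
    v = (y - region["y"]) / (region["height"] / rows)   # vertical position in cell units
    if not (0 <= u <= cols and 0 <= v <= rows):
        raise ValueError("point lies outside the region")

    # A point exactly on the far edge belongs to the last cell, with an offset of 1.
    col = min(int(u), cols - 1)
    row = min(int(v), rows - 1)

    # Invert x = region.x + (col + 0.5 + dx * 0.5) * cell_width, and likewise for y.
    dx = 2 * (u - col) - 1
    dy = 2 * (v - row) - 1
    return row, col, dx, dy


region = {"x": 1920, "y": 0, "width": 1920, "height": 1080}
print(point_to_cell(region, 9, 16, 2340, 300))  # (2, 3, 0.0, 0.0)
```

Feeding the result back through the cell formula should reproduce the original pixel, which makes a convenient sanity check before executing an action.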