# Coordinate System

All interactions are ultimately resolved to **global pixel coordinates** on the primary monitor.

## Regions

Visual endpoints return a `region` object:

```json
{"x": 0, "y": 0, "width": 1920, "height": 1080}
```

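The region describes where the captured image sits in global desktop space: `x` and `y` locate its top-left corner, and `width` and `height` give its size in pixels.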

## Grid indexing

- Rows/cols are **zero-based**
- Cell `(row=0, col=0)` is top-left
- Each cell has:
  - `cell_width = region.width / cols`
  - `cell_height = region.height / rows`
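The cell dimensions above can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the `Region` class and the `cell_size` and `cell_bounds` helpers are hypothetical names introduced here, and the 9x16 grid is only an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical mirror of the `region` object returned by visual endpoints."""
    x: float
    y: float
    width: float
    height: float


def cell_size(region: Region, rows: int, cols: int) -> tuple[float, float]:
    """Return (cell_width, cell_height) for a rows x cols grid over the region."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    return region.width / cols, region.height / rows


def cell_bounds(region: Region, rows: int, cols: int, row: int, col: int):
    """Return (left, top, right, bottom) of a zero-based cell in global pixels."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise IndexError("cell index outside the grid")
    cell_width, cell_height = cell_size(region, rows, cols)
    left = region.x + col * cell_width
    top = region.y + row * cell_height
    return left, top, left + cell_width, top + cell_height


screen = Region(x=0, y=0, width=1920, height=1080)
print(cell_size(screen, rows=9, cols=16))          # (120.0, 120.0)
print(cell_bounds(screen, 9, 16, row=0, col=0))    # (0.0, 0.0, 120.0, 120.0)
```

Cell sizes are kept as floats because a region's width or height need not divide evenly by the grid dimensions.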

## Cell center formula

Given `(row, col, dx, dy)`, where `dx, dy ∈ [-1, 1]` are offsets from the cell center, the target pixel is:

- `x = region.x + ((col + 0.5 + dx*0.5) * cell_width)`
- `y = region.y + ((row + 0.5 + dy*0.5) * cell_height)`

Interpretation:
- `dx = -1` -> left edge of cell
- `dx = 0` -> center
- `dx = 1` -> right edge
- `dy` works the same way: `-1` is the top edge, `0` the center, `1` the bottom edge
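As a worked example, the formula can be written as a small Python function. This is a sketch under stated assumptions: `grid_to_pixel` is a hypothetical helper name, the region is passed as a plain dict shaped like the JSON above, and the result is left as floats, since whether and how coordinates are rounded before an action is not specified here.

```python
def grid_to_pixel(region: dict, rows: int, cols: int,
                  row: int, col: int, dx: float = 0.0, dy: float = 0.0):
    """Map a grid cell plus offsets to global pixel coordinates (x, y)."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise IndexError("cell index outside the grid")
    if not (-1.0 <= dx <= 1.0 and -1.0 <= dy <= 1.0):
        raise ValueError("dx and dy must lie in [-1, 1]")

    cell_width = region["width"] / cols
    cell_height = region["height"] / rows

    # Same arithmetic as the formula: cell index, plus half a cell to reach
    # the center, plus up to half a cell of offset in either direction.
    x = region["x"] + (col + 0.5 + dx * 0.5) * cell_width
    y = region["y"] + (row + 0.5 + dy * 0.5) * cell_height
    return x, y


region = {"x": 0, "y": 0, "width": 1920, "height": 1080}

# Center of cell (row=2, col=3) on a 9x16 grid of 120x120 px cells.
print(grid_to_pixel(region, 9, 16, row=2, col=3))           # (420.0, 300.0)

# dx = -1 moves to the left edge of the same cell, dx = 1 to the right edge.
print(grid_to_pixel(region, 9, 16, row=2, col=3, dx=-1))    # (360.0, 300.0)
print(grid_to_pixel(region, 9, 16, row=2, col=3, dx=1))     # (480.0, 300.0)
```

Because `region.x` and `region.y` are added last, the same call works unchanged for a zoomed region that does not start at the desktop origin.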

## Recommended agent loop

1. Capture `/screen` with a coarse grid
2. Find the candidate cell
3. If uncertain, use `/zoom` around the candidate
4. Convert the target to a grid action `(row, col, dx, dy)`
5. Execute `/action`
6. Re-capture and verify the result
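The control flow of this loop can be sketched without committing to any request format. Everything in the block below is an assumption made for illustration: `run_once` and its five callables are hypothetical stand-ins for whatever client code wraps `/screen`, `/zoom`, and `/action`, and the `"uncertain"` key is an invented way for the locator to signal low confidence.

```python
from typing import Any, Callable, Optional


def run_once(
    capture: Callable[[], Any],                   # wraps a /screen request
    find_cell: Callable[[Any], Optional[dict]],   # returns a target dict or None
    zoom: Callable[[dict], Any],                  # wraps a /zoom request
    act: Callable[[dict], None],                  # wraps an /action request
    verify: Callable[[Any], bool],                # checks the new capture
) -> bool:
    """Run one pass of the loop; return True if the action was verified."""
    shot = capture()                      # 1. capture with a coarse grid
    target = find_cell(shot)              # 2. find the candidate cell
    if target is None:
        return False

    if target.get("uncertain"):           # 3. refine around the candidate
        refined = find_cell(zoom(target))
        if refined is None:
            return False
        target = refined

    act(target)                           # 4-5. send the grid action
    return verify(capture())              # 6. re-capture and verify
```

A caller would typically retry `run_once` a bounded number of times when it returns `False`, rather than acting again on a stale capture.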