# Coordinate System All interactions ultimately execute in **global pixel coordinates** of the primary monitor. ## Regions Visual endpoints return a `region` object: ```json {"x": 0, "y": 0, "width": 1920, "height": 1080} ``` This describes where the image sits in global desktop space. ## Grid indexing - Rows/cols are **zero-based** - Cell `(row=0, col=0)` is top-left - Each cell has: - `cell_width = region.width / cols` - `cell_height = region.height / rows` ## Cell center formula Given `(row, col, dx, dy)` where `dx,dy ∈ [-1,1]`: - `x = region.x + ((col + 0.5 + dx*0.5) * cell_width)` - `y = region.y + ((row + 0.5 + dy*0.5) * cell_height)` Interpretation: - `dx = -1` -> left edge of cell - `dx = 0` -> center - `dx = 1` -> right edge - same concept for `dy` ## Recommended agent loop 1. Capture `/screen` with coarse grid 2. Find candidate cell 3. If uncertain, use `/zoom` around candidate 4. Convert target to grid action 5. Execute `/action` 6. Re-capture and verify