999 B
999 B
Coordinate System
All interactions ultimately execute in global pixel coordinates of the primary monitor.
Regions
Visual endpoints return a region object:
{"x": 0, "y": 0, "width": 1920, "height": 1080}
This describes where the image sits in global desktop space.
Grid indexing
- Rows/cols are zero-based
- Cell
(row=0, col=0)is top-left - Each cell has:
cell_width = region.width / colscell_height = region.height / rows
Cell center formula
Given (row, col, dx, dy) where dx,dy ∈ [-1,1]:
x = region.x + ((col + 0.5 + dx*0.5) * cell_width)y = region.y + ((row + 0.5 + dy*0.5) * cell_height)
Interpretation:
dx = -1-> left edge of celldx = 0-> centerdx = 1-> right edge- same concept for
dy
Recommended agent loop
- Capture
/screenwith coarse grid - Find candidate cell
- If uncertain, use
/zoomaround candidate - Convert target to grid action
- Execute
/action - Re-capture and verify