This repository has been archived on 2026-05-20.
Space-Banane 775c188732
Support multi-display screen selection
2026-04-29 21:52:01 +02:00

Coordinate System

All interactions ultimately execute in global desktop pixel coordinates.

Use GET /displays to list the available displays. Visual endpoints accept ?screen=X, where X is a zero-based display index. screen=0 is the primary display when it can be detected; otherwise it is the first monitor reported by the capture backend. Invalid screen values fall back to 0.
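The fallback rule can be mirrored on the client side, for example to predict which display a request will actually target. This is a minimal sketch: `resolve_screen` is a hypothetical helper, `display_count` would come from something like the number of entries returned by GET /displays (whose response shape isn't covered here), and treating non-numeric, negative, and out-of-range values alike as "invalid" is an assumption about what the server checks.

```python
from typing import Optional


def resolve_screen(requested: Optional[str], display_count: int) -> int:
    """Predict which display index a ?screen=X value selects.

    Hypothetical client-side mirror of the documented rule. Treating
    non-numeric, negative, and out-of-range values alike as "invalid"
    is an assumption, not confirmed server behavior.
    """
    try:
        index = int(requested)
    except (TypeError, ValueError):
        return 0  # missing or non-numeric -> fall back to display 0
    if 0 <= index < display_count:
        return index
    return 0  # negative or out of range -> fall back to display 0


print(resolve_screen("1", 2))    # 1
print(resolve_screen("7", 2))    # 0
print(resolve_screen("abc", 2))  # 0
```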

Regions

Visual endpoints return a region object:

{"x": 0, "y": 0, "width": 1920, "height": 1080}

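This describes where the image sits in global desktop space: x and y are the global coordinates of its top-left corner, and width and height are its size in pixels.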

For a second display to the right of the primary display, GET /screen?screen=1 might return:

{"x": 1920, "y": 0, "width": 1920, "height": 1080}
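Because the region says where the image sits in global desktop space, a position measured inside the image can be moved into global coordinates by adding region.x and region.y. A minimal sketch, assuming the image is captured at 1:1 scale (one image pixel per desktop pixel); `to_global` is a hypothetical helper and the region is the plain dict parsed from the JSON above.

```python
import json


def to_global(region, local_x, local_y):
    """Map a pixel inside a captured image to global desktop coordinates.

    Assumes a 1:1 capture (one image pixel == one desktop pixel), with
    region["x"], region["y"] being the image's top-left corner.
    """
    return region["x"] + local_x, region["y"] + local_y


# Region from the GET /screen?screen=1 example above.
second = json.loads('{"x": 1920, "y": 0, "width": 1920, "height": 1080}')
print(to_global(second, 100, 200))  # (2020, 200)
```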

Grid indexing

  • Rows/cols are zero-based
  • Cell (row=0, col=0) is top-left
  • Each cell has:
    • cell_width = region.width / cols
    • cell_height = region.height / rows
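The grid rules translate directly into code. The sketch below computes the cell size and, going the other way, which cell contains a given global point. `cell_size` and `cell_at` are hypothetical helpers (nothing here implies the API exposes such an inverse), regions are plain dicts shaped like the JSON above, and clamping points on the far right or bottom edge into the last column or row is a choice of this sketch.

```python
import math


def cell_size(region, rows, cols):
    """Cell width and height, per cell_width/cell_height above."""
    return region["width"] / cols, region["height"] / rows


def cell_at(region, rows, cols, x, y):
    """Zero-based (row, col) of the cell containing global point (x, y).

    Points exactly on the right/bottom edge of the region are clamped
    into the last column/row; points outside the region are rejected.
    """
    if not (region["x"] <= x <= region["x"] + region["width"]
            and region["y"] <= y <= region["y"] + region["height"]):
        raise ValueError("point lies outside the region")
    cell_width, cell_height = cell_size(region, rows, cols)
    col = math.floor((x - region["x"]) / cell_width)
    row = math.floor((y - region["y"]) / cell_height)
    return min(row, rows - 1), min(col, cols - 1)


second = {"x": 1920, "y": 0, "width": 1920, "height": 1080}
print(cell_size(second, 4, 4))           # (480.0, 270.0)
print(cell_at(second, 4, 4, 2020, 200))  # (0, 0)
print(cell_at(second, 4, 4, 2400, 270))  # (1, 1)
```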

Cell center formula

Given (row, col, dx, dy), where dx, dy ∈ [-1, 1]:

  • x = region.x + ((col + 0.5 + dx*0.5) * cell_width)
  • y = region.y + ((row + 0.5 + dy*0.5) * cell_height)

Interpretation:

  • dx = -1 -> left edge of cell
  • dx = 0 -> center
  • dx = 1 -> right edge
  • dy works the same way: -1 -> top edge, 0 -> center, 1 -> bottom edge
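A direct transcription of the cell center formula, using the second-display region from the example above with a hypothetical 4 × 4 grid. `cell_point` is an illustrative name, the range checks are additions of this sketch, and how the server rounds fractional results to whole pixels isn't specified here, so the function returns floats.

```python
def cell_point(region, rows, cols, row, col, dx=0.0, dy=0.0):
    """Global (x, y) for a point inside grid cell (row, col).

    dx, dy in [-1, 1] move from the cell center toward the left/top (-1)
    or right/bottom (+1) edge of the cell.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("cell index out of range")
    if not (-1 <= dx <= 1 and -1 <= dy <= 1):
        raise ValueError("dx and dy must lie in [-1, 1]")
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    x = region["x"] + (col + 0.5 + dx * 0.5) * cell_width
    y = region["y"] + (row + 0.5 + dy * 0.5) * cell_height
    return x, y


second = {"x": 1920, "y": 0, "width": 1920, "height": 1080}
print(cell_point(second, 4, 4, 0, 0))        # (2160.0, 135.0), center
print(cell_point(second, 4, 4, 0, 0, dx=1))  # (2400.0, 135.0), right edge
```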

Typical workflow:

  1. Capture /screen?screen=0 with a coarse grid, or pick another display, e.g. /screen?screen=1
  2. Find the candidate cell
  3. If uncertain, use /zoom around the candidate
  4. Convert the target to a grid action
  5. Execute /action
  6. Re-capture and verify
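The workflow above is essentially a capture, act, and verify loop. The sketch below shows only that control flow; `click_and_verify` and its four callables are hypothetical stand-ins (capture for /screen, locate for steps 2 to 4, act for /action, verify for step 6), and the retry limit is an assumption rather than documented behavior.

```python
from typing import Any, Callable, Optional


def click_and_verify(
    capture: Callable[[int], Any],
    locate: Callable[[Any], Optional[Any]],
    act: Callable[[Any], None],
    verify: Callable[[Any], bool],
    screen: int = 0,
    max_attempts: int = 3,
) -> bool:
    """Run the capture -> locate -> act -> re-capture loop.

    capture(screen) wraps GET /screen?screen=...;
    locate(shot) covers finding the cell, optional /zoom, and building
    the action payload, returning None when no candidate is found;
    act(payload) wraps /action; verify(shot) judges a fresh capture.
    """
    for _ in range(max_attempts):
        shot = capture(screen)
        payload = locate(shot)
        if payload is None:
            continue  # no candidate this round; try a fresh capture
        act(payload)
        if verify(capture(screen)):  # re-capture and verify
            return True
    return False
```

Keeping the HTTP calls behind callables makes the loop testable with fakes and leaves the /action payload format, which this page doesn't specify, to the caller.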