feat: bootstrap clickthrough server, skill docs, and syntax CI
All checks were successful
python-syntax / syntax-check (push) Successful in 29s
docs/coordinate-system.md
Normal file
@@ -0,0 +1,43 @@
# Coordinate System

All interactions ultimately execute in **global pixel coordinates** of the primary monitor.

## Regions

Visual endpoints return a `region` object:

```json
{"x": 0, "y": 0, "width": 1920, "height": 1080}
```

The `region` gives the position and size of the captured image in global desktop space: `x` and `y` are its top-left corner, and `width` and `height` are its size in pixels.
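
As a minimal sketch, a `region` object can be loaded into a small Python type for later calculations. The `Region` class and its `contains` helper are hypothetical names used only for illustration, not part of the server. Treating `x`/`y` as the top-left corner with exclusive right and bottom edges is an assumption that matches the grid formulas in this document.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical container for a `region` object (illustration only)."""

    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Assumption: (x, y) is the top-left corner; right/bottom edges are exclusive.
        return (
            self.x <= px < self.x + self.width
            and self.y <= py < self.y + self.height
        )


# Parse the example payload shown above.
region = Region(**json.loads('{"x": 0, "y": 0, "width": 1920, "height": 1080}'))
print(region.contains(960, 540))
```

A check like `contains` can be useful before sending an action, to confirm that a computed pixel actually falls inside the captured area.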

## Grid indexing

- Rows/cols are **zero-based**
- Cell `(row=0, col=0)` is top-left
- Each cell has:
  - `cell_width = region.width / cols`
  - `cell_height = region.height / rows`
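
The cell dimensions above can be sketched in Python as follows. The function name `cell_size` and the 8 x 12 grid are assumptions chosen for illustration. The region is passed as a plain dict with the same keys as the JSON example.

```python
def cell_size(region: dict, rows: int, cols: int) -> tuple[float, float]:
    """Return (cell_width, cell_height) in pixels for a rows x cols grid."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    return region["width"] / cols, region["height"] / rows


region = {"x": 0, "y": 0, "width": 1920, "height": 1080}
print(cell_size(region, rows=8, cols=12))  # (160.0, 135.0)
```

Cell sizes are kept as floats here because the region dimensions are not guaranteed to divide evenly by the grid size.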

## Cell center formula

Given `(row, col, dx, dy)` where `dx,dy ∈ [-1,1]`:

- `x = region.x + ((col + 0.5 + dx*0.5) * cell_width)`
- `y = region.y + ((row + 0.5 + dy*0.5) * cell_height)`

Interpretation:
- `dx = -1` -> left edge of the cell
- `dx = 0` -> center of the cell
- `dx = 1` -> right edge of the cell
- `dy` works the same way vertically: `-1` is the top edge, `0` the center, `1` the bottom edge
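
The formula and its edge cases can be checked with a short Python sketch. The function name `cell_point`, the range validation, and the 8 x 12 grid are assumptions for illustration. Whether the server rounds the result to integer pixels is not stated here, so the sketch returns floats.

```python
def cell_point(
    region: dict,
    rows: int,
    cols: int,
    row: int,
    col: int,
    dx: float = 0.0,
    dy: float = 0.0,
) -> tuple[float, float]:
    """Map (row, col, dx, dy) to a global pixel position using the formula above."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("row/col outside the grid")
    if not (-1.0 <= dx <= 1.0 and -1.0 <= dy <= 1.0):
        raise ValueError("dx and dy must lie in [-1, 1]")
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    x = region["x"] + (col + 0.5 + dx * 0.5) * cell_width
    y = region["y"] + (row + 0.5 + dy * 0.5) * cell_height
    return x, y


region = {"x": 0, "y": 0, "width": 1920, "height": 1080}
print(cell_point(region, 8, 12, row=0, col=0))                # (80.0, 67.5)
print(cell_point(region, 8, 12, row=0, col=0, dx=-1, dy=-1))  # (0.0, 0.0)
print(cell_point(region, 8, 12, row=7, col=11, dx=1, dy=1))   # (1920.0, 1080.0)
```

Note that `dx = 1` and `dy = 1` in the last cell land exactly on the region's right and bottom boundary, so a caller may want to clamp or nudge such points inward before clicking.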

## Recommended agent loop

1. Capture `/screen` with a coarse grid
2. Find the candidate cell
3. If uncertain, use `/zoom` around the candidate
4. Convert the target to a grid action (`row`, `col`, `dx`, `dy`)
5. Execute `/action`
6. Re-capture and verify the result
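
The steps above can be sketched as a small control loop. Everything in this example is hypothetical: the request and response formats of `/screen`, `/zoom`, and `/action` are not described in this document, so the loop takes plain callables that a caller would implement around those endpoints. The retry limit is likewise an assumption.

```python
from typing import Callable, Optional, Tuple

# (row, col, dx, dy) as defined in the cell formula section.
Cell = Tuple[int, int, float, float]


def click_with_verification(
    capture: Callable[[], object],           # hypothetical wrapper around /screen
    find_cell: Callable[[object], Tuple[Optional[Cell], bool]],  # (cell, confident)
    refine_with_zoom: Callable[[Cell], Cell],  # hypothetical wrapper around /zoom
    act: Callable[[Cell], None],             # hypothetical wrapper around /action
    verify: Callable[[object], bool],        # inspects a fresh capture
    max_attempts: int = 3,
) -> bool:
    """Run the capture/zoom/act/verify loop; return True once verified."""
    for _ in range(max_attempts):
        image = capture()                    # 1. coarse capture
        cell, confident = find_cell(image)   # 2. candidate cell
        if cell is None:
            continue                         # nothing found, try again
        if not confident:
            cell = refine_with_zoom(cell)    # 3-4. zoom, then convert to a grid action
        act(cell)                            # 5. execute the action
        if verify(capture()):                # 6. re-capture and verify
            return True
    return False
```

Passing the endpoint wrappers in as callables keeps the loop independent of any particular HTTP client and makes it straightforward to exercise with fakes.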