feat: bootstrap clickthrough server, skill docs, and syntax CI
All checks were successful
python-syntax / syntax-check (push) Successful in 29s
153
docs/API.md
Normal file
@@ -0,0 +1,153 @@
# API Reference (v0.1)

Base URL: `http://127.0.0.1:8123`

If `CLICKTHROUGH_TOKEN` is set, include header:

```http
x-clickthrough-token: <token>
```

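A client can mirror the rule above by attaching the header only when a token is configured. The sketch below assumes the client reads the same `CLICKTHROUGH_TOKEN` variable from its own environment; the helper name `auth_headers` is hypothetical and not part of the server.

```python
import os

TOKEN_HEADER = "x-clickthrough-token"  # header name from this reference


def auth_headers(env=None):
    """Return request headers, adding the token header only when a token is set.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("CLICKTHROUGH_TOKEN", "")
    return {TOKEN_HEADER: token} if token else {}
```

Merging the returned dict into every request keeps token handling in one place.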
## `GET /health`

Returns status and runtime safety flags.

## `GET /screen`

Query params:

- `with_grid` (bool, default `true`)
- `grid_rows` (int, default from the environment, else `12`)
- `grid_cols` (int, default from the environment, else `12`)
- `include_labels` (bool, default `true`)
- `image_format` (`png`|`jpeg`, default `png`)
- `jpeg_quality` (int, 1-100, default `85`)

The response includes a base64-encoded image and metadata (`meta.region`, optional `meta.grid`).

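As a rough illustration of the query parameters above, the following sketch builds a `/screen` URL with the standard library. Sending booleans as lowercase `true`/`false` is an assumption about how the server parses them, and `screen_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8123"  # base URL from this reference


def screen_url(base_url=BASE_URL, **params):
    """Build a /screen URL from keyword arguments, skipping values left as None."""
    query = {
        # Assumption: the server accepts lowercase "true"/"false" for bool params.
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
        if value is not None
    }
    return f"{base_url}/screen" + (f"?{urlencode(query)}" if query else "")


url = screen_url(with_grid=True, grid_rows=8, image_format="jpeg")
```

Omitted parameters fall back to the server defaults listed above.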
## `POST /zoom`

Body:

```json
{
  "center_x": 1200,
  "center_y": 700,
  "width": 500,
  "height": 350,
  "with_grid": true,
  "grid_rows": 20,
  "grid_cols": 20,
  "include_labels": true,
  "image_format": "png",
  "jpeg_quality": 90
}
```

Returns cropped image + region metadata in global pixel coordinates.

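One way to pick the zoom window is to center it on a grid cell from an earlier `/screen` capture. The sketch below is only an illustration: `zoom_body_for_cell` and its `scale` parameter are hypothetical, and it assumes `region` has the `x`/`y`/`width`/`height` keys shown in the coordinate-system notes.

```python
def zoom_body_for_cell(region, rows, cols, row, col, scale=3.0):
    """Build a /zoom body centered on a grid cell and `scale` cells wide and tall."""
    cell_w = region["width"] / cols
    cell_h = region["height"] / rows
    return {
        # Center of the chosen cell in global pixel coordinates.
        "center_x": round(region["x"] + (col + 0.5) * cell_w),
        "center_y": round(region["y"] + (row + 0.5) * cell_h),
        # Window size as a multiple of the cell size (hypothetical choice).
        "width": round(cell_w * scale),
        "height": round(cell_h * scale),
    }
```

The grid and image options from the body above can be merged into the returned dict before sending.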
## `POST /action`

Body: a single action object.

### Pointer target modes

#### Pixel target

```json
{
  "mode": "pixel",
  "x": 100,
  "y": 200,
  "dx": 0,
  "dy": 0
}
```

#### Grid target

```json
{
  "mode": "grid",
  "region_x": 0,
  "region_y": 0,
  "region_width": 1920,
  "region_height": 1080,
  "rows": 12,
  "cols": 12,
  "row": 5,
  "col": 9,
  "dx": 0.0,
  "dy": 0.0
}
```

In grid mode, `dx`/`dy` are normalized offsets in `[-1, 1]` inside the selected cell.

### Action examples

Click:

```json
{
  "action": "click",
  "target": {
    "mode": "grid",
    "region_x": 0,
    "region_y": 0,
    "region_width": 1920,
    "region_height": 1080,
    "rows": 12,
    "cols": 12,
    "row": 7,
    "col": 3,
    "dx": 0.2,
    "dy": -0.1
  },
  "clicks": 1,
  "button": "left"
}
```

Scroll:

```json
{
  "action": "scroll",
  "target": {"mode": "pixel", "x": 1300, "y": 740},
  "scroll_amount": -500
}
```

Type text:

```json
{
  "action": "type",
  "text": "hello world",
  "interval_ms": 20
}
```

Hotkey:

```json
{
  "action": "hotkey",
  "keys": ["ctrl", "l"]
}
```

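To show how one of these payloads might be sent, here is a minimal standard-library sketch that prepares the `POST /action` request without sending it. The helper name `build_action_request` is hypothetical, and sending the body as `application/json` is an assumption.

```python
import json
import urllib.request


def build_action_request(action, base_url="http://127.0.0.1:8123", token=None):
    """Prepare (but do not send) a POST /action request for one action payload."""
    headers = {"Content-Type": "application/json"}  # assumption: JSON request body
    if token:
        headers["x-clickthrough-token"] = token
    return urllib.request.Request(
        f"{base_url}/action",
        data=json.dumps(action).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_action_request({"action": "hotkey", "keys": ["ctrl", "l"]})
# urllib.request.urlopen(request) would send it to a running server.
```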
## `POST /batch`

Runs multiple `action` payloads sequentially.

```json
{
  "actions": [
    {"action": "move", "target": {"mode": "pixel", "x": 100, "y": 100}},
    {"action": "click", "target": {"mode": "pixel", "x": 100, "y": 100}}
  ],
  "stop_on_error": true
}
```

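Batch bodies like the one above are easy to assemble in code. The following sketch builds the same move-then-click payload; `pixel_target` and `move_then_click` are hypothetical helper names used only for illustration.

```python
def pixel_target(x, y):
    """Pixel-mode pointer target, as described under `POST /action`."""
    return {"mode": "pixel", "x": x, "y": y}


def move_then_click(x, y, stop_on_error=True):
    """Build a /batch body that moves the pointer to (x, y) and then clicks there."""
    return {
        "actions": [
            {"action": "move", "target": pixel_target(x, y)},
            {"action": "click", "target": pixel_target(x, y)},
        ],
        "stop_on_error": stop_on_error,
    }
```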
43
docs/coordinate-system.md
Normal file
@@ -0,0 +1,43 @@
# Coordinate System

All interactions ultimately execute in **global pixel coordinates** of the primary monitor.

## Regions

Visual endpoints return a `region` object:

```json
{"x": 0, "y": 0, "width": 1920, "height": 1080}
```

This describes where the image sits in global desktop space.

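To make the role of `region` concrete, the sketch below maps a point inside a returned image back to global desktop coordinates. It assumes the image covers exactly the region and may be uniformly rescaled; the function name `image_to_global` and the `image_size` argument are hypothetical.

```python
def image_to_global(region, image_size, image_x, image_y):
    """Map a pixel inside a returned image to global desktop coordinates.

    `image_size` is (width, height) of the decoded image. When it matches the
    region size the mapping is a plain offset by region.x and region.y.
    """
    image_width, image_height = image_size
    scale_x = region["width"] / image_width
    scale_y = region["height"] / image_height
    return region["x"] + image_x * scale_x, region["y"] + image_y * scale_y
```

For a zoomed region at `x=950, y=525` returned at native size, image pixel `(10, 20)` corresponds to global `(960, 545)`.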
## Grid indexing

- Rows/cols are **zero-based**
- Cell `(row=0, col=0)` is top-left
- Each cell has:
  - `cell_width = region.width / cols`
  - `cell_height = region.height / rows`

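The indexing rules above can be inverted to find which cell contains a given global pixel. This is a sketch under the stated zero-based, top-left convention; `cell_for_pixel` is a hypothetical helper, and clamping edge points into the last row or column is a choice made here, not documented server behavior.

```python
def cell_for_pixel(region, rows, cols, px, py):
    """Return the zero-based (row, col) of the grid cell containing global pixel (px, py)."""
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    col = int((px - region["x"]) // cell_width)
    row = int((py - region["y"]) // cell_height)
    # Clamp so points on or beyond the region edge still map to a valid cell.
    return min(max(row, 0), rows - 1), min(max(col, 0), cols - 1)
```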
## Cell center formula

Given `(row, col, dx, dy)` where `dx,dy ∈ [-1,1]`:

- `x = region.x + ((col + 0.5 + dx*0.5) * cell_width)`
- `y = region.y + ((row + 0.5 + dy*0.5) * cell_height)`

Interpretation:

- `dx = -1` -> left edge of cell
- `dx = 0` -> center
- `dx = 1` -> right edge
- `dy` works the same way vertically (`-1` -> top edge, `1` -> bottom edge)

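The formula above translates directly into code. The sketch below is a client-side illustration, not the server's implementation; the name `grid_to_pixel` and the range check are assumptions.

```python
def grid_to_pixel(region, rows, cols, row, col, dx=0.0, dy=0.0):
    """Convert a grid target to global pixel coordinates using the formula above."""
    if not (-1.0 <= dx <= 1.0 and -1.0 <= dy <= 1.0):
        raise ValueError("dx and dy must lie in [-1, 1]")
    cell_width = region["width"] / cols
    cell_height = region["height"] / rows
    x = region["x"] + (col + 0.5 + dx * 0.5) * cell_width
    y = region["y"] + (row + 0.5 + dy * 0.5) * cell_height
    return x, y
```

For a 1920x1080 region with a 12x12 grid, cells are 160x90 pixels, so `(row=5, col=9)` with zero offsets lands at `(1520, 495)`, and `dx = -1` moves `x` to the cell's left edge at `1440`.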
## Recommended agent loop

1. Capture `/screen` with coarse grid
2. Find candidate cell
3. If uncertain, use `/zoom` around candidate
4. Convert target to grid action
5. Execute `/action`
6. Re-capture and verify

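The loop above can be sketched as a single function with the network and vision steps injected as callables. Everything here is hypothetical scaffolding: `capture`, `locate`, `zoom`, `act`, and `verify` stand in for code that calls the endpoints and inspects images, and the `confidence` and `target` keys are an assumed shape for a candidate.

```python
def run_agent_step(capture, locate, zoom, act, verify, min_confidence=0.8):
    """Run one pass of the recommended loop and return the verification result."""
    screen = capture()                       # 1. capture /screen with a coarse grid
    candidate = locate(screen)               # 2. find a candidate cell
    if candidate["confidence"] < min_confidence:
        candidate = locate(zoom(candidate))  # 3. zoom around it when uncertain
    action = {"action": "click", "target": candidate["target"]}  # 4. grid action
    act(action)                              # 5. execute via /action
    return verify(capture())                 # 6. re-capture and verify
```

Passing the steps in as functions keeps the control flow testable without a running server.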