Support multi-display screen selection
All checks were successful
python-syntax / syntax-check (push) Successful in 1m33s
12
README.md
@@ -6,6 +6,7 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
 
 - **Visual endpoints**: full-screen capture with optional grid overlay and labeled cells (`asImage=true` can return raw image bytes)
 - **Zoom endpoint**: crop around a point with denser grid for fine targeting (`asImage=true` supported)
+- **Multi-display support**: list displays with `GET /displays` and select one with `?screen=0`, `?screen=1`, ...
 - **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey
 - **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr`
 - **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
@@ -30,11 +31,12 @@ For OCR support, install the native `tesseract` binary on the host (in addition
 
 ## Minimal API flow
 
-1. `GET /screen` with grid
-2. Decide cell / target
-3. Optional `POST /zoom` for finer targeting
-4. `POST /action` to execute
-5. `GET /screen` again to verify result
+1. `GET /displays` if you need a non-primary monitor
+2. `GET /screen?screen=0` with grid
+3. Decide cell / target
+4. Optional `POST /zoom?screen=0` for finer targeting
+5. `POST /action?screen=0` to execute
+6. `GET /screen?screen=0` again to verify result
 
 See:
 - `docs/API.md`
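The updated flow can be sketched as a small URL-building helper. This is only an illustration: `BASE_URL`, `build_url`, and `minimal_flow` are hypothetical names, the host and port are assumptions not stated in the README, and request bodies for `/zoom` and `/action` are left out because the diff does not show them. It lists the requests in order and shows how `?screen=N` is attached to each per-display call.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL; the README diff does not state the host or port.
BASE_URL = "http://localhost:8000"


def build_url(path, screen=None, base_url=BASE_URL):
    """Join base_url and path, appending ?screen=N when a display index is given."""
    url = urljoin(base_url + "/", path.lstrip("/"))
    if screen is not None:
        url += "?" + urlencode({"screen": screen})
    return url


def minimal_flow(screen=0):
    """Return the (method, url) pairs of the flow for one display.

    Step 3 of the README list ("Decide cell / target") happens on the
    client side, so it has no request here. The zoom request is optional.
    """
    return [
        ("GET", build_url("/displays")),        # list displays first
        ("GET", build_url("/screen", screen)),  # capture with grid
        ("POST", build_url("/zoom", screen)),   # optional finer targeting
        ("POST", build_url("/action", screen)), # execute the action
        ("GET", build_url("/screen", screen)),  # verify the result
    ]


if __name__ == "__main__":
    for method, url in minimal_flow(screen=1):
        print(method, url)
```

Passing the same `screen` value to every call keeps the grid coordinates from the capture and the later action on the same display, which appears to be the intent of adding `?screen=0` to each step.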