feat(window): add window lifecycle and launch endpoints
All checks were successful
python-syntax / syntax-check (push) Successful in 28s
@@ -8,6 +8,8 @@ Let an Agent interact with your computer over HTTP, with grid-aware screenshots
- **Zoom endpoint**: crop around a point with denser grid for fine targeting (`asImage=true` supported)
- **Multi-display support**: list displays with `GET /displays` and select one with `?screen=0`, `?screen=1`, ...
- **Action endpoints**: move/click/right-click/double-click/middle-click/scroll/type/hotkey
- **Window lifecycle endpoints**: list/focus/restore/minimize/maximize/close windows via `GET /windows` + `POST /windows/action`
- **Structured launch endpoint**: start an app/process without dropping to a shell via `POST /launch`
- **OCR endpoint**: extract text blocks with bounding boxes via `POST /ocr`
- **Command execution endpoint**: run PowerShell/Bash/CMD commands via `POST /exec`
- **Coordinate transform metadata** in visual responses so agents can map grid cells to real pixels
@@ -41,7 +43,7 @@ For OCR support, install the native `tesseract` binary on the host (in addition
Important:
- `POST /action` expects an `action` plus a `target` object; do not send raw top-level `x` / `y` fields.
- Pixel coordinates and OCR bounding boxes are always global desktop coordinates.
- Prefer structured GUI interaction first; use `/exec` for launch, recovery, or explicit system-level tasks.
- Prefer structured GUI interaction first; use `/windows`, `/launch`, and `/action` before reaching for `/exec`.
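
As a rough illustration of the `POST /action` note above, the sketch below checks a request body before it is sent. The helper name `check_action_payload` and the `x` / `y` keys inside `target` are assumptions made for this example only; `docs/API.md` remains the authority on the real `target` schema.

```python
def check_action_payload(payload):
    """Return a list of problems found in a candidate POST /action body.

    Only the rules stated in the README are checked: an `action` field,
    a `target` object, and no raw top-level `x` / `y` fields.
    """
    problems = []
    if "action" not in payload:
        problems.append("missing 'action'")
    if not isinstance(payload.get("target"), dict):
        problems.append("missing 'target' object")
    for key in ("x", "y"):
        if key in payload:
            problems.append(f"raw top-level '{key}' is not accepted")
    return problems


# Hypothetical body: the keys inside `target` are an assumption, not the documented schema.
good = {"action": "click", "target": {"x": 120, "y": 340}}
bad = {"action": "click", "x": 120, "y": 340}

print(check_action_payload(good))
print(check_action_payload(bad))
```

An empty list means the body follows the shape described above; it does not guarantee that the server accepts the specific `target` contents.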
See:
- `docs/API.md`
@@ -67,6 +69,8 @@ Environment variables:
- `CLICKTHROUGH_EXEC_MAX_OUTPUT_CHARS` (default `20000`)
- `CLICKTHROUGH_TESSERACT_CMD` (optional path to the `tesseract` executable)
Window management endpoints currently target Windows hosts. On non-Windows hosts they return `501` instead of guessing.
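
A client can treat the `501` response as "window management is unavailable here" and fall back to other endpoints. The sketch below is a minimal example using only the standard library. It assumes a server at `http://127.0.0.1:8000`, no authentication, and a JSON response body from `GET /windows`; none of these details are confirmed by this README excerpt, and `list_windows` is a hypothetical helper name.

```python
import json
import urllib.error
import urllib.request

# Assumption: host and port are placeholders, not documented defaults.
BASE_URL = "http://127.0.0.1:8000"


def list_windows(base_url=BASE_URL, opener=urllib.request.urlopen):
    """Return the parsed body of GET /windows, or None on a 501 response.

    `opener` is injectable so the function can be exercised without a server.
    """
    try:
        with opener(f"{base_url}/windows") as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 501:
            # Non-Windows host: the endpoint declines instead of guessing.
            return None
        raise
```

Returning `None` keeps the "unsupported host" case distinct from an empty window list, so the caller can decide whether to fall back to `/action` or `/exec`.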
## Gitea CI
A Gitea Actions workflow is included at `.gitea/workflows/python-syntax.yml`.