feat(window): add window lifecycle and launch endpoints
All checks were successful
python-syntax / syntax-check (push) Successful in 28s

2026-05-01 15:52:02 +02:00
parent 1429e90be2
commit 493e5499e8
4 changed files with 382 additions and 2 deletions

@@ -36,6 +36,9 @@ The agent should not assume it can self-install this stack.
- `GET /displays` → detected displays in zero-based API order
- `GET /screen?screen=0` → full screenshot (JSON with base64 by default, or raw image with `asImage=true`)
- `POST /zoom?screen=0` → cropped screenshot around point/region (also supports `asImage=true`)
- `GET /windows` → discover visible desktop windows and their handles/processes
- `POST /windows/action` → focus/restore/minimize/maximize/close a matched window
- `POST /launch` → start an app/process without dropping to a shell
- `POST /ocr` → text extraction with bounding boxes from full screen, region, or provided image bytes
- `POST /action?screen=0` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...)
- `POST /batch?screen=0` → sequential action list
@@ -123,11 +126,11 @@ Prefer structured GUI control first:
- `/action` or `/batch` to interact
Use `/exec` only when it is the cleanest available tool for the job, for example:
- launching an app that is not already visible
- querying machine state that the GUI does not expose well
- performing an explicit user-requested shell/system task
- recovering from a blocked GUI flow when normal interaction failed
Prefer `GET /windows`, `POST /windows/action`, and `POST /launch` for app lifecycle tasks before falling back to `/exec`.
Avoid using `/exec` for routine in-app clicks, menu navigation, or text entry when the GUI can be driven directly.
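The preference order above can be sketched as a small planning helper: look for the window first, act on it if it exists, and only launch when it does not. This is a minimal illustration, not the server's actual contract. The base URL, the JSON field names (`title`, `action`, `command`), and the shape of the `GET /windows` result are all assumptions; only the endpoint paths and the `focus` action name come from the text above.

```python
import json
from urllib.request import Request

# Hypothetical address; replace with wherever the control server listens.
BASE_URL = "http://localhost:8000"


def next_lifecycle_step(windows, title_fragment, launch_command):
    """Pick the next app-lifecycle call without falling back to /exec.

    `windows` is assumed to be a list of dicts with a "title" key, as might
    be parsed from a GET /windows response (the real schema may differ).
    Returns a (method, path, payload) tuple.
    """
    for window in windows:
        if title_fragment.lower() in window.get("title", "").lower():
            # Window already exists: focus it instead of launching again.
            payload = {"title": window["title"], "action": "focus"}
            return ("POST", "/windows/action", payload)
    # No matching window: start the app through /launch.
    return ("POST", "/launch", {"command": launch_command})


def build_request(method, path, payload=None):
    """Build (but do not send) an HTTP request for the chosen step."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)
```

A caller would fetch `GET /windows`, pass the parsed list to `next_lifecycle_step`, and send the resulting request with `urllib.request.urlopen`. `/exec` stays reserved for the cases listed above.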
## Core workflow (mandatory)