feat(window): add window lifecycle and launch endpoints
All checks were successful
python-syntax / syntax-check (push) Successful in 28s
@@ -36,6 +36,9 @@ The agent should not assume it can self-install this stack.
- `GET /displays` → detected displays in zero-based API order
- `GET /screen?screen=0` → full screenshot (JSON with base64 by default, or raw image with `asImage=true`)
- `POST /zoom?screen=0` → cropped screenshot around point/region (also supports `asImage=true`)
- `GET /windows` → discover visible desktop windows and their handles/processes
- `POST /windows/action` → focus/restore/minimize/maximize/close a matched window
- `POST /launch` → start an app/process without dropping to a shell
- `POST /ocr` → text extraction with bounding boxes from full screen, region, or provided image bytes
- `POST /action?screen=0` → single interaction (`move`, `click`, `scroll`, `type`, `hotkey`, ...)
- `POST /batch?screen=0` → sequential action list
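
As a rough illustration of how the endpoints listed above are addressed, the sketch below builds request URLs with the `screen` and `asImage` query parameters. The base address and the helper name `endpoint_url` are assumptions made for this example; only the paths and parameter names come from the list.

```python
from typing import Optional
from urllib.parse import urlencode, urljoin

# Assumption: the service address is not given in this document.
BASE_URL = "http://127.0.0.1:8000"


def endpoint_url(
    path: str,
    screen: Optional[int] = None,
    as_image: bool = False,
    base: str = BASE_URL,
) -> str:
    """Build a URL for one of the listed endpoints (hypothetical helper)."""
    params = {}
    if screen is not None:
        params["screen"] = screen   # zero-based display index
    if as_image:
        params["asImage"] = "true"  # ask for a raw image instead of JSON
    url = urljoin(base, path)
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(endpoint_url("/displays"))
    print(endpoint_url("/screen", screen=0, as_image=True))
```

The resulting string can be handed to any HTTP client. Request bodies for the `POST` endpoints are not described here, so the sketch stops at the URL.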
@@ -123,11 +126,11 @@ Prefer structured GUI control first:
- `/action` or `/batch` to interact

Use `/exec` only when it is the cleanest available tool for the job, for example:
- launching an app that is not already visible
- querying machine state that the GUI does not expose well
- performing an explicit user-requested shell/system task
- recovering from a blocked GUI flow when normal interaction failed

Prefer `GET /windows`, `POST /windows/action`, and `POST /launch` for app lifecycle tasks before falling back to `/exec`.
Avoid using `/exec` for routine in-app clicks, menu navigation, or text entry when the GUI can be driven directly.
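
The preference order above (existing window first, then `/launch`, with `/exec` as the last resort) could be sketched as a small planning function. This is only an illustration: the function name `plan_app_start`, the shape of the `GET /windows` result, and every payload field (`title`, `action`, `match`, `command`) are hypothetical, since the request schema is not shown here.

```python
from __future__ import annotations

from typing import Any

# NOTE: all payload field names below are hypothetical placeholders.
Plan = tuple[str, str, dict[str, Any]]


def plan_app_start(
    app: str,
    windows: list[dict[str, Any]],
    launch_failed: bool = False,
) -> Plan:
    """Pick the least invasive endpoint for bringing `app` to the foreground.

    `windows` stands in for the parsed result of `GET /windows`, assumed
    here to be a list of dicts that each carry a "title" key.
    """
    # 1. A matching window already exists: focus it instead of relaunching.
    for win in windows:
        title = str(win.get("title", ""))
        if app.lower() in title.lower():
            return ("POST", "/windows/action", {"action": "focus", "match": title})
    # 2. No window yet: start the app through the structured launch endpoint.
    if not launch_failed:
        return ("POST", "/launch", {"command": app})
    # 3. Only after the structured route failed, fall back to the shell.
    return ("POST", "/exec", {"command": app})


if __name__ == "__main__":
    print(plan_app_start("notepad", [{"title": "Untitled - Notepad"}]))
    print(plan_app_start("notepad", []))
    print(plan_app_start("notepad", [], launch_failed=True))
```

Returning a plan instead of sending the request keeps the decision logic separate from whichever HTTP client is in use.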

## Core workflow (mandatory)